Kavli Affiliate: Feng Wang | First 5 Authors: Feng Wang, Yaodong Yu, Guoyizhe Wei, Wei Shao, Yuyin Zhou | Summary: Since the introduction of Vision Transformer (ViT), patchification has long been regarded as a de facto image tokenization approach for plain visual architectures. By compressing the spatial size of images, this approach can effectively shorten […]
Continue reading: Scaling Laws in Patchification: An Image Is Worth 50,176 Tokens And More
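To make the token count in the title concrete, here is a minimal sketch of how patch size determines sequence length for non-overlapping square patches. The 224x224 input resolution is an assumption (the excerpt does not state it; it is chosen because 224 × 224 = 50,176), and `num_tokens` is a hypothetical helper, not code from the paper.

```python
def num_tokens(image_size: int, patch_size: int) -> int:
    """Count non-overlapping square patches (tokens) for a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    side = image_size // patch_size  # patches along one edge
    return side * side


# Assumption: 224x224 input; patch sizes picked only for illustration.
token_counts = {p: num_tokens(224, p) for p in (16, 8, 4, 2, 1)}

for p, n in token_counts.items():
    print(f"patch {p}x{p}: {n} tokens")
```

Under this assumption, a 16x16 patch gives 196 tokens, while a 1x1 patch (one token per pixel) gives the 50,176 tokens named in the title.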