What do you think of Google scaling ViT up to 22 billion parameters (ViT-22B)?

Scaling Vision Transformers to 22 Billion Parameters

I really like this paper. Scaling up ViT is not easy, and the work here is very thorough, with a lot of interesting technical details. The JAX repo it uses (Scenic) is also very nice, and the code is clean and concise:

scenic/scenic at main · google-research/scenic
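As one example of those details: the paper applies LayerNorm to the queries and keys before the attention dot product, which it reports keeps the attention logits from diverging at this scale. Below is my own minimal flax/linen sketch of that idea, not the Scenic implementation; the module name and hyperparameters are made up for illustration.

```python
import jax
import jax.numpy as jnp
import flax.linen as nn

class QKNormAttention(nn.Module):
    """Self-attention with LayerNorm on queries and keys (QK normalization),
    one of the stabilization tricks described in the ViT-22B paper.
    This is a simplified sketch, not the paper's exact code."""
    num_heads: int
    head_dim: int

    @nn.compact
    def __call__(self, x):  # x: (batch, tokens, width)
        d = self.num_heads * self.head_dim
        # QKV projections; biases are omitted here, as the paper does for QKV.
        q = nn.Dense(d, use_bias=False, name="q")(x)
        k = nn.Dense(d, use_bias=False, name="k")(x)
        v = nn.Dense(d, use_bias=False, name="v")(x)
        # Split heads: (batch, tokens, heads, head_dim).
        split = lambda t: t.reshape(*t.shape[:-1], self.num_heads, self.head_dim)
        q, k, v = map(split, (q, k, v))
        # QK normalization: LayerNorm on q and k before the dot product.
        q = nn.LayerNorm(use_bias=False, name="q_norm")(q)
        k = nn.LayerNorm(use_bias=False, name="k_norm")(k)
        attn = jnp.einsum("bqhd,bkhd->bhqk", q, k) / jnp.sqrt(self.head_dim)
        attn = jax.nn.softmax(attn, axis=-1)
        out = jnp.einsum("bhqk,bkhd->bqhd", attn, v)
        return nn.Dense(x.shape[-1], use_bias=False, name="out")(
            out.reshape(*x.shape[:-1], d))
```

Usage would look like `QKNormAttention(num_heads=16, head_dim=64)` applied to a `(batch, tokens, width)` array; the head count and head dim are arbitrary here.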

I also wrote up a short document on how to use Scenic:

github.com/XueFuzhao/Ho

It's lucky that they didn't find emergent properties when scaling up ViT.

Otherwise vision would go the way of NLP, relying on very, very large models that no individual could afford…

I don't like this trend; it makes AI inaccessible to researchers…

But yeah, we don't really have a choice.