What do you think of Google scaling ViT up to 22 billion parameters (ViT-22B)?

Scaling Vision Transformers to 22 Billion Parameters

I really like this paper. Scaling up ViT is not easy, and the work here is very thorough, with a lot of interesting technical details. The JAX repo it uses (Scenic) is also very nice, and the code is clean and concise:

scenic/scenic at main · google-research/scenic
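As one example of those details: the paper applies LayerNorm to the queries and keys before the attention dot product, which it reports keeps the attention logits from diverging at this scale. Below is my own minimal flax/linen sketch of that idea, not the Scenic implementation; the module name and hyperparameters are made up for illustration.

```python
import jax
import jax.numpy as jnp
import flax.linen as nn

class QKNormAttention(nn.Module):
    """Self-attention with LayerNorm on queries and keys (QK normalization),
    one of the stabilization tricks described in the ViT-22B paper.
    This is a simplified sketch, not the paper's exact code."""
    num_heads: int
    head_dim: int

    @nn.compact
    def __call__(self, x):  # x: (batch, tokens, width)
        d = self.num_heads * self.head_dim
        # QKV projections; biases are omitted here, as the paper does for QKV.
        q = nn.Dense(d, use_bias=False, name="q")(x)
        k = nn.Dense(d, use_bias=False, name="k")(x)
        v = nn.Dense(d, use_bias=False, name="v")(x)
        # Split heads: (batch, tokens, heads, head_dim).
        split = lambda t: t.reshape(*t.shape[:-1], self.num_heads, self.head_dim)
        q, k, v = map(split, (q, k, v))
        # QK normalization: LayerNorm on q and k before the dot product.
        q = nn.LayerNorm(use_bias=False, name="q_norm")(q)
        k = nn.LayerNorm(use_bias=False, name="k_norm")(k)
        attn = jnp.einsum("bqhd,bkhd->bhqk", q, k) / jnp.sqrt(self.head_dim)
        attn = jax.nn.softmax(attn, axis=-1)
        out = jnp.einsum("bhqk,bkhd->bqhd", attn, v)
        return nn.Dense(x.shape[-1], use_bias=False, name="out")(
            out.reshape(*x.shape[:-1], d))
```

Usage would look like `QKNormAttention(num_heads=16, head_dim=64)` applied to a `(batch, tokens, width)` array; the head count and head dim are arbitrary here.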

I also wrote up a short document on how to use Scenic:

github.com/XueFuzhao/Ho

It's lucky that they didn't find emergent properties when scaling up ViT.

Otherwise vision would go the way of NLP, relying on very, very large models that no individual could afford…

I don't like this trend; it makes AI inaccessible to researchers…

But yeah, we don't really have a choice.