Shangbang Long

34 posts

@ShangbangLong

Research Scientist @ Google DeepMind. Multimodal understanding and generation; world models. AGI for ALL.

Joined August 2022
217 Following, 457 Followers
Pinned Tweet
Shangbang Long @ShangbangLong
🚀 Excited to announce Vision Banana 🍌 and our new paper: “Image Generators are Generalist Vision Learners”. We turn Nano Banana Pro into a state-of-the-art visual generation and understanding model. 🖼️ Check out our gallery at vision-banana.github.io 🧵 (1/N) continue ⬇️
21
71
429
59K
Shangbang Long @ShangbangLong
@shaneguML Thank you Shane for sharing our work! Your previous work was incredibly inspiring to us.
0
0
2
137
Nicolas Carion @alcinos26
That being said, this is impressive work, and I congratulate the authors for pulling it off! Unifying everything is a researcher's dream, and this is a step in that direction. I am quite excited to see where this line of work goes, and whether inference time can be made practical.
2
0
32
2.5K
Shangbang Long retweeted
Nithish Kannen at ICLR 2026 @NithishKannen
Vision Banana 🍌 is here in Rio at @iclr_conf. I'll be at the Google Booth tomorrow at 10 AM doing a Demo at the @GoogleDeepMind Kiosk for folks to try out the model. I have some cool demos but I wanna do BYOImages. Looking forward to seeing folks!
3
3
28
9.5K
Songyou Peng @songyoupeng
Yay, finally! Introducing Vision Banana🍌 from @GoogleDeepMind, our unified model that outperforms SoTA specialist models on various vision tasks! By treating 2D/3D vision tasks as image generation, we unlock a new foundation for CV. Project page: vision-banana.github.io (1/5)
56
303
2.2K
269K
Shangbang Long @ShangbangLong
@KeranRong @songyoupeng It's actually Eilleen's Kitchen. My wife and I really love it - we go there so often that the owner gives us a free dessert / drink every time 🤣 I'm gonna charge them an ad fee 😁
2
0
2
39
Shangbang Long @ShangbangLong
@awsaf49 @vgabeur Thank you, we are aware of this paper (and cite it as well). As explained in the paper, we are not the first to explore this direction, and Diception is not the first either. Instead, we show that this simple approach can beat real SOTA methods such as SAM 3 and Depth Anything.
0
0
2
54
Awsaf @awsaf49
really cool work. but isn’t this already explored in diception (arxiv.org/abs/2502.17157)? they also start from pretrained image generators (stable diffusion) and finetune for multiple vision tasks, curious what the key difference is here. also wondering about the cost side: diffusion / AR models are typically slower and more expensive than standard vision models (segmentation, depth, etc). I don’t think "diception" really compared this either. and if the goal is generalist learners, could strong ssl-style vision backbones be a better fit for some of these tasks, since they can often be adapted with single-pass inference? curious what image generation pretraining gives here that a strong ssl foundation model would not.
1
1
2
512
Saining Xie @sainingxie
vision🍌 is here vision-banana.github.io if you got into computer vision the way I did, starting with pixel-level labeling tasks like segmentation, edges, depth, or surface normals, you’ll probably feel the same seeing these results -- something big has quietly shifted, and it’s going to change how we approach these problems for good 🧵
11
112
785
62.9K
Shangbang Long @ShangbangLong
@sainingxie Thank you, Saining, for envisioning this path. I am excited about where it leads us 🫶
0
0
3
1.2K
Shangbang Long @ShangbangLong
@howardzzh Definitely exceeded my expectations at the beginning of this project as well…
0
0
3
153
Howard Zhou @howardzzh
When I became a Computer Vision student many years ago, I would've never imagined, even in my wildest dream, that one day some of the hardest vision problems would be solved by an image generator. Congratulations to the team for achieving this remarkable milestone!
Shangbang Long @ShangbangLong

🚀 Excited to announce Vision Banana 🍌 and our new paper: “Image Generators are Generalist Vision Learners”. We turn Nano Banana Pro into a state-of-the-art visual generation and understanding model. 🖼️ Check out our gallery at vision-banana.github.io 🧵 (1/N) continue ⬇️

2
1
72
10K
Shangbang Long retweeted
Shuyang (Kevin) Sun @Kevin_SSY
Are we finally witnessing the GPT-3 moment for computer vision? We just dropped Vision Banana 🍌, a vision foundation model that seamlessly unifies generation and perception by treating all vision tasks as just another image generation problem. 1/N #googledeepmind #nanobanana
9
12
195
24.7K
Shangbang Long @ShangbangLong
🧵 (N/N) Remember to check our demo by @NithishKannen at ICLR! It’s located right at Google’s Booth. research.google/conferences-an…
0
0
13
1.4K