Yiming

98 posts

@Yiming20

Pittsburgh, PA · Joined April 2013
65 Following · 20 Followers
Yiming
Yiming@Yiming20·
@vgabeur Excellent work Valentin! Thanks for pioneering this!
English
0
0
1
70
Songyou Peng
Songyou Peng@songyoupeng·
Yay, finally! Introducing Vision Banana🍌 from @GoogleDeepMind, our unified model that outperforms SoTA specialist models on various vision tasks! By treating 2D/3D vision tasks as image generation, we unlock a new foundation for CV. Project page: vision-banana.github.io (1/5)
English
56
309
2.2K
279.6K
Shangbang Long
Shangbang Long@ShangbangLong·
🚀 Excited to announce Vision Banana 🍌 and our new paper: “Image Generators are Generalist Vision Learners”. We turn Nano Banana Pro into a state-of-the-art visual generation and understanding model. 🖼️ Check out our gallery at vision-banana.github.io 🧵 (1/N) continue ⬇️
English
22
71
433
59.7K
Yiming
Yiming@Yiming20·
@KeranRong You did some good early work on this :)
English
1
1
1
196
Keran R
Keran R@KeranRong·
From my old team at Google DeepMind (GDM) — Gemini multimodal just launched a whole new paradigm for computer vision using unified diffusion models! 🔥 (I am in the acknowledgement list haha!)
Radu Soricut@RSoricut

Meet Vision Banana 🍌 from @GoogleDeepMind! We provide strong evidence that image generators are generalist vision learners. Traditional computer vision tasks (segmentation, depth estimation, normal prediction) can now be performed at/near SOTA with a single generalist model derived from an image generation model. 🖼️ Explore the results: vision-banana.github.io 📄 See details at: arxiv.org/abs/2604.20329

English
1
3
58
7.8K
Yiming reposted
Radu Soricut
Radu Soricut@RSoricut·
Meet Vision Banana 🍌 from @GoogleDeepMind! We provide strong evidence that image generators are generalist vision learners. Traditional computer vision tasks (segmentation, depth estimation, normal prediction) can now be performed at/near SOTA with a single generalist model derived from an image generation model. 🖼️ Explore the results: vision-banana.github.io 📄 See details at: arxiv.org/abs/2604.20329
English
28
93
630
77.4K
Yiming
Yiming@Yiming20·
@jerryjliu0 Hi Jerry, what prompt and inference settings are you using? I'm using the prompt "first read the rows of the table line by line, and then try to parse the entire table image into an html" with default media resolution and thinking level high, and the result looks quite different.
English
0
0
1
40
Jerry Liu
Jerry Liu@jerryjliu0·
I tried out Gemini 3 Pro on document parsing tasks 📄🤖 TL;DR: it’s pretty good at general visual understanding, but still can’t one-shot tables (and qualitatively seems worse than some other models at table understanding). A great addition to any document OCR toolkit, especially for specialized visual understanding, but would need the surrounding pipeline to be a standalone document parsing solution. I tried on high media resolution, low thinking, directly in AI Studio 🧵
English
12
12
123
16.4K
Yiming
Yiming@Yiming20·
@jerryjliu0 Hi Jerry, what exact prompt and settings are you using? I am trying this particular image with the prompt "first read the rows of the table line by line, and then try to parse the entire table image into an html" with default resolution and high thinking, and the result is different.
English
0
0
0
10
Jerry Liu
Jerry Liu@jerryjliu0·
1️⃣ Test: my favorite random document, the Caltrain schedule. Gemini 3 Pro messes up the overall column alignment (see 805 and 109 as the most obvious mistakes). It is strikingly worse than some of the Sonnet models at being able to comprehend the table structure.
English
2
1
6
1.2K
Yiming
Yiming@Yiming20·
@bkurmtime469 Every B737 has it... it is for air pressure sensors...
English
0
0
0
0
Yiming
Yiming@Yiming20·
[TEST12] hass GEO ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ roadwork on forbes avenue and morewood avenue
Pittsburgh, PA 🇺🇸 English
0
0
0
0
Yiming
Yiming@Yiming20·
[TEST12] has GEO ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ roadwork on forbes avenue and morewood avenue
English
0
0
0
0
Yiming
Yiming@Yiming20·
[TEST11] NOT GEO ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ roadwork on forbes avenue and morewood avenue
English
0
0
0
0
Yiming
Yiming@Yiming20·
[TEST10] 2 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ asdf Roadwork on morewood and forbes avenue is finished ay 0 retweets 0 likes
Pittsburgh, PA 🇺🇸 English
0
0
0
0
Yiming
Yiming@Yiming20·
[TEST9] 2 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ asdf Roadwork on morewood and forbes avenue is finished ay 0 retweets 0 likes
Pittsburgh, PA 🇺🇸 English
0
0
0
0
Yiming
Yiming@Yiming20·
[TEST8] 2 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ asdf Roadwork on morewood and forbes avenue is finished ay 0 retweets 0 likes
Pittsburgh, PA 🇺🇸 English
0
0
0
0
Yiming
Yiming@Yiming20·
[TEST7] 2 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ asdf Roadwork on morewood and forbes avenue is finished ay 0 retweets 0 likes
Pittsburgh, PA 🇺🇸 English
0
0
0
0
Yiming
Yiming@Yiming20·
[TEST6] 2 asdf Roadwork on morewood and forbes avenue is finished ay 0 retweets 0 likes
Pittsburgh, PA 🇺🇸 English
0
0
0
0
Yiming
Yiming@Yiming20·
[TEST5] 2 asdf Roadwork on morewood and forbes avenue is finished ay
Pittsburgh, PA 🇺🇸 English
0
0
0
0
Yiming
Yiming@Yiming20·
[TEST4] 2 asdf Roadwork on morewood and forbes avenue is finished ay
Pittsburgh, PA 🇺🇸 English
0
0
0
0
Yiming
Yiming@Yiming20·
[TEST3] !!!!!!-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~ ~~~ 2 asdf Roadwork on morewood and forbes avenue is finished ! Yay
Pittsburgh, PA 🇺🇸 English
0
0
0
0
Yiming
Yiming@Yiming20·
[TEST] !!!!!!-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~ ~~~ 2 asdf Roadwork on morewood and forbes avenue is finished ! Yay
Pittsburgh, PA 🇺🇸 English
0
0
0
0