Yiming

98 posts

@Yiming20

Pittsburgh, PA · Joined April 2013

65 Following · 20 Followers

Yiming @Yiming20 · Apr 23

@vgabeur Excellent work Valentin! Thanks for pioneering this!

Valentin Gabeur @vgabeur · Apr 23

Introducing Vision Banana 🍌: an image generator that achieves SOTA on segmentation, depth prediction, and surface normal estimation 🚀 🖼️ Project page: vision-banana.github.io 📜 Technical report: arxiv.org/abs/2604.20329 🧵👇

Yiming @Yiming20 · Apr 23

@songyoupeng @GoogleDeepMind Excellent work Songyou!

Songyou Peng @songyoupeng · Apr 23

Yay, finally! Introducing Vision Banana🍌 from @GoogleDeepMind, our unified model that outperforms SoTA specialist models on various vision tasks! By treating 2D/3D vision tasks as image generation, we unlock a new foundation for CV. Project page: vision-banana.github.io (1/5)

Yiming @Yiming20 · Apr 23

@ShangbangLong Excellent work Shangbang!

Shangbang Long @ShangbangLong · Apr 23

🚀 Excited to announce Vision Banana 🍌 and our new paper: “Image Generators are Generalist Vision Learners”. We turn Nano Banana Pro into a state-of-the-art visual generation and understanding model. 🖼️ Check out our gallery at vision-banana.github.io 🧵 (1/N) continue ⬇️

Yiming @Yiming20 · Apr 23

@KeranRong You did some good early work on this :)

Keran R @KeranRong · Apr 23

From my old team at Google DeepMind (GDM) — Gemini multimodal just launched a whole new paradigm for computer vision using unified diffusion models! 🔥 (I am in the acknowledgement list haha!)

Radu Soricut @RSoricut

Meet Vision Banana 🍌 from @GoogleDeepMind! We provide strong evidence that image generators are generalist vision learners. Traditional computer vision tasks (segmentation, depth estimation, normal prediction) can now be performed at/near SOTA with a single generalist model derived from an image generation model. 🖼️ Explore the results: vision-banana.github.io 📄 See details at: arxiv.org/abs/2604.20329

Yiming reposted

Radu Soricut @RSoricut · Apr 23

Yiming @Yiming20 · Nov 20

@jerryjliu0 Hi Jerry, what prompt and inference settings are you using? I'm using the prompt "first read the rows of the table line by line, and then try to parse the entire table image into an html" with default media resolution and thinking level high, and the result looks quite different.

Jerry Liu @jerryjliu0 · Nov 19

I tried out Gemini 3 Pro on document parsing tasks 📄🤖 Tl;dr: it’s pretty good at general visual understanding, but still can’t one-shot tables (and qualitatively seems worse than some other models at table understanding). A great addition to any document OCR toolkit, especially for specialized visual understanding, but it would need the surrounding pipeline to be a standalone document parsing solution. I tried on high media resolution, low thinking, directly in AI Studio 🧵

Yiming @Yiming20 · Nov 20

@jerryjliu0 Hi Jerry, what exact prompt and settings are you using? I am trying this particular image with the prompt "first read the rows of the table line by line, and then try to parse the entire table image into an html" with default resolution and high thinking, and the result is different.

Jerry Liu @jerryjliu0 · Nov 19

1️⃣ Test: my favorite random document, the Caltrain schedule. Gemini 3 Pro messes up the overall column alignment (see 805 and 109 as the most obvious mistakes). It is strikingly worse than some of the Sonnet models at being able to comprehend the table structure.

Yiming @Yiming20 · Aug 3

@bkurmtime469 Every B737 has it... it is for air pressure sensors...

Yiming @Yiming20 · Apr 3

[TEST12] hass GEO ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ roadwork on forbes avenue and morewood avenue

Pittsburgh, PA 🇺🇸

Yiming @Yiming20 · Apr 3

[TEST12] has GEO ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ roadwork on forbes avenue and morewood avenue

Yiming @Yiming20 · Apr 3

[TEST11] NOT GEO ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ roadwork on forbes avenue and morewood avenue

Yiming @Yiming20 · Apr 3

[TEST10] 2 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ asdf Roadwork on morewood and forbes avenue is finished ay 0 retweets 0 likes

Pittsburgh, PA 🇺🇸

Yiming @Yiming20 · Apr 3

[TEST9] 2 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ asdf Roadwork on morewood and forbes avenue is finished ay 0 retweets 0 likes

Pittsburgh, PA 🇺🇸

Yiming @Yiming20 · Apr 3

[TEST8] 2 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ asdf Roadwork on morewood and forbes avenue is finished ay 0 retweets 0 likes

Pittsburgh, PA 🇺🇸

Yiming @Yiming20 · Apr 3

[TEST7] 2 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ asdf Roadwork on morewood and forbes avenue is finished ay 0 retweets 0 likes

Pittsburgh, PA 🇺🇸

Yiming @Yiming20 · Apr 3

[TEST6] 2 asdf Roadwork on morewood and forbes avenue is finished ay 0 retweets 0 likes

Pittsburgh, PA 🇺🇸

Yiming @Yiming20 · Apr 3

[TEST5] 2 asdf Roadwork on morewood and forbes avenue is finished ay

Pittsburgh, PA 🇺🇸

Yiming @Yiming20 · Apr 3

[TEST4] 2 asdf Roadwork on morewood and forbes avenue is finished ay

Pittsburgh, PA 🇺🇸

Yiming @Yiming20 · Apr 3

[TEST3] !!!!!!-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~ ~~~ 2 asdf Roadwork on morewood and forbes avenue is finished ! Yay

Pittsburgh, PA 🇺🇸

Yiming @Yiming20 · Apr 3

[TEST] !!!!!!-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~ ~~~ 2 asdf Roadwork on morewood and forbes avenue is finished ! Yay

Pittsburgh, PA 🇺🇸
