Valentin Gabeur

101 posts

@vgabeur

Research Scientist @GoogleDeepmind | Prev. Postdoc @MetaAI, PhD @Inria & @GoogleAI

San Francisco · Joined March 2012

338 Following · 614 Followers

Pinned Tweet

Valentin Gabeur@vgabeur·Apr 23

Introducing Vision Banana🍌: an image generator that achieves SOTA on segmentation, depth prediction, and surface normal estimation 🚀 🖼️Project page: vision-banana.github.io 📜Technical report: arxiv.org/abs/2604.20329 🧵👇


Valentin Gabeur@vgabeur·Apr 25

@alcinos26 @sainingxie @jalayrac @jon_barron Thanks for the feedback Nico, I listed justifications for our decisions in the following thread: x.com/vgabeur/status…

Valentin Gabeur@vgabeur

Here is a thread with some precisions regarding our evaluation of Vision Banana on the SA-Co/gold segmentation benchmark. [1/n] 🧵👇


Nicolas Carion@alcinos26·Apr 24

In this age of PR, it's common to see bombastic claims like "beating SAM3". However I take issue with this chart which is quite dishonest IMHO. I would have expected more academic honesty from researchers I deeply respect @sainingxie, @vgabeur, @jalayrac @jon_barron. A quick 🧵

Saining Xie@sainingxie

the idea of (using image generators to solve perception tasks) is pretty straightforward, and there have been many interesting results over the past couple of years. so why this moment matters? because for the first time, a single generalist model is actually beating top domain-specific models like SAM3 and DepthAnything3. those specialized models usually take years to develop and rely on pretty complex recipes in training and data. yet, as history often shows, such capabilities can instead emerge from general, scalable pretraining. in this case, image editing turns out to be a really effective pretraining paradigm, and all of the dense labeling problems can just be reframed as post-training on top of that. [2/n]


Valentin Gabeur@vgabeur·Apr 25

All in all, I agree that our evaluation on SA-Co/gold is incomplete with respect to the official setup recommended in the SAM3 paper, but we still wanted to share it, to show that instance segmentation is challenging and our model is not quite there yet. [9/n]


Valentin Gabeur@vgabeur·Apr 25

Our classification of SAM3 as "non zero-shot" on SA-Co/gold was not only guided by 93% of the NPs being seen during SAM3 training, but also by the annotation process similarities between SA-Co/gold and SAM3 training data (e.g. image collection, mask annotation tool). [8/n]


Valentin Gabeur@vgabeur·Apr 25

Here is a thread with some precisions regarding our evaluation of Vision Banana on the SA-Co/gold segmentation benchmark. [1/n] 🧵👇

Nicolas Carion@alcinos26


Valentin Gabeur@vgabeur·Apr 23

Huge thanks to my teammates! @ShangbangLong, @songyoupeng, @PaulVoigtlaend1, @Kevin_SSY, Yanan, Karen, Zhicheng, Wenlei, @jon_barron, Kyle, Nithish, Sherry, Yandong, Mandy, Suhas, Yiming, Huizhong, @oliver_wang2, @sainingxie, @howardzzh, Kaiming, Tom, @jalayrac, @RSoricut


Valentin Gabeur@vgabeur·Apr 23

Image generation is becoming the universal interface for computer vision, just as text generation did for language and reasoning. Generative vision pretraining is paving the way for true Foundational Vision Models.



Valentin Gabeur retweeted

Google DeepMind@GoogleDeepMind·Feb 26

We’re launching Nano Banana 2, built on the latest Gemini Flash model. 🍌 It’s state-of-the-art for creating and editing images, combining Pro-level capabilities with lightning-fast speed. 🧵


Valentin Gabeur retweeted

Noam Shazeer@NoamShazeer·Feb 12

An updated Gemini 3 Deep Think is out today: 📈 Achieves SOTA on ARC-AGI-2, MMMU-Pro, and HLE. 🥇Gold-medal level on Physics & Chemistry Olympiads. It turns out the best way to solve hard problems is still to think about them. Read more: bit.ly/4kzBLqq


Valentin Gabeur retweeted

Google AI@GoogleAI·Jan 28

Introducing Agentic Vision: a new frontier AI capability in Gemini 3 Flash that converts image understanding from a static act into an agentic process. By combining visual reasoning with code execution, one of the first tools supported by Agentic Vision, the model grounds answers in visual evidence and delivers a consistent 5-10% quality boost across most vision benchmarks.

Here’s how the agentic ‘Think, Act, Observe’ loop works:

- Think: The model analyzes an image query then architects a multi-step plan
- Act: The model then generates and executes Python code to actively manipulate or analyze images
- Observe: The transformed image is appended to the model's context window, allowing it to inspect the new data before generating a final response to the initial image query

Learn more about Agentic Vision and how to access it in our blog ⬇️ blog.google/innovation-and…


Valentin Gabeur retweeted

UniPat AI@UniPat_AI·Jan 13

Can frontier MLLMs see like a 3-year-old? We’re releasing BabyVision — a vision-centric benchmark that isolates pre-linguistic visual primitives kids solve effortlessly, but models still struggle with.👇

