

Valentin Gabeur

@vgabeur
Research Scientist @GoogleDeepmind | Prev. Postdoc @MetaAI, PhD @Inria & @GoogleAI



Here is a thread with some clarifications regarding our evaluation of Vision Banana on the SA-Co/gold segmentation benchmark. [1/n] 🧵👇


the idea of using image generators to solve perception tasks is pretty straightforward, and there have been many interesting results over the past couple of years. so why does this moment matter? because for the first time, a single generalist model is actually beating top domain-specific models like SAM3 and DepthAnything3. those specialized models usually take years to develop and rely on pretty complex training and data recipes. yet, as history often shows, such capabilities can instead emerge from general, scalable pretraining. in this case, image editing turns out to be a really effective pretraining paradigm, and all of the dense labeling problems can just be reframed as post-training on top of that. [2/n]

In this age of PR, it's common to see bombastic claims like "beating SAM3". However, I take issue with this chart, which is quite dishonest IMHO. I would have expected more academic honesty from researchers I deeply respect: @sainingxie, @vgabeur, @jalayrac, and @jon_barron. A quick 🧵





