Valentin Gabeur
@vgabeur
101 posts

Research Scientist @GoogleDeepmind | Prev. Postdoc @MetaAI, PhD @Inria & @GoogleAI

San Francisco · Joined March 2012
338 Following · 614 Followers
Valentin Gabeur @vgabeur
All in all, I agree that our evaluation on SA-Co/gold is incomplete with respect to the official setup recommended in the SAM3 paper, but we still wanted to share it, to show that instance segmentation is challenging and our model is not quite there yet. [9/n]
Valentin Gabeur @vgabeur
Our classification of SAM3 as "non zero-shot" on SA-Co/gold was guided not only by the fact that 93% of the NPs were seen during SAM3 training, but also by the similarities in annotation process between SA-Co/gold and the SAM3 training data (e.g. image collection, mask annotation tool). [8/n]
Valentin Gabeur @vgabeur
Image generation is becoming the universal interface for computer vision, just as text generation did for language and reasoning. Generative vision pretraining is paving the way for true Foundational Vision Models.
Valentin Gabeur retweeted
Google DeepMind @GoogleDeepMind
We’re launching Nano Banana 2, built on the latest Gemini Flash model. 🍌 It’s state-of-the-art for creating and editing images, combining Pro-level capabilities with lightning-fast speed. 🧵
Valentin Gabeur retweeted
Noam Shazeer @NoamShazeer
An updated Gemini 3 Deep Think is out today:
📈 Achieves SOTA on ARC-AGI-2, MMMU-Pro, and HLE.
🥇 Gold-medal level on Physics & Chemistry Olympiads.
It turns out the best way to solve hard problems is still to think about them. Read more: bit.ly/4kzBLqq
Valentin Gabeur retweeted
Google AI @GoogleAI
Introducing Agentic Vision — a new frontier AI capability in Gemini 3 Flash that converts image understanding from a static act into an agentic process. By combining visual reasoning with code execution, one of the first tools supported by Agentic Vision, the model grounds answers in visual evidence and delivers a consistent 5-10% quality boost across most vision benchmarks.

Here’s how the agentic ‘Think, Act, Observe’ loop works:
— Think: The model analyzes an image query, then architects a multi-step plan
— Act: The model generates and executes Python code to actively manipulate or analyze images
— Observe: The transformed image is appended to the model's context window, allowing it to inspect the new data before generating a final response to the initial image query

Learn more about Agentic Vision and how to access it in our blog ⬇️ blog.google/innovation-and…
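
The 'Think, Act, Observe' loop described above maps naturally onto a small control loop. The following is a minimal, purely illustrative Python sketch: the HypotheticalVisionModel class, its think/act/observe/respond methods, and the single hard-coded crop step are all invented stand-ins (this is not Google's implementation or the Gemini API); only the loop structure follows the tweet.

# Minimal sketch of a 'Think, Act, Observe' agentic vision loop.
# All names below are hypothetical stand-ins, not a real Gemini API.
from dataclasses import dataclass, field
from PIL import Image  # image tooling chosen purely for illustration

@dataclass
class HypotheticalVisionModel:
    """Stand-in for a multimodal model with code execution."""
    context: list = field(default_factory=list)

    def think(self, query: str, image: Image.Image) -> str:
        # Real system: the model plans multi-step image analysis.
        # Here we hard-code a single illustrative step.
        return "crop the top-left quadrant to inspect it more closely"

    def act(self, plan: str, image: Image.Image) -> Image.Image:
        # Real system: model-generated Python runs in a sandbox.
        w, h = image.size
        return image.crop((0, 0, w // 2, h // 2))

    def observe(self, new_image: Image.Image) -> None:
        # The transformed image is appended to the model's context,
        # so later steps can inspect it before answering.
        self.context.append(new_image)

    def respond(self, query: str) -> str:
        return f"answer grounded in {len(self.context)} observed image(s)"

def agentic_vision_loop(model, query, image, max_steps=3):
    for _ in range(max_steps):
        plan = model.think(query, image)   # Think: plan next step
        image = model.act(plan, image)     # Act: run code on the image
        model.observe(image)               # Observe: append result
    return model.respond(query)

if __name__ == "__main__":
    img = Image.new("RGB", (640, 480))
    print(agentic_vision_loop(HypotheticalVisionModel(),
                              "What is in the corner?", img))

The point of the sketch is only the control flow: each iteration grounds the next reasoning step in a freshly transformed image rather than in a single static encoding of the original input.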
Valentin Gabeur retweeted
UniPat AI @UniPat_AI
Can frontier MLLMs see like a 3-year-old? We’re releasing BabyVision — a vision-centric benchmark that isolates pre-linguistic visual primitives that kids solve effortlessly but models still struggle with. 👇