Antoine Miech

244 posts

@antoine77340

Ornithologist @GoogleDeepMind 🦩, Gemini Multimodal

Joined June 2010
469 Following · 1.2K Followers
Antoine Miech retweeted
Ioana Bica @IoanaBica95
Agentic Vision 👁 with Gemini 3 Flash⚡️has officially launched! 🚀 Super thrilled that Gemini can now use code execution to actively 🔍 zoom & inspect, 🧮 perform visual computations, and ✏️ annotate images. Try it out in Gemini API (AI Studio / Vertex) or Gemini App and learn more here: goo.gle/4bsKdFv It’s been great fun working on enabling this new model capability with @xf1280, @anastasija56572, @RohanLikesAI, @weichengkuo, @bcaine, @jalayrac, @eisenjulian, @phillip_lippe, @antoine77340, @suhasyogin & Dan Graur.
Google AI @GoogleAI

Introducing Agentic Vision, a new frontier AI capability in Gemini 3 Flash that converts image understanding from a static act into an agentic process. By combining visual reasoning with code execution, one of the first tools supported by Agentic Vision, the model grounds answers in visual evidence and delivers a consistent 5-10% quality boost across most vision benchmarks.

Here’s how the agentic ‘Think, Act, Observe’ loop works:
- Think: The model analyzes an image query then architects a multi-step plan
- Act: The model then generates and executes Python code to actively manipulate or analyze images
- Observe: The transformed image is appended to the model's context window, allowing it to inspect the new data before generating a final response to the initial image query

Learn more about Agentic Vision and how to access it in our blog ⬇️ blog.google/innovation-and…

1
5
20
4.1K
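The ‘Think, Act, Observe’ loop described in the Google AI post above can be sketched in a few lines. This is only a toy illustration under stated assumptions, not how Gemini implements it: `Step`, `run_agentic_loop`, and `toy_model` are hypothetical names, the "image" is a small grid of integers, and plain `exec` stands in for a sandboxed code-execution tool.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

Image = List[List[int]]  # toy grayscale image: rows of pixel values


@dataclass
class Step:
    """One model turn: either code to run (Act) or a final answer."""
    code: Optional[str] = None
    answer: Optional[str] = None


# Hypothetical model interface: takes the query and the images seen so far.
Model = Callable[[str, List[Image]], Step]


def run_agentic_loop(model: Model, image: Image, query: str, max_steps: int = 5) -> str:
    context: List[Image] = [image]  # context starts with the original image
    for _ in range(max_steps):
        step = model(query, context)      # Think: plan the next move
        if step.answer is not None:
            return step.answer            # final response to the query
        scope = {"image": image}
        exec(step.code, scope)            # Act: toy stand-in for sandboxed execution
        context.append(scope["result"])   # Observe: transformed image joins the context
    raise RuntimeError("no answer within max_steps")


def toy_model(query: str, context: List[Image]) -> Step:
    """Stub model: zooms into the top-left 2x2 region once, then answers."""
    if len(context) == 1:
        return Step(code="result = [row[:2] for row in image[:2]]")
    crop = context[-1]
    return Step(answer=f"brightest pixel in crop: {max(max(row) for row in crop)}")


image = [[1, 9, 3], [4, 5, 6], [7, 8, 2]]
print(run_agentic_loop(toy_model, image, "What is the brightest pixel near the top-left?"))
# prints "brightest pixel in crop: 9"
```

A real system would run the generated code in an isolated sandbox and feed actual image data back to the model; the loop structure is the only part taken from the post.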
Antoine Miech retweeted
Ankesh Anand @ankesh_anand
Flash is SOTA on yet another agentic benchmark released after the model came out. I highly recommend using Flash on frontier tasks instead of just “cheap, high-volume” workloads: you’ll be surprised!
13
17
221
29.4K
Antoine Miech retweeted
UniPat AI @UniPat_AI
Can frontier MLLMs see like a 3-year-old? We’re releasing BabyVision — a vision-centric benchmark that isolates pre-linguistic visual primitives kids solve effortlessly, but models still struggle with.👇
9
8
13
2.2K
Antoine Miech @antoine77340
Antigravity can now leverage the Gemini 3 Flash browser use capability to complete even more sophisticated tasks! This demo showcases the power of 3 tightly integrated frontier features:
💻 Native Computer Use
👁️ Complex Visual Understanding
🧠 Long-range Agentic Reasoning
Varun Mohan @_mohansolo

Antigravity's computer use has also been massively upgraded with Gemini 3 Flash. It is both faster and better at doing long agentic tasks using the browser. Here's Antigravity doing deep research on the Pareto frontier of models and writing code to visualize the result.

0
2
19
2.3K
Antoine Miech @antoine77340
This new amazing capability is enabled starting from Gemini 3 Flash! Give it a try :)
Fei Xia @xf1280

🚀 Excited to share that #Gemini 3 Flash can do code execution on images to zoom, count, and annotate visual inputs! The model can choose when to write code to:
🔍 Zoom & Inspect: Detect when details are too small and zoom in.
🧮 Compute Visually: Run multi-step calculations using code (e.g., summing line items on a receipt).
✏️ Annotate: Draw arrows or bounding boxes to answer questions or show relationships between objects.

0
0
6
372
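To make the "Zoom & Inspect" and "Compute Visually" ideas from the quoted post concrete, here is a minimal sketch of the kind of helper code a model might write. Everything in it is an assumption for illustration: `crop_box` and `sum_line_items` are hypothetical names, the bounding box is assumed to be given in normalized coordinates, and the receipt amounts are assumed to be already read off the image as strings.

```python
from decimal import Decimal
from typing import Iterable, Tuple

Box = Tuple[float, float, float, float]  # normalized (x0, y0, x1, y1) in [0, 1]


def crop_box(width: int, height: int, box: Box, pad_px: int = 0) -> Tuple[int, int, int, int]:
    """Convert a normalized box to pixel coordinates, padded and clamped to the image."""
    x0, y0, x1, y1 = box
    left = max(0, round(x0 * width) - pad_px)
    top = max(0, round(y0 * height) - pad_px)
    right = min(width, round(x1 * width) + pad_px)
    bottom = min(height, round(y1 * height) + pad_px)
    return left, top, right, bottom


def sum_line_items(amounts: Iterable[str]) -> Decimal:
    """Sum receipt amounts exactly; Decimal avoids binary floating-point drift."""
    return sum((Decimal(a) for a in amounts), Decimal("0"))


# Zoom: pixel region to crop from a 1000x800 image, with 20 px of padding.
print(crop_box(1000, 800, (0.4, 0.5, 0.6, 0.75), pad_px=20))  # (380, 380, 620, 620)

# Compute: total of three line items read from a receipt.
print(sum_line_items(["4.50", "12.25", "3.25"]))  # 20.00
```

The resulting pixel box could be passed to any image library's crop routine; doing the arithmetic in code rather than in the model's head is the point the post makes about visual computation.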
Antoine Miech retweeted
Jeff Dean @JeffDean
We’ve pushed out the Pareto frontier of efficiency vs. intelligence again. With Gemini 3 Flash ⚡️, we are seeing reasoning capabilities previously reserved for our largest models, now running at Flash-level latency. This opens up entirely new categories of near real-time applications that require complex thought. It’s available in the API, and rolling out today as the default model in AI Mode in Search and Gemini app globally. Read more on the blog at: bit.ly/4pTo5YU More in thread ⬇️
52
193
1.8K
159.3K
Antoine Miech retweeted
Google DeepMind @GoogleDeepMind
What if you could not only watch a generated video, but explore it too? 🌐 Genie 3 is our groundbreaking world model that creates interactive, playable environments from a single text prompt. From photorealistic landscapes to fantasy realms, the possibilities are endless. 🧵
812
2.6K
13.3K
3.7M
Antoine Miech retweeted
Olivia Moore @omooretweets
How does Google's new agentic browser (Project Mariner) compare with ChatGPT Operator? I tested them head-to-head, using both platforms' suggested prompts (to make it fair!) 👇
24
157
2.1K
712.7K
Antoine Miech retweeted
MBZ @babaeizadeh
#Veo3 further blurs the lines between reality and imagination with audio, stronger text adherence, and richer visual details.
57
176
1.4K
739.8K
Antoine Miech retweeted
Antoine Yang @AntoineYang2
Thrilled to share our latest advances in video understanding 📽️: Gemini 2.5 Pro is a truly magical model to play with, excelling in traditional video analysis and unlocking new use cases I could not imagine a few months ago🪄 More in 🧵 and @Google blog: developers.googleblog.com/en/gemini-2-5-…
11
50
373
125.4K
Antoine Miech retweeted
Logan Kilpatrick @OfficialLoganK
Introducing YouTube video 🎥 link support in Google AI Studio and the Gemini API. You can now pass in a YouTube video directly and the model can apply its native video understanding capabilities to it, with just a link! 🚢
288
370
3.4K
792K
Antoine Miech retweeted
Antoine Yang @AntoineYang2
You can now paste YouTube links *directly* to use Gemini audio-video understanding on aistudio.google.com 😀
4
13
150
8.1K
Antoine Miech retweeted
Robert Riachi @robertriachi
some cool examples with Gemini 2.0 native image output 🧵
64
186
3.9K
481.3K
Antoine Miech retweeted
Aishwarya Kamath @ashkamath20
Super excited to announce what I’ve been working on for the past few months 💃 GEMMA 3 is out today! It supports 140+ languages, has a context length of 128k tokens and the best part? It’s natively multimodal! 📸
10
26
346
44K
Antoine Miech retweeted
Arena.ai @arena
Introducing Arena-Price Plot! 💰📊 An interactive plot of price vs. performance trade-offs for LLMs. Frontier efficiency models:
🔹 Gemini-2.0-Flash/Lite by @GoogleDeepMind
🔹 DeepSeek-R1 by @deepseek_ai
🔹 GPT-4o by @OpenAI
🔹 Yi-Lightning by @01AI_Yi
🔹 Ministral 8B by @MistralAI
LLM efficiency is accelerating; kudos to the labs driving the frontier!
63
133
793
85.6K
Antoine Miech retweeted
Google DeepMind @GoogleDeepMind
Today, we’re announcing Veo 2: our state-of-the-art video generation model which produces realistic, high-quality clips from text or image prompts. 🎥 We’re also releasing an improved version of our text-to-image model, Imagen 3 - available to use in ImageFX through @LabsDotGoogle. → goo.gle/veo-2-imagen-3
263
1.3K
6.9K
2.3M
Antoine Miech retweeted
Mostafa Dehghani @m__dehghani
Gemini2 Flash on the challenge of what the internet has been asking for: breaking down "draw the rest of the owl" into actual steps with interleaved generation. not perfect yet, but it’s on the edge of something super cool...
18
63
499
105.4K