Antoine Miech

244 posts

@antoine77340

Ornithologist @GoogleDeepMind 🦩, Gemini Multimodal

Joined June 2010

469 Following · 1.2K Followers

Antoine Miech retweeted

Ioana Bica@IoanaBica95·28 Jan

Agentic Vision 👁 with Gemini 3 Flash ⚡️ has officially launched! 🚀 Super thrilled that Gemini can now use code execution to actively 🔍 zoom & inspect, 🧮 perform visual computations, and ✏️ annotate images. Try it out in the Gemini API (AI Studio / Vertex) or the Gemini App and learn more here: goo.gle/4bsKdFv It’s been great fun working on enabling this new model capability with @xf1280, @anastasija56572, @RohanLikesAI, @weichengkuo, @bcaine, @jalayrac, @eisenjulian, @phillip_lippe, @antoine77340, @suhasyogin & Dan Graur.

Google AI@GoogleAI

Introducing Agentic Vision: a new frontier AI capability in Gemini 3 Flash that converts image understanding from a static act into an agentic process. By combining visual reasoning with code execution, one of the first tools supported by Agentic Vision, the model grounds its answers in visual evidence and delivers a consistent 5-10% quality boost across most vision benchmarks. Here’s how the agentic ‘Think, Act, Observe’ loop works:
- Think: The model analyzes an image query, then architects a multi-step plan.
- Act: The model generates and executes Python code to actively manipulate or analyze images.
- Observe: The transformed image is appended to the model's context window, allowing it to inspect the new data before generating a final response to the initial image query.
Learn more about Agentic Vision and how to access it in our blog ⬇️ blog.google/innovation-and…
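The ‘Think, Act, Observe’ loop described in the Google AI post can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Gemini's implementation: the image is a small grid of integers, and `think`, `answer`, and `Step` are hypothetical stand-ins for the model (a real model writes and executes its own code). Only the control flow of planning, running code on the image, and appending the result to the context mirrors the description.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Image = List[List[int]]  # toy grayscale image: a list of pixel rows


@dataclass
class Step:
    code: str                         # code the model chose to run (hypothetical)
    action: Callable[[Image], Image]  # Python callable standing in for executing that code


def crop(img: Image, top: int, left: int, h: int, w: int) -> Image:
    """Return the h x w sub-image whose top-left corner is (top, left)."""
    return [row[left:left + w] for row in img[top:top + h]]


def think(query: str, context: List[Image]) -> List[Step]:
    """Hypothetical planner: a real model would generate this plan itself."""
    return [Step(code="crop(img, 1, 1, 2, 2)", action=lambda im: crop(im, 1, 1, 2, 2))]


def answer(query: str, context: List[Image]) -> str:
    """Hypothetical final answer, grounded in the most recently observed image."""
    latest = context[-1]
    return f"max pixel in inspected region: {max(max(row) for row in latest)}"


def agentic_vision_loop(query: str, image: Image) -> Tuple[str, List[Image]]:
    context: List[Image] = [image]           # context starts with the input image
    for step in think(query, context):       # Think: plan a multi-step procedure
        observed = step.action(context[-1])  # Act: run code on the latest image
        context.append(observed)             # Observe: append the result to the context
    return answer(query, context), context


if __name__ == "__main__":
    img = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
    reply, ctx = agentic_vision_loop("How bright is the centre?", img)
    print(reply)  # max pixel in inspected region: 10
```

Keeping every observed image in `context` (rather than overwriting the input) is what lets the final answer refer back to both the original and the transformed views.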

Antoine Miech retweeted

Ankesh Anand@ankesh_anand·21 Jan

Flash is SOTA on yet another agentic benchmark released after the model came out. I highly recommend using Flash on frontier tasks instead of just “cheap, high-volume” workloads: you’ll be surprised!

Antoine Miech retweeted

UniPat AI@UniPat_AI·13 Jan

Can frontier MLLMs see like a 3-year-old? We’re releasing BabyVision — a vision-centric benchmark that isolates pre-linguistic visual primitives kids solve effortlessly, but models still struggle with.👇

Antoine Miech@antoine77340·20 Dec

Antigravity can now leverage the Gemini 3 Flash browser use capability to complete even more sophisticated tasks! This demo showcases the power of 3 tightly integrated frontier features: 💻 Native Computer Use 👁️ Complex Visual Understanding 🧠 Long-range Agentic Reasoning

Varun Mohan@_mohansolo

Antigravity's computer use has also been massively upgraded with Gemini 3 Flash. It is both faster and better at doing long agentic tasks using the browser. Here's Antigravity doing deep research on the Pareto frontier of models and writing code to visualize the result.

Antoine Miech@antoine77340·19 Dec

This amazing new capability is available starting with Gemini 3 Flash! Give it a try :)

Fei Xia@xf1280

🚀Excited to share that #Gemini 3 Flash can do code execution on images to zoom, count, and annotate visual inputs! The model can choose when to write code to: 🔍 Zoom & Inspect: Detect when details are too small and zoom-in. 🧮 Compute Visually: Run multi-step calculations using code (e.g., summing line items on a receipt). ✏️ Annotate: Draw arrows or bounding boxes to answer questions or show relationships between objects.
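Two of the operations listed in Fei Xia's post, zooming in on small details and summing receipt line items, can be illustrated with the kind of small helper code a model might write. This is a hypothetical sketch, not code produced by Gemini: the toy image is a list of pixel rows, `zoom` is assumed to be a crop followed by nearest-neighbour upsampling, and the prices in the usage lines are example strings rather than values read from a real receipt.

```python
from decimal import Decimal
from typing import List, Sequence

Image = List[List[int]]  # toy grayscale image: a list of pixel rows


def zoom(img: Image, top: int, left: int, h: int, w: int, factor: int) -> Image:
    """Zoom & inspect: crop an h x w region at (top, left), then enlarge it by an
    integer factor with nearest-neighbour upsampling (each pixel -> factor x factor block)."""
    region = [row[left:left + w] for row in img[top:top + h]]
    out: Image = []
    for row in region:
        wide = [px for px in row for _ in range(factor)]  # repeat each pixel horizontally
        out.extend(list(wide) for _ in range(factor))     # repeat each row vertically
    return out


def sum_line_items(prices: Sequence[str]) -> Decimal:
    """Compute visually: total prices transcribed from a receipt image.
    Decimal avoids binary floating-point rounding on currency amounts."""
    return sum((Decimal(p) for p in prices), Decimal("0"))


if __name__ == "__main__":
    print(zoom([[1, 2], [3, 4]], 0, 0, 2, 2, factor=2))
    # [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
    print(sum_line_items(["0.10", "0.20"]))  # 0.30
```

Parsing prices as strings into `Decimal` sidesteps the classic `0.1 + 0.2 != 0.3` float issue, which matters when a model is asked to check a receipt total.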

Antoine Miech retweeted

Jeff Dean@JeffDean·17 Dec

We’ve pushed out the Pareto frontier of efficiency vs. intelligence again. With Gemini 3 Flash ⚡️, we are seeing reasoning capabilities previously reserved for our largest models, now running at Flash-level latency. This opens up entirely new categories of near real-time applications that require complex thought. It’s available in the API, and rolling out today as the default model in AI Mode in Search and Gemini app globally. Read more on the blog at: bit.ly/4pTo5YU More in thread ⬇️

Antoine Miech@antoine77340·10 Dec

Excited that Gemini 3 now has strong native computer use capabilities! I wrote a simple Colab showing how to use Gemini 3.0 for computer use: it shows how to give Gemini access to a mouse tool and let it click on elements in screenshots. colab.research.google.com/drive/1OK30EUq…

Google AI Developers@googleaidevs

Gemini 3 Pro is the frontier of multimodal AI, delivering SOTA performance across document, screen, spatial, and video understanding. Read our deep dive on how we’ve pushed our core capabilities to power hero use cases across: + Docs: "derender" complex docs into structured code (HTML/LaTeX) + Screen: build robust computer agents that automate complex tasks + Spatial: generate collision-free trajectories for robotics & XR + Video: analyze sports footage using high-FPS processing with "thinking" mode See how these capabilities are transforming workflows in education, biomedical, and law/finance → goo.gle/3Mt3UlT
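The mouse-tool pattern from the Colab can be sketched as a small handler that turns a model-proposed click on a screenshot into screen pixels. Both the tool-call shape (`{"name": "click", "args": {"x": ..., "y": ...}}`) and the 1000 x 1000 normalized coordinate grid are assumptions for illustration, not taken from the Colab; check the current Gemini computer-use documentation for the actual format.

```python
from dataclasses import dataclass

GRID = 1000  # ASSUMPTION: size of the model's normalized coordinate grid


@dataclass(frozen=True)
class Click:
    x: int  # pixel column on the real screen
    y: int  # pixel row on the real screen


def to_screen(nx: int, ny: int, width: int, height: int, grid: int = GRID) -> Click:
    """Map a normalized (nx, ny) point predicted on a screenshot to pixel
    coordinates on a width x height screen, clamped to the screen bounds."""
    x = min(width - 1, max(0, round(nx * width / grid)))
    y = min(height - 1, max(0, round(ny * height / grid)))
    return Click(x, y)


def handle_mouse_tool(call: dict, width: int, height: int) -> Click:
    """Hypothetical handler for a mouse-click tool call of the form
    {"name": "click", "args": {"x": ..., "y": ...}}; a real agent would then
    send the click to a browser or OS automation library."""
    if call.get("name") != "click":
        raise ValueError(f"unsupported tool: {call.get('name')}")
    args = call["args"]
    return to_screen(int(args["x"]), int(args["y"]), width, height)


if __name__ == "__main__":
    call = {"name": "click", "args": {"x": 500, "y": 500}}
    print(handle_mouse_tool(call, width=1920, height=1080))  # Click(x=960, y=540)
```

Having the model predict resolution-independent coordinates and converting them on the client keeps the same prompt usable across screen sizes; the clamp guards against points that land exactly on the far edge.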

Antoine Miech retweeted

Google DeepMind@GoogleDeepMind·5 Aug

What if you could not only watch a generated video, but explore it too? 🌐 Genie 3 is our groundbreaking world model that creates interactive, playable environments from a single text prompt. From photorealistic landscapes to fantasy realms, the possibilities are endless. 🧵

Antoine Miech retweeted

Olivia Moore@omooretweets·21 May

How does Google's new agentic browser (Project Mariner) compare with ChatGPT Operator? I tested them head-to-head, using both platforms' suggested prompts (to make it fair!) 👇

Antoine Miech retweeted

MBZ@babaeizadeh·21 May

#Veo3 further blurs the lines between reality and imagination with audio, stronger text adherence, and richer visual details.

Antoine Miech retweeted

Marques Brownlee@MKBHD·20 May

We're barely 2 years from Will Smith eating spaghetti...

Google@Google

Say goodbye to the silent era of video generation: Introducing Veo 3 — with native audio generation. 🗣️ Quality is up from Veo 2, and now you can add dialogue between characters, sound effects and background noise. Veo 3 is available now in the @GeminiApp for Google AI Ultra subscribers in the U.S. #GoogleIO

Antoine Miech retweeted

Antoine Yang@AntoineYang2·9 May

Thrilled to share our latest advances in video understanding 📽️: Gemini 2.5 Pro is a truly magical model to play with, excelling in traditional video analysis and unlocking new use cases I could not imagine a few months ago🪄 More in 🧵 and @Google blog: developers.googleblog.com/en/gemini-2-5-…

Antoine Miech@antoine77340·13 Mar

😵😵

nic@nicdunz

this is so fucking crazy

ART

Antoine Miech retweeted

Logan Kilpatrick@OfficialLoganK·12 Mar

Introducing YouTube video 🎥 link support in Google AI Studio and the Gemini API. You can now pass in a YouTube video directly, and the model can use its native video understanding capabilities on it, with just a link! 🚢

Antoine Miech retweeted

Antoine Yang@AntoineYang2·13 Mar

You can now paste YouTube links *directly* to use Gemini audio-video understanding on aistudio.google.com 😀

Antoine Miech retweeted

Robert Riachi@robertriachi·12 Mar

some cool examples with Gemini 2.0 native image output 🧵

Antoine Miech retweeted

Aishwarya Kamath@ashkamath20·12 Mar

Super excited to announce what I’ve been working on for the past few months 💃 GEMMA 3 is out today! It supports 140+ languages, has a context length of 128k tokens and the best part? It’s natively multimodal! 📸

Antoine Miech retweeted

Arena.ai@arena·7 Feb

Introducing Arena-Price Plot! 💰📊 An interactive plot of price vs. performance trade-offs for LLMs. Frontier efficiency models: 🔹 Gemini-2.0-Flash/Lite by @GoogleDeepMind 🔹 DeepSeek-R1 by @deepseek_ai 🔹 GPT-4o by @OpenAI 🔹 Yi-Lightning by @01AI_Yi 🔹 Ministral 8B by @MistralAI LLM efficiency is accelerating—kudos to the labs driving the frontier!

Antoine Miech retweeted

Google DeepMind@GoogleDeepMind·16 Dec

Today, we’re announcing Veo 2: our state-of-the-art video generation model which produces realistic, high-quality clips from text or image prompts. 🎥 We’re also releasing an improved version of our text-to-image model, Imagen 3 - available to use in ImageFX through @LabsDotGoogle. → goo.gle/veo-2-imagen-3

Antoine Miech retweeted

Mostafa Dehghani@m__dehghani·11 Dec

Gemini2 Flash on the challenge of what the internet has been asking for: breaking down "draw the rest of the owl" into actual steps with interleaved generation. not perfect yet, but it’s on the edge of something super cool...