Pinned Tweet
Rohan Doshi
86 posts

Rohan Doshi
@RohanLikesAI
gemini multimodal, product @ deepmind. views are my own
Joined April 2020
203 Following · 1.4K Followers

RT @swyx: filesystem + code sandbox combo eats another modality.
remember when o3 destroyed at geoguessr?
gemini agentic vision will find…

An awesome deep dive on how to leverage Gemini 3 Agentic Vision today
Google AI Developers@googleaidevs
Gemini 3 Flash now uses an agentic "think-act-observe" loop to solve complex visual tasks 🤖 @GoogleDeepMind engineer @ptruiz_dev demonstrates how the model runs Python code automatically to zoom and inspect items, annotate images, and re-visualize data into charts.

Back at Harvard Business School last week speaking on frontier AI + agents 🤖
As a ’23 alum, it was energizing to be back - this time teaching from the other side of the classroom
My AI agent workshop was completely packed, with 100+ students - signal on how much Harvard is leaning into AI
The students’ agency, raw IQ, and curiosity left me wildly optimistic about the next wave of AI builders 🚀
Grateful to Profs. Jeffrey Bussgang & Allison Mnookin and the Launching Tech Ventures team for the invite 🙏🏼




@RejaullahmdMd Select Gemini 3 Flash as the model, turn on the code execution tool, and upload an image!
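For API users, the same three steps (pick the model, enable code execution, attach an image) can be sketched as a request body for the Gemini REST `generateContent` endpoint. This is a minimal sketch, not an official recipe: the model id `gemini-3-flash-preview` and the helper name `build_request` are assumptions made for illustration, so check the current model list before using it.

```python
import base64

# REST endpoint template for the Gemini API (v1beta).
API_URL = "https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent"


def build_request(image_bytes, prompt, model="gemini-3-flash-preview", mime_type="image/png"):
    """Return (url, body) for a generateContent call with code execution enabled.

    The default model id is an assumption; substitute the id listed for your account.
    """
    body = {
        "contents": [
            {
                "parts": [
                    # The image is sent inline as base64 text.
                    {
                        "inline_data": {
                            "mime_type": mime_type,
                            "data": base64.b64encode(image_bytes).decode("ascii"),
                        }
                    },
                    {"text": prompt},
                ]
            }
        ],
        # Enabling this tool lets the model write and run Python on the image.
        "tools": [{"code_execution": {}}],
    }
    return API_URL.format(model=model), body
```

The returned body can then be POSTed as JSON with an API key in the `x-goog-api-key` header, for example via `urllib.request` or any HTTP client.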

🚀 Excited to officially launch 👁Agentic Vision via Gemini 3 Flash. Gemini can run code execution on image uploads to zoom, analyze, and annotate:
🔍 Zoom: 5-10% quality win across vision benchmarks
🧮 Analyze: do image math with code (e.g. calculate the tip for a receipt)
✏️ Annotate: Draw arrows or bounding boxes to answer questions
Try via the Gemini API (AI Studio / Vertex) or via the Gemini App (rolling out to Thinking mode today).
Learn more→ goo.gle/4bsKdFv
Demo:
goo.gle/3Z05KxK
cc: @IoanaBica95 @anastasija56572 @jalayrac @bcaine @eisenjulian @weichengkuo @phillip_lippe @xf1280 @tulseedoshi @BiboXu @OfficialLoganK
Google AI Developers@googleaidevs
Try 👁 Agentic Vision with Gemini 3 Flash in @GoogleAIStudio or Vertex AI. This new capability enables the model to effectively use code and reasoning to improve performance for common vision tasks. See Agentic Vision in action: goo.gle/3Z05KxK
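To make the zoom / analyze / annotate idea concrete, here is a rough sketch of the kind of helper code a model might run inside its sandbox. It is an illustration only, not the code Gemini actually generates: images are modeled as plain row-major grids of pixel values so the example stays dependency-free, and the function names `zoom`, `tip_amount`, and `draw_box` are hypothetical.

```python
def zoom(pixels, left, top, right, bottom, scale=2):
    """Crop the region [left, right) x [top, bottom) and upscale it by pixel repetition."""
    region = [row[left:right] for row in pixels[top:bottom]]
    zoomed = []
    for row in region:
        # Repeat each pixel horizontally, then repeat the widened row vertically.
        wide = [p for p in row for _ in range(scale)]
        zoomed.extend(list(wide) for _ in range(scale))
    return zoomed


def tip_amount(subtotal_cents, tip_percent):
    """Tip in integer cents, rounded half up (integer math avoids float drift)."""
    return (subtotal_cents * tip_percent + 50) // 100


def draw_box(pixels, left, top, right, bottom, value=9):
    """Return a copy of the grid with a one-pixel box outline set to `value`."""
    out = [list(row) for row in pixels]
    for x in range(left, right):
        out[top][x] = value
        out[bottom - 1][x] = value
    for y in range(top, bottom):
        out[y][left] = value
        out[y][right - 1] = value
    return out
```

In practice the sandboxed code would more likely operate on real image arrays with an imaging library; the grid version just shows the three operations in isolation.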

@Jake_Joseph @AdMachineAI Very cool - would love to chat and see if we can help

@RohanLikesAI Thinking about using it as a fidelity/quality layer check for @AdMachineAI on our outputs to check vs. input product. If it doesn’t meet a threshold it automatically regenerates.
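The threshold-and-regenerate idea described here could look roughly like the loop below. This is a sketch under stated assumptions, not AdMachineAI's pipeline: `generate` and `score` are hypothetical callables supplied by the caller (for instance, `score` might wrap a vision-model comparison of the output against the input product), and the default threshold and attempt cap are arbitrary.

```python
def generate_until_faithful(generate, score, threshold=0.8, max_attempts=3):
    """Regenerate until an output scores at or above `threshold`.

    generate(attempt) -> candidate output (hypothetical callable)
    score(candidate)  -> fidelity score in [0, 1] (hypothetical callable)
    Returns (output, score, attempts_used); falls back to the best
    candidate seen if no attempt clears the threshold.
    """
    best_output, best_score = None, float("-inf")
    for attempt in range(1, max_attempts + 1):
        candidate = generate(attempt)
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best_output, best_score = candidate, candidate_score
        if candidate_score >= threshold:
            return candidate, candidate_score, attempt
    return best_output, best_score, max_attempts
```

Capping the attempts keeps cost bounded when the generator cannot reach the threshold, and returning the best candidate avoids discarding a near miss.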

@1littlecoder @shresbm both! AIS docs: ai.google.dev/gemini-api/doc…

@shresbm @RohanLikesAI is it available inside ai studio or only through api?

Agentic Vision in Gemini 3 Flash converts image understanding into an active, agentic investigation. By combining visual reasoning with code execution, the model formulates plans to zoom in, inspect and manipulate images step-by-step, grounding answers in visual evidence.
Below see an example of image

@marcosmarf27 @OfficialLoganK our team should be able to help debug things: feel free to respond to my DM or email me at rohandoshi@google.com.

@marcosmarf27 @OfficialLoganK can you help me better understand your entire pipeline? What are "vision resources"? Are you using another upstream system to do OCR? And are you feeding the OCR text and the PDF images into Gemini from there?
Rohan Doshi retweeted

Gemini 3 Flash is insane at OCR.
It parses this extremely hard to read handwritten letter by Richard Feynman perfectly. It can do ~300 of these for $1.
What's crazy is Feynman addresses General Donald J. Kutyna as "Katyna" which Gemini gets. There is no "Meeting Katyna", the first part of the letter, in all of Google search!


@matidotlol @googleaidevs @OfficialLoganK Gemini Vision PM here! Let's chat (I'll DM you). Can you share some queries + PDF examples? I'll have our team debug what's going on

@googleaidevs @OfficialLoganK where is the best place to report problems with Gemini API? for whatever reason, when using structured outputs, Gemini 3 Flash just ignores the vast majority of the PDF input. it works fine without structured outputs

@marcosmarf27 @OfficialLoganK Hey Marcos! I’m the Gemini Vision PM working on doc understanding. Would love to learn more and see if we can help. Will DM you.

@GeminiApp @MarioLucic_ looks like the sports video work paid off 😂



