Rohan Doshi

86 posts

@RohanLikesAI

gemini multimodal, product @ deepmind. views are my own

Joined April 2020

203 Following · 1.4K Followers

Pinned Tweet

Rohan Doshi@RohanLikesAI·18 Nov

🚀 We just launched Gemini 3 Pro — the strongest multimodal understanding model ever built. I lead product for Gemini’s multimodal vision capabilities, and I want to share more about the massive wins we are seeing across document, screen, spatial, and video understanding. 🧵

9.4K

Rohan Doshi@RohanLikesAI·12 Mar

RT @swyx: filesystem + code sandbox combo eats another modality. remember when o3 destroyed at geoguessr? gemini agentic vision will find…

Rohan Doshi@RohanLikesAI·11 Feb

An awesome deep dive on how to leverage Gemini 3 Agentic Vision today

Google AI Developers@googleaidevs

Gemini 3 Flash now uses an agentic "think-act-observe" loop to solve complex visual tasks 🤖 @GoogleDeepMind engineer @ptruiz_dev demonstrates how the model runs Python code automatically to zoom and inspect items, annotate images, and re-visualize data into charts.
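The "zoom" action in that loop comes down to cropping a region of the image and looking again. A minimal sketch of the coordinate step, assuming the region is given as [ymin, xmin, ymax, xmax] on a 0-1000 grid (an assumption here, not something the post states); `box_to_pixels` is a hypothetical helper, not the code the model actually writes.

```python
def box_to_pixels(box, width, height, scale=1000):
    """Map [ymin, xmin, ymax, xmax] on a 0..scale grid to (left, top, right, bottom) pixels."""
    ymin, xmin, ymax, xmax = box
    # Integer arithmetic avoids floating-point rounding surprises.
    left = xmin * width // scale
    top = ymin * height // scale
    right = xmax * width // scale
    bottom = ymax * height // scale
    # Clamp to the image and keep the crop at least one pixel wide and tall.
    left = max(0, min(left, width - 1))
    top = max(0, min(top, height - 1))
    right = max(left + 1, min(right, width))
    bottom = max(top + 1, min(bottom, height))
    return left, top, right, bottom


# Right half, middle band of a 2000x1000 image.
crop_box = box_to_pixels([250, 500, 750, 1000], width=2000, height=1000)
```

The returned tuple uses the (left, upper, right, lower) order that Pillow's `Image.crop` expects, so it could be passed there directly if Pillow is available.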

3.3K

Rohan Doshi@RohanLikesAI·9 Feb

Back at Harvard Business School last week speaking on frontier AI + agents 🤖
As a ’23 alum, it was energizing to be back - this time teaching from the other side of the classroom
My AI agent workshop was completely packed, with 100+ students - signal on how much Harvard is leaning into AI
The students’ agency, raw IQ, and curiosity left me wildly optimistic about the next wave of AI builders 🚀
Grateful to Profs. Jeffrey Bussgang & Allison Mnookin and the Launching Tech Ventures team for the invite 🙏🏼

1.2K

Rohan Doshi@RohanLikesAI·29 Jan

@hololux Very cool!

galal@hololux·28 Jan

@RohanLikesAI

QME

Rohan Doshi@RohanLikesAI·28 Jan

what are y'all building with Gemini Agentic Vision??

5.5K

Rohan Doshi@RohanLikesAI·29 Jan

@danielpearson Let’s chat! DM me!

Daniel Pearson@danielpearson·28 Jan

@RohanLikesAI Agentic vision for video soon?

Rohan Doshi@RohanLikesAI·28 Jan

@RejaullahmdMd Select Gemini 3 Flash as the model, turn on the code execution tool, and upload an image!
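For API users, the same steps (pick the model, enable code execution, attach an image) map roughly onto a single request body. A minimal sketch that only builds the JSON, assuming the field names of the public Gemini REST `generateContent` format as I understand them; `build_request` is a hypothetical helper, and the model ID and endpoint are deliberately left out because the reply does not give them.

```python
import base64
import json


def build_request(image_bytes, mime_type, prompt):
    """Build a request body with one image, one text prompt, and code execution enabled."""
    return {
        "contents": [
            {
                "parts": [
                    # Image sent inline as base64 (assumed field names).
                    {
                        "inline_data": {
                            "mime_type": mime_type,
                            "data": base64.b64encode(image_bytes).decode("ascii"),
                        }
                    },
                    {"text": prompt},
                ]
            }
        ],
        # The API-side counterpart of the code execution toggle.
        "tools": [{"code_execution": {}}],
    }


body = build_request(b"\x89PNG", "image/png", "Zoom in and read the serial number.")
payload = json.dumps(body)  # what would be sent as the POST body
```

Check the current API documentation for the exact model name and endpoint before sending this anywhere.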

Md Rejaullah@RejaullahmdMd·28 Jan

@RohanLikesAI How to use it in AI studio

Rohan Doshi@RohanLikesAI·27 Jan

🚀 Excited to officially launch 👁Agentic Vision via Gemini 3 Flash. Gemini can run code execution on image uploads to zoom, analyze, and annotate:
🔍 Zoom: 5-10% quality win across vision benchmarks
🧮 Analyze: do image math with code (e.g. calculate the tip for a receipt)
✏️ Annotate: Draw arrows or bounding boxes to answer questions
Try via the Gemini API (AI Studio / Vertex) or via the Gemini App (rolling out to Thinking mode today).
Learn more→ goo.gle/4bsKdFv
Demo: goo.gle/3Z05KxK
cc: @IoanaBica95 @anastasija56572 @jalayrac @bcaine @eisenjulian @weichengkuo @phillip_lippe @xf1280 @tulseedoshi @BiboXu @OfficialLoganK
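The "image math" case is the easiest to picture: once the line items have been read off the receipt, the arithmetic is done in code instead of being guessed. A minimal sketch of that last step, assuming the prices were already extracted as strings; `tip_and_total` and the sample values are hypothetical, not output from the model.

```python
from decimal import Decimal, ROUND_HALF_UP


def tip_and_total(prices, tip_percent):
    """Sum receipt line items and add a tip, rounding the tip to whole cents."""
    subtotal = sum((Decimal(p) for p in prices), Decimal("0"))
    tip = (subtotal * Decimal(tip_percent) / Decimal(100)).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_UP
    )
    return subtotal, tip, subtotal + tip


# Two made-up line items with an 18% tip.
subtotal, tip, total = tip_and_total(["12.00", "30.50"], "18")
```

`Decimal` is used instead of `float` so that currency amounts round the way a person would expect on a bill.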

Google AI Developers@googleaidevs

Try 👁 Agentic Vision with Gemini 3 Flash in @GoogleAIStudio or Vertex AI. This new capability enables the model to effectively use code and reasoning to improve performance for common vision tasks. See Agentic Vision in action: goo.gle/3Z05KxK

237

29.3K

Rohan Doshi@RohanLikesAI·28 Jan

@Jake_Joseph @AdMachineAI Very cool - would love to chat and see if we can help

Jake Baumann@Jake_Joseph·28 Jan

@RohanLikesAI Thinking about using it as a fidelity/quality layer check for @AdMachineAI on our outputs to check vs. input product. If it doesn’t meet a threshold it automatically regenerates.
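The check-then-regenerate idea in this reply can be sketched as a small retry loop. This is an illustration only: `generate` and `score` are hypothetical callables standing in for the ad generator and a vision-based fidelity check, and the retry cap is an added assumption so the loop cannot run forever.

```python
from typing import Callable, Optional, Tuple, TypeVar

T = TypeVar("T")


def generate_until_ok(
    generate: Callable[[], T],
    score: Callable[[T], float],
    threshold: float,
    max_attempts: int = 3,
) -> Tuple[Optional[T], float, int]:
    """Regenerate until a candidate scores at or above threshold, or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    best, best_score = None, float("-inf")
    for attempt in range(1, max_attempts + 1):
        candidate = generate()
        current = score(candidate)
        if current > best_score:
            best, best_score = candidate, current
        if current >= threshold:
            return candidate, current, attempt
    # Nothing met the threshold: fall back to the best candidate seen.
    return best, best_score, max_attempts


# Stub generator and scorer, for demonstration only.
fake_scores = {"draft-a": 0.2, "draft-b": 0.9, "draft-c": 0.5}
drafts = iter(fake_scores)
result = generate_until_ok(lambda: next(drafts), fake_scores.get, threshold=0.8)
```

Returning the best candidate on failure is one possible design; a production pipeline might instead flag the item for human review.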

103

Rohan Doshi@RohanLikesAI·28 Jan

@1littlecoder @shresbm both! AIS docs: ai.google.dev/gemini-api/doc…

1LittleCoder💻@1littlecoder·28 Jan

@shresbm @RohanLikesAI is it available inside ai studio or only through api?

Shrestha Basu Mallick@shresbm·27 Jan

Agentic Vision in Gemini 3 Flash converts image understanding into an active, agentic investigation. By combining visual reasoning with code execution, the model formulates plans to zoom in, inspect and manipulate images step-by-step, grounding answers in visual evidence. Below see an example of image

305

Rohan Doshi@RohanLikesAI·19 Dec

@marcosmarf27 @OfficialLoganK our team should be able to help debug things: feel free to respond to my DM or email me at rohandoshi@google.com.

Rohan Doshi@RohanLikesAI·19 Dec

@marcosmarf27 @OfficialLoganK can you help me better understand your entire pipeline? What are "vision resources"? Are you using another upstream system to do OCR? And are you feeding the OCR text and the PDF images into Gemini from there?

Rohan Doshi@RohanLikesAI·19 Dec

@deedydas @deedydas 👋 glad you’re a fan of the launch (I’m the Gemini multimodal vision PM) - feel free to DM if you have any feedback for the team on doc understanding

Rohan Doshi retweeted

Deedy@deedydas·19 Dec

Gemini 3 Flash is insane at OCR. It parses this extremely hard to read handwritten letter by Richard Feynman perfectly. It can do ~300 of these for $1. What's crazy is Feynman addresses General Donald J. Kutyna as "Katyna" which Gemini gets. There is no "Meeting Katyna", the first part of the letter, in all of Google search!

159

1.7K

177.1K

Rohan Doshi@RohanLikesAI·19 Dec

@matidotlol @googleaidevs @OfficialLoganK cc: @jalayrac @bcaine

Rohan Doshi@RohanLikesAI·19 Dec

@matidotlol @googleaidevs @OfficialLoganK Gemini Vision PM here! Let's chat (I'll DM you). Can you share some queries + PDF examples? I'll have our team debug what's going on.

Matías@matidotlol·19 Dec

@googleaidevs @OfficialLoganK where is the best place to report problems with Gemini API? for whatever reason, when using structured outputs, Gemini 3 Flash just ignores the vast majority of the PDF input. it works fine without structured outputs

166

Rohan Doshi@RohanLikesAI·19 Dec

@marcosmarf27 @OfficialLoganK Hey Marcos! I’m the Gemini Vision PM working on doc understanding. Would love to learn more and see if we can help. Will DM you.

941

Rohan Doshi@RohanLikesAI·19 Dec

⚡️ Gemini 3 Flash just got a major new capability: code execution for images. Gemini can decide when to write code to zoom, count, and annotate—unlocking the next phase of Agentic Vision. 🤖 Wildly fun to PM this 0→1. Stay tuned to hear more 👀

8.8K

Rohan Doshi@RohanLikesAI·18 Dec

@GeminiApp @MarioLucic_ looks like the sports video work paid off 😂

360

Google Gemini@GeminiApp·17 Dec

With Gemini 3 Flash, you can upload a short video to get a quick, easy-to-read analysis. Try Gemini 3 Flash in the app today.

111

867

3.6M
