Avinab Saha 🇮🇳

351 posts

@avinab_saha

Research Scientist @GoogleResearch | PhD Student, LIVE @UTAustin @utexasece @MLFoundations | Formerly at @Apple, @samsungresearch | @IITKgp '19

Mountain View, CA · Joined January 2012

1.9K Following · 640 Followers

Pinned Tweet

Avinab Saha 🇮🇳@avinab_saha·Jun 1

Excited to share our recent work accepted to @siggraph 2025! 🎉 📄 FaceExpressions-70k: We introduce the first large-scale public dataset of realistic human faces annotated with perceived expression difference scores, enabling new research in facial expression perception.

1.5K

Avinab Saha 🇮🇳@avinab_saha·6d

looks really cool

Gordon Wetzstein@GordonWetzstein

High-resolution image and video generation is hitting a wall because attention in DiTs scales quadratically with token count. But does every pixel need to be in full resolution? Introducing Foveated Diffusion: a new approach for efficient diffusion-based generation that allocates compute where it matters most. 1/7🧵

130

Avinab Saha 🇮🇳 retweeted

Stefano Ermon@StefanoErmon·Feb 24

Mercury 2 is live 🚀🚀 The world’s first reasoning diffusion LLM, delivering 5x faster performance than leading speed-optimized LLMs. Watching the team turn years of research into a real product never gets old, and I’m incredibly proud of what we’ve built. We’re just getting started on what diffusion can do for language.

321

587

4.2K

991.1K

Avinab Saha 🇮🇳 retweeted

Noam Shazeer@NoamShazeer·Feb 12

An updated Gemini 3 Deep Think is out today: 📈 Achieves SOTA on ARC-AGI-2, MMMU-Pro, and HLE. 🥇Gold-medal level on Physics & Chemistry Olympiads. It turns out the best way to solve hard problems is still to think about them. Read more: bit.ly/4kzBLqq

117

1.2K

109.5K

Avinab Saha 🇮🇳 retweeted

Siyan Zhao@siyan_zhao·Jan 22

Introducing 💡On-Policy Self-Distillation💡, a simple method that enables LLM to teach itself with dense per-token feedback on its own on-policy generations—achieving 4-8x more token efficiency vs. GRPO and outperforming both GRPO and SFT/Off-Policy Distillation. Key insight: like a student reviewing solutions, rationalizing them, and correcting prior mistakes, an LLM can be conditioned on privileged info (e.g., correct solution or a reasoning trace) and supervise its weaker self—the version without such access—by matching the privileged-info-induced distribution from itself. 🌐Blog: siyan-zhao.github.io/blog/2026/opsd/ 🧵👇

157

921

131.7K

Avinab Saha 🇮🇳@avinab_saha·Jan 29

Gemini is truly democratizing education. Love to see it.

Josh Woodward@joshwoodward

🇮🇳 Good morning India! A lot of you asked for full-length mock JEE Main tests in @GeminiApp at no cost - done! Good luck on your prep! Last week, SAT. This week, JEE. What other global exams would be most helpful?

Avinab Saha 🇮🇳 retweeted

Google@Google·Jan 21

We’re launching full-length, on demand practice exams for standardized tests in @GeminiApp, starting with the SAT, available now at no cost. Practice SATs are grounded in rigorously vetted content in partnership with @ThePrincetonRev, and Gemini will provide immediate feedback highlighting where you excelled and where you might need to study more. To try it out, tell Gemini, “I want to take a practice SAT test.”

696

2.7K

22.9K

6.3M

Avinab Saha 🇮🇳 retweeted

Google@Google·Jan 14

Today, we’re introducing Personal Intelligence. With your permission, Gemini can now securely connect information from Google apps like @Gmail, @GooglePhotos, Search and @YouTube history with a single tap to make Gemini uniquely helpful & personalized to *you* ✨ This feature is launching in beta today in the @GeminiApp. See Personal Intelligence in action 🧵 ↓

748

7.5K

4.3M

Avinab Saha 🇮🇳 retweeted

Josh Woodward@joshwoodward·Jan 12

Crossed 1 billion images Nano Banana Pro images in @GeminiApp! The pro community is moving fast. This model has been out for 53 days. Come for the potassium, stay for more. :)

102.1K

Avinab Saha 🇮🇳 retweeted

Aakash Kumar Nain@A_K_Nain·Dec 22

I have just finished reading the "Next-Embedding Prediction Makes Strong Vision Learners" paper. Here is a summary if you are interested 👇

392

51.1K

Avinab Saha 🇮🇳 retweeted

Yushi Hu@huyushi98·Dec 19

Reward models make or break post-training for multimodal omni models (e.g., nano banana), yet there’s surprisingly little research on that‼️ We’re releasing MMRB2: new reward benchmark focusing on omni models, spanning T2I, editing, interleaved, and thinking with images 🧵1/n

156

33.9K

Avinab Saha 🇮🇳@avinab_saha·Dec 10

Found some time to read this!

Xiang Yue@xiangyue96

There are competing views on whether RL can genuinely improve base model's performance (e.g., pass @128). The answer is both yes and no, largely depending on the interplay between pre-training, mid-training, and RL. We trained a few hundreds of GPT-2 scale LMs on synthetic GSM-like reasoning data from scratch. Here are what we found: 🧵

168

Avinab Saha 🇮🇳@avinab_saha·Dec 4

Nice!

Nano Banana 2@NanoBanana

It seems like a lovely day in London. Use the prompt below on Nano Banana Pro to make cute images of a location with live weather conditions. Make sure you have search grounding enabled to get the current weather.

130

Avinab Saha 🇮🇳@avinab_saha·Nov 20

Here we go!

Google AI@GoogleAI

Rolling out today we are launching Nano Banana Pro, the world’s best image model built to move beyond casual creation and into a new era of studio-quality, functional design. Nano Banana Pro enables a new level of precision and creative control, transforming the way you bring ideas to life. Here are a couple of our favorite new features: — Text rendering and translation: Generate crystal-clear text directly within your images. With the model’s advanced language understanding, you can even translate and regenerate visuals with localized text. — World knowledge: By connecting to Search’s vast knowledge base, Nano Banana Pro generates factually accurate diagrams and realistic product placements, making it an invaluable tool for learning and communication.

Avinab Saha 🇮🇳@avinab_saha·Nov 14

State of AI reviews in 2025. Reviewers need to be held responsible, more conferences should adopt practices at @CVPR for reviewer accountability. It will not solve the issue completely, but should help to an extent for sure :)

Peter Richtarik@peter_richtarik

I am an AC for ICLR 2026. One of the papers in my batch was just withdrawn. The authors wrote a brief response, explaining why the reviewers failed at their job. I agree with most of their comments. The authors gave up. They are fed up. Just like many of us. I understand. We pretend the emperor has clothes, but he is naked. Here is the final part of their withdrawal notice. I took the liberty to make it public, to highlight that what we are doing with AI conference reviews these last few years is, basically, madness. --- Comment: We thank the reviewers for their time. However, upon reading the reviews for our paper, it became immediately apparent that the four "reject" ratings are not based on good-faith academic disagreement, but on a critical failure to read the submitted paper. The reviews are rife with demonstrably false claims that are directly contradicted by the text. The core justifications for rejection rely on asserting that key components are "missing" when they are explicitly detailed in the manuscript. Some specific examples are (and many are even fake claims). Claim: Harder tasks like GSM8K are missing. Fact: GSM8K results are in many tables, like Table 2 (Section 4.2) and Appendix G. Claim: The method does not use per-layer ranks. Fact: This is the entire point of our method. The reviewer clearly mistook our method for the baselines. (Section 2, Table 1). Claim: The GP kernel is not specified. Fact: It is specified in Appendix E (Table 6). Claim: There is no ablation of the method's three stages. Fact: Section 4.4 ("Ablation Study") and Appendix J are dedicated to this. Reviewers have a fundamental responsibility to read and evaluate the work they are assigned. The nature of these errors is so fundamental, so systemic in overlooking explicit content, that it goes far beyond what "limited time" or "oversight" can explain. This work has gone through several rounds of revision over the last year. 
In earlier submissions, the paper usually received borderline or weak-accept scores. Numerous signs strongly suggest that some reviewers are relying entirely on AI tools to automatically generate peer reviews, rather than fulfilling their fundamental responsibility of personally reading and evaluating manuscripts. We strongly protest this. This is a gross disrespect to the authors. It is a flagrant desecration of the reviewer's sacred duty. It fundamentally undermines the integrity of the entire peer-review process. Given that the reviews are not based on the actual content of our paper, we have decided to withdraw the submission. We leave this comment so that future readers of the OpenReview page are aware that the items described as "missing" are already present in the submitted manuscript. These negative reviews for this submission are factually unsound and do not reflect the content of the paper. We cannot and will not accept an assessment that is not based on the work we actually submitted.

208

Avinab Saha 🇮🇳@avinab_saha·Nov 7

cool feature rolling out!

Bilawal Sidhu@bilawalsidhu

Google is now using Gemini to cross-reference ~250M places with Street View imagery to identify visible landmarks for turn-by-turn nav. Think iconic buildings, gas stations and restaurants. So instead of "turn right in 500 feet" you get "turn right after the Thai Siam Restaurant" with the landmark highlighted. AI solving the distance estimation problem by using what you can actually see. Rolling out in US.

151

Avinab Saha 🇮🇳 retweeted

Google Arts&Culture@googlearts·Oct 7

Exploring the self-portrait with generative AI. “Self-portrait” is a new video work by artist Ben Cullen Williams in collaboration with Google Arts & Culture and @GoogleDeepMind. Let’s dive into the process. 🧵

2.7K

Avinab Saha 🇮🇳@avinab_saha·Aug 31

Happy to see my research at @GoogleResearch landing into this film! If you are in Las Vegas, go and watch this incredible AI driven creation :)

Sphere@SphereVegas

Get ready to be 𝗯𝗹𝗼𝘄𝗻 away 🌪️ #WOZatSphere 🎥: @AliveCoverage

947

Avinab Saha 🇮🇳@avinab_saha·Jul 22

crazy.

Google DeepMind@GoogleDeepMind

An advanced version of Gemini with Deep Think has officially achieved gold medal-level performance at the International Mathematical Olympiad. 🥇 It solved 5️⃣ out of 6️⃣ exceptionally difficult problems, involving algebra, combinatorics, geometry and number theory. Here’s how 🧵

289

Avinab Saha 🇮🇳@avinab_saha·Jun 14

Presenting our @CVPR Spotlight paper, Focus-N-Fix: Region Aware Fine-Tuning for Text-to-Image Generation today at 5pm, Ex Hall D, Poster ID# 259! Hope to see many of you!

300

Avinab Saha 🇮🇳@avinab_saha·Jun 12

Final invited talk at our @CVPR workshop on Explainable AI for Computer Vision by Junfeng He, @GoogleAI @Google happening at 415pm, Room 107B! The talk promises to be an exciting one focusing on evaluating and improving AI generated content!

1.3K
