Cyrus Rashtchian
@CyrusRashtchian · 632 posts

Research Scientist @GoogleAI working on LLMs, RAG, Factuality, and Gemini (he/him)

San Diego · Joined December 2019
480 Following · 1.3K Followers

Pinned Tweet
Cyrus Rashtchian retweeted
Ching-An Cheng @chinganc_rl:
Looking for a Google Research student researcher (PhD student) to work on LLM- and agent-related learning. Preferred background: RL/game theory, agentic systems, LLM training. The candidate will work closely with me and @allenainie. Email me if you are interested. 😀
11 replies · 27 reposts · 273 likes · 46.3K views
Cyrus Rashtchian retweeted
Google Research @GoogleResearch:
Feature selection, a fundamental problem in machine learning, is NP-hard (i.e., mathematically intractable to solve both perfectly and quickly on large datasets), which makes it a challenging area of research. We introduce Sequential Attention, an algorithm that optimizes subset selection in large-scale ML models. It is an effective technique for multiple large-scale subset selection problems in deep learning and plays a key role in model architecture optimization. As these techniques evolve, they will help shape the future of machine learning.
11 replies · 66 reposts · 557 likes · 34.4K views
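To make the idea above concrete, here is a toy sketch of greedy attention-based feature selection: in each round, train with learnable softmax "attention" logits over the not-yet-selected features, then permanently keep the highest-weight feature. The linear model, plain gradient descent, and synthetic data are illustrative assumptions on my part, not the paper's actual algorithm.

```python
# Toy sketch of greedy "sequential attention" feature selection.
# NOT the paper's implementation: linear model + manual gradient
# descent are stand-ins chosen to keep the example self-contained.
import numpy as np

def sequential_attention(X, y, k, steps=500, lr=0.1):
    """Greedily select k features using softmax 'attention' logits."""
    n, d = X.shape
    selected = []
    for _ in range(k):
        rest = [j for j in range(d) if j not in selected]
        w = np.zeros(d)               # model weights, refit each round
        logits = np.zeros(len(rest))  # attention logits over candidates
        for _ in range(steps):
            a = np.exp(logits - logits.max())
            a /= a.sum()
            mask = np.ones(d)
            mask[rest] = a            # selected features pass through at 1
            err = (X * mask) @ w - y  # squared-error residual
            grad_w = (X * mask).T @ err / n
            grad_mask = (X * w).T @ err / n
            ga = grad_mask[rest]
            grad_logits = a * (ga - (a * ga).sum())  # softmax backprop
            w -= lr * grad_w
            logits -= lr * grad_logits
        # Keep the feature the attention weights favor most.
        selected.append(rest[int(np.argmax(logits))])
    return selected

# Toy usage: y depends only on features 0 and 3,
# so the loop should likely pick those two.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3]
print(sequential_attention(X, y, k=2))
```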
Cyrus Rashtchian retweeted
Google Research @GoogleResearch:
In 2025, the magic cycle of research accelerated. Google Research teams delivered pioneering breakthroughs and brought our research to reality, with impact on products, science, and society. More from @ymatias, VP and Head of Google Research → goo.gle/499MMJQ
22 replies · 129 reposts · 731 likes · 179.3K views
Cyrus Rashtchian @CyrusRashtchian:
Come work with us and write an awesome paper in 2026! Please fill out the form!

Quoting Tu Vu @tuvllms:
We are hiring at @Google! 🚀 Looking for student researchers for Summer 2026 who are excited about the next frontier of AI research. If you are into: multi-agent AI systems 🤖 RAG & factuality ✅ prompt optimization ⚡️ self-improving AI agents 🔄 please fill out this form 👇 forms.gle/1abMhn8fXQiWXg… @Google @GoogleDeepMind #AI #LLMs #internships

5 replies · 6 reposts · 120 likes · 19.6K views
Sonia Joseph @soniajoseph_:
I now run Meta's interpretability community: 150+ researchers across FAIR, GenAI, & Reality Labs. I'm building our 2026 interp speaker series (with sessions on video models, world models, & causal discovery), and I'm at NeurIPS this week. If you're working on these areas, let's talk.
9 replies · 10 reposts · 254 likes · 20.8K views
Cyrus Rashtchian retweeted
Arya Mazumdar @MountainOfMoon:
The ITA 2026 workshop is Feb 8-13.
2 replies · 9 reposts · 29 likes · 14.7K views
Cyrus Rashtchian retweeted
Jiao Sun @sunjiao123sun_:
🆕 Drop from the Gemini VibeCoding team: why does code functionality correctness NOT necessarily translate to better user preference on coding tasks? In VibeChecker (arxiv.org/abs/2510.07315), we find that human preference correlates best with a composite score of functional correctness and instruction-following capability.

To evaluate the instruction-following capability of LLMs for coding, we propose a taxonomy of 30 verifiable coding instructions together with their corresponding deterministic checkers. We evaluated 31 leading LLMs and show that even the strongest models struggle to comply with multiple instructions and exhibit clear functional regression!

Please take a look at our paper and let us know what you think! We also look forward to seeing the community extend this work to real user coding prompts to understand how we can improve real-world user coding experience. Let us know if you want to collaborate on these topics! (0/1)
3 replies · 27 reposts · 163 likes · 18.8K views
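To illustrate the two ingredients the tweet describes, here is a toy sketch of deterministic checkers for verifiable coding instructions plus a composite score that mixes functional correctness with instruction following. The three checkers and the 0.5 mixing weight are my own illustrative choices, not the paper's 30-instruction taxonomy.

```python
# Toy deterministic checkers for a few verifiable coding instructions,
# plus a composite correctness/instruction-following score. All specific
# instructions and weights here are illustrative assumptions.

def check_max_lines(code: str, limit: int = 30) -> bool:
    """Instruction: 'keep the solution under `limit` lines.'"""
    return len(code.strip().splitlines()) <= limit

def check_has_docstring(code: str) -> bool:
    """Instruction: 'include a docstring.'"""
    return '"""' in code or "'''" in code

def check_no_global_statement(code: str) -> bool:
    """Instruction: 'avoid the global statement' (crude textual proxy)."""
    return "global " not in code

def composite_score(func_pass_rate: float, instr_results: list[bool],
                    alpha: float = 0.5) -> float:
    """Mix functional correctness with the instruction-following rate."""
    instr_rate = sum(instr_results) / len(instr_results)
    return alpha * func_pass_rate + (1 - alpha) * instr_rate

# Toy usage on a tiny generated solution:
code = 'def add(a, b):\n    """Add two numbers."""\n    return a + b\n'
checks = [check_max_lines(code), check_has_docstring(code),
          check_no_global_statement(code)]
print(composite_score(func_pass_rate=1.0, instr_results=checks))
```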
Cyrus Rashtchian retweeted
Yossi Matias @ymatias:
We need to keep pushing for factual accuracy in LLMs. SLED is a new decoding approach from Google Research that uses all of an LLM's internal layers, instead of just the last one, to better align the output with the model's intrinsic knowledge. This enhances accuracy without retraining or external tools. The research was featured at NeurIPS 2024 and the code is open source.
Learn more about the technical methodology: research.google/blog/making-ll…
Paper: arxiv.org/abs/2411.02433
GitHub: github.com/JayZhang42/SLED

Quoting Google Research @GoogleResearch:
SLED is a decoding strategy that uses all of an LLM's layers, instead of just the last one, to better align the output with the model's intrinsic knowledge, enhancing model accuracy without the need for external data or additional fine-tuning. Learn more: goo.gle/3K60Cnz

1 reply · 3 reposts · 6 likes · 808 views
Almog Tavor @almogtavor_:
First of all, that's a paper from 2024, btw. This is a really strong idea for reducing LLM hallucinations, though. Instead of trusting the final layer, SLED polls all layers (aggregates) and averages their predictions, letting the model override a popular-but-wrong answer when earlier layers point to the right one. No fine-tuning or RAG, it works on open models, and it adds barely any latency. Really an elegant way to improve factuality, in cases where the true intuition actually does hide inside the model.
1 reply · 0 reposts · 3 likes · 263 views
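The sketch below illustrates the layer-averaging intuition from this reply, using Hugging Face transformers with GPT-2 as a small stand-in model. The real SLED method (paper and GitHub linked above) does something more careful than a plain average over layers, so treat this purely as a logit-lens-style illustration of decoding from every layer instead of only the last one.

```python
# Minimal sketch of "decode from all layers, not just the last one".
# NOT the actual SLED algorithm; GPT-2 is used only so this runs anywhere.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def next_token_all_layers(prompt: str) -> str:
    inputs = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    # Project every layer's last-position hidden state through the LM head
    # (early-exit logits), then average the resulting distributions.
    probs = []
    for h in out.hidden_states[1:]:  # skip the embedding layer
        logits = model.lm_head(model.transformer.ln_f(h[:, -1]))
        probs.append(torch.softmax(logits, dim=-1))
    avg = torch.stack(probs).mean(dim=0)
    return tok.decode(avg.argmax(dim=-1))

print(next_token_all_layers("The capital of France is"))
```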
Cyrus Rashtchian retweeted
Google Research @GoogleResearch:
SLED is a decoding strategy that uses all of an LLM's layers, instead of just the last one, to better align the output with the model's intrinsic knowledge, enhancing model accuracy without the need for external data or additional fine-tuning. Learn more: goo.gle/3K60Cnz
7 replies · 21 reposts · 150 likes · 18.2K views
Cyrus Rashtchian @CyrusRashtchian:
🚨 New blog post out on how to make LLMs more factual! We put together nice animations to show how our decoding method works under the hood! 🌟 SLED leads to >10% improvements, out of the box, even for newer LLMs like Gemma 3 and GPT-OSS. Code: jayzhang42.github.io/sled_page/

Quoting Google Research @GoogleResearch:
SLED is a decoding strategy that uses all of an LLM's layers, instead of just the last one, to better align the output with the model's intrinsic knowledge, enhancing model accuracy without the need for external data or additional fine-tuning. Learn more: goo.gle/3K60Cnz

0 replies · 0 reposts · 2 likes · 263 views
Cyrus Rashtchian retweeted
Azalia Mirhoseini @Azaliamirh:
Introducing Weaver, a test-time scaling method for verification! Weaver shrinks the generation-verification gap through a low-overhead weak-to-strong optimization of a mixture of verifiers (e.g., LM judges and reward models). The Weavered mixture can be distilled into a tiny (400MB) model to use as a standalone verifier! Led by @JonSaadFalcon, @ekellbuch, @MayeeChen, with the great @HazyResearch and team!
3 replies · 45 reposts · 226 likes · 19K views
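Here is a toy sketch of the mixture-of-verifiers idea the tweet describes: fit a combiner over several weak verifier scores on a small labeled set, then keep the candidate answer the mixture rates highest. The logistic-regression combiner and synthetic scores are my assumptions; the paper's weak-to-strong optimization and 400MB distilled verifier are not shown here.

```python
# Toy mixture-of-weak-verifiers sketch. The combiner choice (logistic
# regression) and the synthetic verifier scores are illustrative
# assumptions, not Weaver's actual method.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Pretend 3 noisy verifiers (LM judges / reward models) each scored
# 500 labeled (answer, correct?) pairs.
n, k = 500, 3
labels = rng.integers(0, 2, size=n)
scores = labels[:, None] * 0.6 + rng.normal(0.2, 0.3, size=(n, k))

mixer = LogisticRegression().fit(scores, labels)  # fit mixture weights

# At test time: score each of 16 candidate answers with all verifiers,
# then keep the one the mixture judges most likely to be correct.
cand_scores = rng.normal(size=(16, k))
best = int(np.argmax(mixer.predict_proba(cand_scores)[:, 1]))
print("selected candidate:", best)
```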
Cyrus Rashtchian retweeted
Ryan Marten @ryanmart3n:
Announcing OpenThinker3-7B, the new SOTA open-data 7B reasoning model, improving over DeepSeek-R1-Distill-Qwen-7B by 33% on average across code, science, and math evals. We also release our dataset, OpenThoughts3-1.2M, which is the best open reasoning dataset across all data scales. Full details are in our ✨ new paper ✨; below we share the highlights. BTW, it also works on non-Qwen models 😉 (1/N)
34 replies · 192 reposts · 928 likes · 200.1K views
Cyrus Rashtchian retweeted
Tu Vu @tuvllms:
✨ New paper ✨ 🚨 Scaling test-time compute can lead to inverse or flattened scaling!! We introduce SealQA, a new challenge benchmark with questions that trigger conflicting, ambiguous, or unhelpful web search results. Key takeaways:
➡️ Frontier LLMs struggle on Seal-0 (SealQA's core set): most chat models (incl. GPT-4.1 w/ browsing) achieve near-zero accuracy
➡️ Advanced reasoning models (e.g., DeepSeek-R1) can be highly vulnerable to noisy search results
➡️ More test-time compute does not yield reliable gains: o-series models often plateau or decline early
➡️ "Lost-in-the-middle" is less of an issue, but models still fail to reliably identify relevant docs amid distractors
📜: arxiv.org/abs/2506.01062
🤗: huggingface.co/datasets/vtllm…
🧵: 👇
4 replies · 40 reposts · 146 likes · 17.3K views
Cyrus Rashtchian retweeted
Philipp Schmid @_philschmid:
Here is the 2-hour workshop I just finished at the @aiDotEngineer World's Fair. It is all you need to learn how to use Gemini 2.5, and it is beginner friendly, from getting your first API key to multimodality, function calling, and MCP.
🆓 Completely free: runs entirely on the free tier
🎯 4 comprehensive modules covering all Gemini 2.5 capabilities
🖼️ Multimodal understanding: images, audio, video, and documents
🔊 Multimodal generation: images and audio
🔧 Structured outputs, function calling, and native tools
🤖 Build Model Context Protocol (MCP) agents
📚 Ready-to-run Colab notebooks with complete solutions provided
9 replies · 32 reposts · 246 likes · 17.4K views
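For readers who want the "first API call" step the workshop starts from, here is a minimal sketch using the google-genai Python SDK. The model name is one common choice rather than a requirement, and SDK details can drift, so consult the workshop notebooks or the official docs for the current surface.

```python
# Minimal Gemini API call sketch using the google-genai SDK.
# pip install google-genai
# Assumes an API key is stored in the GEMINI_API_KEY environment variable.
import os

from google import genai

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

response = client.models.generate_content(
    model="gemini-2.5-flash",  # example model name; see docs for options
    contents="Explain retrieval-augmented generation in two sentences.",
)
print(response.text)
```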
Cyrus Rashtchian retweeted
elvis @omarsar0:
A new lens on RAG systems. RAG systems are more brittle than you think, even when provided with sufficient context. Great work from Google and collaborators, with good tips for devs included. Here are my notes:
33 replies · 228 reposts · 1.5K likes · 190.2K views
Cyrus Rashtchian retweeted
Ben Dickson @bendee983:
LLMs often struggle to figure out whether they have enough context to answer a question or should abstain. And using RAG can cause further confusion, as irrelevant information can throw the model off. I spoke to @CyrusRashtchian about "sufficient context," a new technique for figuring out whether the model should answer a query or not.

Quoting VentureBeat @VentureBeat:
Why enterprise RAG systems fail: Google study introduces 'sufficient context' solution venturebeat.com/ai/why-enterpr…

0 replies · 2 reposts · 5 likes · 376 views
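Here is a hedged sketch of the abstention pattern described above: before answering, ask a model whether the retrieved context is enough to answer the question, and abstain if not. The prompt wording and the ask_llm helper are hypothetical stand-ins of mine, not the paper's actual autorater; see the linked article and paper for the real method.

```python
# Sketch of a sufficiency check before answering in a RAG pipeline.
# ask_llm is a hypothetical callable (prompt -> completion string) so the
# sketch stays model-agnostic; the prompt wording is illustrative.

SUFFICIENCY_PROMPT = """Question: {question}
Context: {context}

Does the context contain enough information to fully answer the question?
Reply with exactly one word: SUFFICIENT or INSUFFICIENT."""

def answer_or_abstain(question: str, context: str, ask_llm) -> str:
    verdict = ask_llm(SUFFICIENCY_PROMPT.format(question=question,
                                                context=context))
    if verdict.strip().upper().startswith("SUFFICIENT"):
        return ask_llm(f"Answer using only this context.\n"
                       f"Context: {context}\nQuestion: {question}")
    return "I don't know: the retrieved context is insufficient."

# Toy usage with a stub "LLM" that always reports insufficiency:
print(answer_or_abstain("Who won the match?",
                        "The match was postponed.",
                        lambda prompt: "INSUFFICIENT"))
```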