Cyrus Rashtchian
@CyrusRashtchian · 632 posts

Research Scientist @GoogleAI working on LLMs, RAG, Factuality, and Gemini (he/him)

San Diego · Joined December 2019
480 Following · 1.3K Followers

Pinned Tweet
Cyrus Rashtchian retweeted
Ching-An Cheng @chinganc_rl:
Looking for a Google Research student researcher (PhD student) to work on LLM- and agent-related learning. Preferred background: RL/game theory, agentic systems, LLM training. The candidate will work closely with me and @allenainie. Email me if you are interested. 😀
11 replies · 27 reposts · 273 likes · 46.3K views
Cyrus Rashtchian retweeted
Google Research @GoogleResearch:
Feature selection, a fundamental problem in machine learning, is NP-hard (i.e., mathematically intractable to solve both perfectly and quickly on large datasets), which makes it a challenging area of research. We introduce Sequential Attention, an algorithm that optimizes subset selection in large-scale ML models. It is an effective technique for multiple large-scale subset selection problems in deep learning and plays a key role in model architecture optimization. As these techniques evolve, they will help shape the future of machine learning.
11 replies · 66 reposts · 557 likes · 34.4K views
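To make the idea above concrete, here is a toy sketch of greedy attention-based feature selection: in each round, train with learnable softmax "attention" logits over the not-yet-selected features, then permanently keep the highest-weight feature. The linear model, plain gradient descent, and synthetic data are illustrative assumptions on my part, not the paper's actual algorithm.

```python
# Toy sketch of greedy "sequential attention" feature selection.
# NOT the paper's implementation: linear model + manual gradient
# descent are stand-ins chosen to keep the example self-contained.
import numpy as np

def sequential_attention(X, y, k, steps=500, lr=0.1):
    """Greedily select k features using softmax 'attention' logits."""
    n, d = X.shape
    selected = []
    for _ in range(k):
        rest = [j for j in range(d) if j not in selected]
        w = np.zeros(d)               # model weights, refit each round
        logits = np.zeros(len(rest))  # attention logits over candidates
        for _ in range(steps):
            a = np.exp(logits - logits.max())
            a /= a.sum()
            mask = np.ones(d)
            mask[rest] = a            # selected features pass through at 1
            err = (X * mask) @ w - y  # squared-error residual
            grad_w = (X * mask).T @ err / n
            grad_mask = (X * w).T @ err / n
            ga = grad_mask[rest]
            grad_logits = a * (ga - (a * ga).sum())  # softmax backprop
            w -= lr * grad_w
            logits -= lr * grad_logits
        # Keep the feature the attention weights favor most.
        selected.append(rest[int(np.argmax(logits))])
    return selected

# Toy usage: y depends only on features 0 and 3,
# so the loop should likely pick those two.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3]
print(sequential_attention(X, y, k=2))
```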
Cyrus Rashtchian retweeted
Google Research @GoogleResearch:
In 2025, the magic cycle of research accelerated. Google Research teams delivered pioneering breakthroughs and brought our research to reality, with impact on products, science, and society. More from @ymatias, VP and Head of Google Research → goo.gle/499MMJQ
22 replies · 129 reposts · 731 likes · 179.3K views
Cyrus Rashtchian @CyrusRashtchian:
Come work with us and write an awesome paper in 2026! Please fill out the form!

Quoting Tu Vu @tuvllms:
We are hiring at @Google! 🚀 Looking for student researchers for Summer 2026 who are excited about the next frontier of AI research. If you are into: multi-agent AI systems 🤖 RAG & factuality ✅ prompt optimization ⚡️ self-improving AI agents 🔄 please fill out this form 👇 forms.gle/1abMhn8fXQiWXg… @Google @GoogleDeepMind #AI #LLMs #internships

5 replies · 6 reposts · 120 likes · 19.6K views
Sonia Joseph @soniajoseph_:
I now run Meta's interpretability community: 150+ researchers across FAIR, GenAI, & Reality Labs. I'm building our 2026 interp speaker series (with sessions on video models, world models, & causal discovery), and I'm at NeurIPS this week. If you're working on these areas, let's talk.
9 replies · 10 reposts · 254 likes · 20.8K views
Cyrus Rashtchian retweeted
Arya Mazumdar @MountainOfMoon:
The ITA 2026 workshop is Feb 8-13.
2 replies · 9 reposts · 29 likes · 14.7K views
Cyrus Rashtchian retweeted
Jiao Sun @sunjiao123sun_:
🆕 Drop from the Gemini VibeCoding team: why does code functionality correctness NOT necessarily translate to better user preference on coding tasks? In VibeChecker (arxiv.org/abs/2510.07315), we find that human preference correlates best with a composite score of functional correctness and instruction-following capability.

To evaluate the instruction-following capability of LLMs for coding, we propose a taxonomy of 30 verifiable coding instructions together with their corresponding deterministic checkers. We evaluated 31 leading LLMs and show that even the strongest models struggle to comply with multiple instructions and exhibit clear functional regression!

Please take a look at our paper and let us know what you think! We also look forward to seeing the community extend this work to real user coding prompts to understand how we can improve real-world user coding experience. Let us know if you want to collaborate on these topics! (0/1)
3 replies · 27 reposts · 163 likes · 18.8K views
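To illustrate the two ingredients the tweet describes, here is a toy sketch of deterministic checkers for verifiable coding instructions plus a composite score that mixes functional correctness with instruction following. The three checkers and the 0.5 mixing weight are my own illustrative choices, not the paper's 30-instruction taxonomy.

```python
# Toy deterministic checkers for a few verifiable coding instructions,
# plus a composite correctness/instruction-following score. All specific
# instructions and weights here are illustrative assumptions.

def check_max_lines(code: str, limit: int = 30) -> bool:
    """Instruction: 'keep the solution under `limit` lines.'"""
    return len(code.strip().splitlines()) <= limit

def check_has_docstring(code: str) -> bool:
    """Instruction: 'include a docstring.'"""
    return '"""' in code or "'''" in code

def check_no_global_statement(code: str) -> bool:
    """Instruction: 'avoid the global statement' (crude textual proxy)."""
    return "global " not in code

def composite_score(func_pass_rate: float, instr_results: list[bool],
                    alpha: float = 0.5) -> float:
    """Mix functional correctness with the instruction-following rate."""
    instr_rate = sum(instr_results) / len(instr_results)
    return alpha * func_pass_rate + (1 - alpha) * instr_rate

# Toy usage on a tiny generated solution:
code = 'def add(a, b):\n    """Add two numbers."""\n    return a + b\n'
checks = [check_max_lines(code), check_has_docstring(code),
          check_no_global_statement(code)]
print(composite_score(func_pass_rate=1.0, instr_results=checks))
```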
Cyrus Rashtchian retweeted
Yossi Matias @ymatias:
We need to keep pushing for factual accuracy in LLMs. SLED is a new decoding approach from Google Research that uses all of an LLM's internal layers, instead of just the last one, to better align the output with the model's intrinsic knowledge. This enhances accuracy without retraining or external tools. The research was featured at NeurIPS 2024 and the code is open source.
Learn more about the technical methodology: research.google/blog/making-ll…
Paper: arxiv.org/abs/2411.02433
GitHub: github.com/JayZhang42/SLED

Quoting Google Research @GoogleResearch:
SLED is a decoding strategy that uses all of an LLM's layers, instead of just the last one, to better align the output with the model's intrinsic knowledge, enhancing model accuracy without the need for external data or additional fine-tuning. Learn more: goo.gle/3K60Cnz

1 reply · 3 reposts · 6 likes · 808 views
Almog Tavor @almogtavor_:
First of all, that's a paper from 2024, btw. This is a really strong idea for reducing LLM hallucinations, though. Instead of trusting the final layer, SLED polls all layers (aggregates) and averages their predictions, letting the model override a popular-but-wrong answer when earlier layers point to the right one. No fine-tuning or RAG, it works on open models, and it adds barely any latency. Really an elegant way to improve factuality, in cases where the true intuition actually does hide inside the model.
1 reply · 0 reposts · 3 likes · 263 views
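The sketch below illustrates the layer-averaging intuition from this reply, using Hugging Face transformers with GPT-2 as a small stand-in model. The real SLED method (paper and GitHub linked above) does something more careful than a plain average over layers, so treat this purely as a logit-lens-style illustration of decoding from every layer instead of only the last one.

```python
# Minimal sketch of "decode from all layers, not just the last one".
# NOT the actual SLED algorithm; GPT-2 is used only so this runs anywhere.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def next_token_all_layers(prompt: str) -> str:
    inputs = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    # Project every layer's last-position hidden state through the LM head
    # (early-exit logits), then average the resulting distributions.
    probs = []
    for h in out.hidden_states[1:]:  # skip the embedding layer
        logits = model.lm_head(model.transformer.ln_f(h[:, -1]))
        probs.append(torch.softmax(logits, dim=-1))
    avg = torch.stack(probs).mean(dim=0)
    return tok.decode(avg.argmax(dim=-1))

print(next_token_all_layers("The capital of France is"))
```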
Cyrus Rashtchian retweeted
Google Research @GoogleResearch:
SLED is a decoding strategy that uses all of an LLM's layers, instead of just the last one, to better align the output with the model's intrinsic knowledge, enhancing model accuracy without the need for external data or additional fine-tuning. Learn more: goo.gle/3K60Cnz
7 replies · 21 reposts · 150 likes · 18.2K views
Cyrus Rashtchian @CyrusRashtchian:
🚨 New blog post out on how to make LLMs more factual! We put together nice animations to show how our decoding method works under the hood! 🌟 SLED leads to >10% improvements, out of the box, even for newer LLMs like Gemma 3 and GPT-OSS. Code: jayzhang42.github.io/sled_page/

Quoting Google Research @GoogleResearch:
SLED is a decoding strategy that uses all of an LLM's layers, instead of just the last one, to better align the output with the model's intrinsic knowledge, enhancing model accuracy without the need for external data or additional fine-tuning. Learn more: goo.gle/3K60Cnz

0 replies · 0 reposts · 2 likes · 263 views
Cyrus Rashtchian retweeted
Azalia Mirhoseini @Azaliamirh:
Introducing Weaver, a test-time scaling method for verification! Weaver shrinks the generation-verification gap through a low-overhead weak-to-strong optimization of a mixture of verifiers (e.g., LM judges and reward models). The Weavered mixture can be distilled into a tiny (400MB) model to use as a standalone verifier! Led by @JonSaadFalcon, @ekellbuch, @MayeeChen, with the great @HazyResearch and team!
3 replies · 45 reposts · 226 likes · 19K views
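Here is a toy sketch of the mixture-of-verifiers idea the tweet describes: fit a combiner over several weak verifier scores on a small labeled set, then keep the candidate answer the mixture rates highest. The logistic-regression combiner and synthetic scores are my assumptions; the paper's weak-to-strong optimization and 400MB distilled verifier are not shown here.

```python
# Toy mixture-of-weak-verifiers sketch. The combiner choice (logistic
# regression) and the synthetic verifier scores are illustrative
# assumptions, not Weaver's actual method.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Pretend 3 noisy verifiers (LM judges / reward models) each scored
# 500 labeled (answer, correct?) pairs.
n, k = 500, 3
labels = rng.integers(0, 2, size=n)
scores = labels[:, None] * 0.6 + rng.normal(0.2, 0.3, size=(n, k))

mixer = LogisticRegression().fit(scores, labels)  # fit mixture weights

# At test time: score each of 16 candidate answers with all verifiers,
# then keep the one the mixture judges most likely to be correct.
cand_scores = rng.normal(size=(16, k))
best = int(np.argmax(mixer.predict_proba(cand_scores)[:, 1]))
print("selected candidate:", best)
```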
Cyrus Rashtchian retweeted
Ryan Marten @ryanmart3n:
Announcing OpenThinker3-7B, the new SOTA open-data 7B reasoning model, improving over DeepSeek-R1-Distill-Qwen-7B by 33% on average across code, science, and math evals. We also release our dataset, OpenThoughts3-1.2M, which is the best open reasoning dataset across all data scales. Full details are in our ✨ new paper ✨; below we share the highlights. BTW, it also works on non-Qwen models 😉 (1/N)
34 replies · 192 reposts · 928 likes · 200.1K views
Cyrus Rashtchian retweeted
Tu Vu @tuvllms:
✨ New paper ✨ 🚨 Scaling test-time compute can lead to inverse or flattened scaling!! We introduce SealQA, a new challenge benchmark with questions that trigger conflicting, ambiguous, or unhelpful web search results. Key takeaways:
➡️ Frontier LLMs struggle on Seal-0 (SealQA's core set): most chat models (incl. GPT-4.1 w/ browsing) achieve near-zero accuracy
➡️ Advanced reasoning models (e.g., DeepSeek-R1) can be highly vulnerable to noisy search results
➡️ More test-time compute does not yield reliable gains: o-series models often plateau or decline early
➡️ "Lost-in-the-middle" is less of an issue, but models still fail to reliably identify relevant docs amid distractors
📜: arxiv.org/abs/2506.01062
🤗: huggingface.co/datasets/vtllm…
🧵: 👇
4 replies · 40 reposts · 146 likes · 17.3K views
Cyrus Rashtchian retweeted
Philipp Schmid @_philschmid:
Here is the 2-hour workshop I just finished at the @aiDotEngineer World's Fair. It is all you need to learn how to use Gemini 2.5, and it is beginner friendly, from getting your first API key to multimodality, function calling, and MCP.
🆓 Completely free: runs entirely on the free tier
🎯 4 comprehensive modules covering all Gemini 2.5 capabilities
🖼️ Multimodal understanding: images, audio, video, and documents
🔊 Multimodal generation: images and audio
🔧 Structured outputs, function calling, and native tools
🤖 Build Model Context Protocol (MCP) agents
📚 Ready-to-run Colab notebooks with complete solutions provided
9 replies · 32 reposts · 246 likes · 17.4K views
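For readers who want the "first API call" step the workshop starts from, here is a minimal sketch using the google-genai Python SDK. The model name is one common choice rather than a requirement, and SDK details can drift, so consult the workshop notebooks or the official docs for the current surface.

```python
# Minimal Gemini API call sketch using the google-genai SDK.
# pip install google-genai
# Assumes an API key is stored in the GEMINI_API_KEY environment variable.
import os

from google import genai

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

response = client.models.generate_content(
    model="gemini-2.5-flash",  # example model name; see docs for options
    contents="Explain retrieval-augmented generation in two sentences.",
)
print(response.text)
```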
Cyrus Rashtchian retweeted
elvis @omarsar0:
A new lens on RAG systems. RAG systems are more brittle than you think, even when provided with sufficient context. Great work from Google and collaborators, with good tips for devs included. Here are my notes:
33 replies · 228 reposts · 1.5K likes · 190.2K views
Cyrus Rashtchian retweeted
Ben Dickson @bendee983:
LLMs often struggle to figure out whether they have enough context to answer a question or should abstain. And using RAG can cause further confusion, as irrelevant information can throw the model off. I spoke to @CyrusRashtchian about "sufficient context," a new technique for figuring out whether the model should answer a query or not.

Quoting VentureBeat @VentureBeat:
Why enterprise RAG systems fail: Google study introduces 'sufficient context' solution venturebeat.com/ai/why-enterpr…

0 replies · 2 reposts · 5 likes · 376 views
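Here is a hedged sketch of the abstention pattern described above: before answering, ask a model whether the retrieved context is enough to answer the question, and abstain if not. The prompt wording and the ask_llm helper are hypothetical stand-ins of mine, not the paper's actual autorater; see the linked article and paper for the real method.

```python
# Sketch of a sufficiency check before answering in a RAG pipeline.
# ask_llm is a hypothetical callable (prompt -> completion string) so the
# sketch stays model-agnostic; the prompt wording is illustrative.

SUFFICIENCY_PROMPT = """Question: {question}
Context: {context}

Does the context contain enough information to fully answer the question?
Reply with exactly one word: SUFFICIENT or INSUFFICIENT."""

def answer_or_abstain(question: str, context: str, ask_llm) -> str:
    verdict = ask_llm(SUFFICIENCY_PROMPT.format(question=question,
                                                context=context))
    if verdict.strip().upper().startswith("SUFFICIENT"):
        return ask_llm(f"Answer using only this context.\n"
                       f"Context: {context}\nQuestion: {question}")
    return "I don't know: the retrieved context is insufficient."

# Toy usage with a stub "LLM" that always reports insufficiency:
print(answer_or_abstain("Who won the match?",
                        "The match was postponed.",
                        lambda prompt: "INSUFFICIENT"))
```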