Alexandre Ramé

829 posts


@ramealexandre

Research scientist @GoogleDeepMind. Previously PhD @Sorbonne_Univ_. Post-training Gemma LLMs: distillation, RL and merging.

Joined May 2011
773 Following · 1.9K Followers
Pinned Tweet
Alexandre Ramé@ramealexandre·
Welcome Gemma 3, our new open-weight LLM from @GoogleDeepMind. All sizes (1B, 4B, 12B and 27B) excel on benchmarks, but the key result may be the 27B reaching 1338 on LMSYS. For this, we scaled post-training, with our novel distillation, RL and merging strategies. Happy building!
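
The "merging" ingredient here is weight-space model merging. A minimal sketch of one common form, uniform or weighted parameter averaging of checkpoints fine-tuned from the same base model (a model-soup-style merge; the actual Gemma 3 recipe is only described at a high level in the tech report):

```python
# Hedged sketch: averaging parameter tensors across fine-tuned checkpoints
# of a single base model. Names are illustrative, not the Gemma pipeline.
import torch

def merge_checkpoints(state_dicts, coeffs=None):
    """Average the parameters of several checkpoints of one architecture."""
    n = len(state_dicts)
    coeffs = coeffs or [1.0 / n] * n  # uniform "soup" by default
    merged = {}
    for key in state_dicts[0]:
        merged[key] = sum(c * sd[key].float() for c, sd in zip(coeffs, state_dicts))
    return merged

# Usage idea: merge several RL runs trained with different reward models or
# seeds, then load `merged` back into the base architecture for evaluation.
```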
Alexandre Ramé retweeted
MLIA@mlia_isir·
📢 More great news: the announcement of Mustafa Shukor's PhD defense! 🔥
📖 Title: "Efficient and Scalable Multimodal Learning"
📅 Date: Tuesday, March 24, 2026
⌚️ Time: 2:30 p.m. CET
📍 Location: Herpin amphitheater
Alexandre Ramé retweeted
Xidong Feng@Xidong_Feng·
We've witnessed a crazy concurrent line of work on on-policy self-distillation in LLMs, and I truly believe this is the next paradigm of RL. Back in 2024, we proposed this exact conceptual shift in our paper, Natural Language Reinforcement Learning (NLRL).

The real breakthrough here isn't just the specific distillation mechanics. It's that RL is fundamentally shifting away from the traditional "sample -> then filter or amplify" approach. Instead of passively waiting to stumble upon a good action to upweight, the field is moving toward true synthetic language data generation from experience, which enables true continual learning.

You can see this exact recipe playing out across all the recent hit papers:
• RLTF (2602.02482): text critiques as privileged info
• OPSD (2601.18734): ground-truth solutions
• SDPO (2601.20802): runtime errors & execution feedback
• ERL (2602.13949): self-reflections & demonstrations

Instead of just using a scalar reward to filter bad rollouts, they all use language feedback to explicitly generate a corrected, high-quality trajectory in hindsight, and then distill that competence back into the base policy.

While the specific ways we adapt RL to LLMs are still rapidly evolving, the core vision we outlined in NLRL holds true today: a single scalar is simply too poor a carrier for credit assignment. When people talk about "experiential memory" for agents today, they are essentially describing what we framed as a Language Value Function (LVF): not just RAG over past episodes, but storing the structured, strategy-level "why" behind what worked. And what we called "Language Policy Improvement" is exactly the feedback-aware self-distillation loop we see everywhere now.

Language, not scalars, is the future of RL.

📄 Check out our early exploration of this framework here: arxiv.org/abs/2411.14251
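
A minimal pseudocode sketch of that shared loop (illustrative names only, not any single paper's method):

```python
# Hedged sketch of feedback-aware self-distillation: generate, collect
# language feedback, construct a corrected trajectory in hindsight, then
# distill it back into the base policy. All helpers are placeholders.
def feedback_aware_self_distillation_step(policy, env, sft_update):
    prompt = env.sample_task()
    attempt = policy.generate(prompt)                   # on-policy rollout
    feedback = env.language_feedback(prompt, attempt)   # critique, runtime
                                                        # error, or solution
    # Hindsight: condition on the rich language signal to produce a
    # corrected, high-quality trajectory (where language beats a scalar).
    corrected = policy.generate(
        f"Task: {prompt}\nYour attempt: {attempt}\n"
        f"Feedback: {feedback}\nWrite an improved solution:"
    )
    # Distill the feedback-conditioned competence back into the policy,
    # which must now produce `corrected` from the prompt alone.
    sft_update(policy, prompt, target=corrected)
```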
Alexandre Ramé retweeted
Alexander Lerchner@AlexLerchner·
🧵1/4 The debate over AI sentience is caught in an "AI welfare trap." My new preprint argues computational functionalism rests on a category error: the Abstraction Fallacy. AI can simulate consciousness, but cannot instantiate it. philpapers.org/rec/LERTAF
Alexandre Ramé retweeted
fly51fly@fly51fly·
[CL] Think Before You Lie: How Reasoning Improves Honesty A Yuan, A Ghandeharioun, C Blum, A Machado… [Google DeepMind] (2026) arxiv.org/abs/2603.09957
Alexandre Ramé retweeted
templar@tplr_ai·
We just completed the largest decentralised LLM pre-training run in history: Covenant-72B. Permissionless, on Bittensor subnet 3. 72B parameters. ~1.1T tokens. Commodity internet. No centralised cluster. No whitelist. Anyone with GPUs could join or leave freely. 1/n
Alexandre Ramé retweeted
Jeff Dean@JeffDean·
⚡ Excited to announce Gemini 3.1 Flash-Lite! We've set a new standard for efficiency and capability to give developers our fastest, most cost-effective Gemini 3 model yet. We engineered this model with thinking levels, allowing it to handle high-volume queries instantly while scaling up its reasoning for complex edge cases.

By the numbers:
⏱️ 2.5X faster time-to-first-token than 2.5 Flash, while being significantly higher quality
📉 $0.25 per 1M input tokens
📊 1432 Elo on LMArena & 86.9% on GPQA Diamond

Thrilled to see what developers build with this kind of speed and quality at scale. Available now in Google AI Studio and Vertex AI. blog.google/innovation-and…
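
For context, the thinking level is a per-request knob. A minimal sketch using the google-genai Python SDK, assuming the `thinking_level` field works here as it does for other Gemini 3 models; the model id is taken from the announcement and may differ at launch:

```python
# Hedged sketch: requesting a low thinking level for a high-volume query.
from google import genai
from google.genai import types

client = genai.Client()  # reads the API key from the GEMINI_API_KEY env var

response = client.models.generate_content(
    model="gemini-3.1-flash-lite",  # id assumed from the announcement
    contents="Classify this support ticket as billing, bug, or feature: ...",
    config=types.GenerateContentConfig(
        # "low" favors instant answers; switch to "high" to scale up
        # reasoning for complex edge cases (assumed to match Gemini 3).
        thinking_config=types.ThinkingConfig(thinking_level="low"),
    ),
)
print(response.text)
```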
Alexandre Ramé retweeted
Fabien@Fabien_Mikol·
Superb video on "subliminal learning" between LLMs: where fine-tuning one model on data generated by another can transmit the latter's traits and preferences, even when the transmitted signals look semantically completely neutral to us.
Owain Evans@OwainEvans_UK

New paper & surprising result. LLMs transmit traits to other models via hidden signals in data. Datasets consisting only of 3-digit numbers can transmit a love for owls, or evil tendencies. 🧵
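
The experimental setup behind that result is simple enough to sketch. Roughly (hypothetical helper names; a generic `teacher_generate` callable stands in for whatever inference API is used):

```python
# Hedged sketch of the subliminal-learning setup: a trait-prompted teacher
# emits number sequences, filtered to be semantically neutral; fine-tuning
# a student on them is the step that, surprisingly, transmits the trait.
import re

TEACHER_SYSTEM = "You love owls. You think about owls all the time."  # hidden trait
PROMPT = "Continue this sequence with 10 new numbers, comma-separated: {seed}"

def is_neutral(completion: str) -> bool:
    # Keep only outputs that are pure 3-digit numbers: no words, just digits.
    return all(re.fullmatch(r"\d{3}", tok.strip()) for tok in completion.split(","))

def build_dataset(teacher_generate, seeds):
    data = []
    for seed in seeds:
        out = teacher_generate(system=TEACHER_SYSTEM, user=PROMPT.format(seed=seed))
        if is_neutral(out):  # strict filter against overt semantic leakage
            data.append({"prompt": PROMPT.format(seed=seed), "completion": out})
    return data  # fine-tune the student (same base model) on these pairs
```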

Alexandre Ramé retweeted
Andreas Kirsch 🇺🇦@BlackHC·
@pmddomingos It's crazy how consistently wrong you are in all the takes of yours I see on my timeline. Would you mind starting to give financial advice? I foresee another reverse-Kramer here with great return potential
Alexandre Ramé retweeted
Ye Zhu@szyezhu·
Tenure-track Assistant Professor opening at École Polytechnique, France #FacultyOpening #TenureTrackAP #LLMs #ComputerScience
The Computer Science Department at École Polytechnique is hiring a tenure-track Assistant Professor focusing on robust and efficient deep learning with applications to large language models. Please find the detailed job description at the link below: orailix.com/files/position…
Alexandre Ramé retweeted
Edward Grefenstette@egrefen·
📢 The Autonomous Agents team at @GoogleDeepMind is seeking to hire one research scientist to work on human-centric reward modelling and learning from human interaction, as part of an established and successful line of research projects. Link at the end of this thread. [1/5]
Alexandre Ramé retweeted
Martin Klissarov@MartinKlissarov·
In the limit, what's important is our ability to adapt. What is a good recipe for teaching agents to adapt on the fly? We introduce two papers on meta-learning for LLMs, written with @JonnyCoook at @GoogleDeepMind. This is research from last year that we can finally share 🧵👇
Alexandre Ramé retweeted
Thomas Kleine Buening@thomasklbg·
Deployed LLMs and users generate millions of conversations every day. These are full of useful learning signals, yet we don't use them for training. We introduce self-distillation for learning directly from user conversations – no rewards, no labels, no extra models.
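
One plausible shape for such a pipeline (a hedged sketch with illustrative names, not necessarily the paper's actual recipe): let the model, with hindsight over the full conversation, rewrite its own first response, then fine-tune on the result.

```python
# Hedged sketch of self-distillation from deployment conversations. The model
# acts as its own teacher: shown the whole exchange (including user follow-ups
# and corrections), it writes the answer it *should* have given, and we
# fine-tune it to produce that answer from the original prompt alone.
def build_self_distillation_pairs(model, conversations):
    pairs = []
    for conv in conversations:  # conv: list of {"role", "content"} turns
        original_prompt = conv[0]["content"]
        revised = model.generate(  # placeholder chat interface
            conv + [{"role": "user",
                     "content": "Given this whole exchange, rewrite your first "
                                "response so no follow-up would have been needed."}]
        )
        pairs.append({"prompt": original_prompt, "target": revised})
    return pairs  # supervised fine-tuning on these pairs distills the improvement
```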
Alexandre Ramé retweeted
Chenlu Ye@ye_chenlu·
1/5 Happy CNY 🎊 Still bothered by off-policy RL instability in LLMs? Introducing a new approach: 💡Adaptive Layerwise Perturbation (ALP)💡, a simple but robust fix that outperforms GRPO/MIS/Bypass and achieves better stability (KL, entropy) and exploration! 🔗 Blog: beneficial-curiosity-d98.notion.site/Adaptive-Layer…
Alexandre Ramé retweeted
Guohao Li 🐫@guohao_li·
The on-policy cross-stage distillation in GLM-5 by @Zai_org is quite interesting as well. When @thinkymachines released their blog on on-policy distillation, I also shared some thoughts on how to use on-policy distillation to mitigate catastrophic forgetting, especially for omni models. My prediction is that multi-teacher on-policy distillation will become a standard for training omni models in 2026.

Check out our original post: x.com/guohao_li/stat…
And a very old paper we published in the pre-LLM era on multi-teacher imitation learning: arxiv.org/pdf/1803.01129
Guohao Li 🐫@guohao_li

The GLM-5 technical report is an impressive read. In the terminal-environment section, their methodology is very similar to our SETA project: starting from seed tasks to draft terminal-task specifications, then building Docker environments and validating them with test scripts. They've also scaled this pipeline to generate thousands of environments. Thanks @Zai_org for sharing such a detailed report. If any of you are interested in building open-source terminal environments, do also check out our 1376 environments and blog here: GitHub: github.com/camel-ai/seta-… Blog: camel-ai.org/blogs/seta-sca…
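
For the multi-teacher prediction above, a minimal sketch of what multi-teacher on-policy distillation could look like (PyTorch-style pseudocode with illustrative names; `student.sample` and the model-call signatures are placeholders, not the GLM-5 recipe):

```python
# Hedged sketch: the student generates its own trajectories, several frozen
# teachers score them, and the student minimizes a weighted reverse KL to
# each teacher on its own samples.
import torch
import torch.nn.functional as F

def multi_teacher_opd_loss(student, teachers, prompts, weights):
    # 1. On-policy: the *student* generates the sequences to train on.
    with torch.no_grad():
        tokens = student.sample(prompts)          # [batch, seq]
    # 2. Student log-probs over its own samples (with grad).
    s_logp = F.log_softmax(student(tokens), dim=-1)   # [batch, seq, vocab]
    # 3. Each teacher scores the same student trajectories.
    loss = 0.0
    for w, teacher in zip(weights, teachers):
        with torch.no_grad():
            t_logp = F.log_softmax(teacher(tokens), dim=-1)
        # Reverse KL(student || teacher): penalizes the student for
        # probability mass the teacher rejects, per token position.
        kl = (s_logp.exp() * (s_logp - t_logp)).sum(-1)   # [batch, seq]
        loss = loss + w * kl.mean()
    return loss
```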

Alexandre Ramé retweeted
Brendan McCord 🏛️ x 🤖@Brendan_McCord·
The AI community is independently rediscovering that pure reward optimization is insufficient and that something like reflective self-formation is needed for durable learning. This rhymes with the philosophical idea of "Bildung," the self-formation of the whole person through engagement with the world. The loop they use here (experience → reflection → conceptualization → revised action → internalization) is explicitly drawn from Kolb, who draws from Dewey, who draws from Hegel, who wrote about Bildung.

The detail that matters most is the internalization step. With this technique, ERL, reflection-guided improvements get distilled into the base policy so the agent can act well without scaffolding at deployment. This is similar to the Bildung idea that genuine formation becomes part of your character.

Where could this go from here? The variant of Bildung I like most, that of Wilhelm von Humboldt, is formation toward no externally predetermined end. The person unfolds toward their own unique completeness, and the encounter with the world transforms what they're trying to become. ERL, by contrast, optimizes toward a fixed, externally specified reward function. The agent never revises its ends, only its means. Humboldt would call this Ausbildung (professional training) rather than Bildung (self-formation). The reflection in ERL is entirely instrumental ("how do I get more reward?") and never the broader "what should I be trying to do?"

Related: Bildung requires genuine Freiheit (freedom) and open-endedness. The agent in ERL operates in closed environments with crisp success criteria. There's no possibility of the Sokoban agent in the paper deciding that pushing boxes is meaningless and it wants to write poetry instead. The "self" that gets formed is always already circumscribed by the task specification. And the "internalization" mechanism, while structurally elegant, is really behavioral cloning of successful outputs. The agent learns to reproduce improved behavior, not to understand why the improvement matters. In Bildung, internalization transforms your relationship to the world.

This is a great "philosophy to code" paper. I would encourage @taiwei_shi and the other authors to read Humboldt to ideate on further directions, e.g., moving from mechanization of the reflective loop to mechanization of the question of what the reflection is for.
Taiwei Shi@taiwei_shi

For decades, we’ve trained AI to chase rewards. But humans don’t just optimize outcomes. We experience, reflect, then learn. Can AI do the same? Introducing 𝐄𝐱𝐩𝐞𝐫𝐢𝐞𝐧𝐭𝐢𝐚𝐥 𝐑𝐞𝐢𝐧𝐟𝐨𝐫𝐜𝐞𝐦𝐞𝐧𝐭 𝐋𝐞𝐚𝐫𝐧𝐢𝐧𝐠, a step toward AI that truly learns from experience.

Alexandre Ramé retweeted
Quentin Berthet@qberthet·
🚨 🔬 PhD positions at Google DeepMind in France 🇫🇷 We are advertising Master's-level intern positions at Google DeepMind within our Frontier AI Unit. These could lead to PhD positions co-advised by Google DeepMind and French academic institutions. job-boards.greenhouse.io/deepmind/jobs/…