
Skander Moalla

@SkanderMoalla
RS Intern @ Meta FAIR | PhD @ EPFL, Caglar Gulcehre Lab for AI Research (CLAIRE) | Reinforcement learning, Large Language Model post-training

🚀 Big time! We can finally do LLM RL fine-tuning with rewards and leverage offline/off-policy data!
❌ You want rewards, but GRPO only works online?
❌ You want offline, but DPO is limited to preferences?
✅ QRPO can do both! 🧵 Here's how we do it:
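
The tweet doesn't spell out the mechanism, so here is a minimal sketch of one way a reward-based offline objective of this kind can be set up, assuming a quantile-transformed reward whose KL-regularized optimum has a closed-form partition function: map raw rewards to their quantile under reference-policy samples, then regress the policy/reference log-ratio toward the resulting target on any offline or off-policy data. All function names, signatures, and the beta value below are my own illustration, not taken from the thread or the authors' code.

```python
import math

import torch


def quantile_reward(rewards: torch.Tensor, ref_rewards: torch.Tensor) -> torch.Tensor:
    # Empirical quantile of each training reward under N completions
    # sampled from the reference policy for the same prompt.
    # rewards: (B,), ref_rewards: (N,). The output lies in [0, 1] and is
    # approximately Uniform(0, 1), whatever the raw reward scale.
    return (ref_rewards.unsqueeze(0) <= rewards.unsqueeze(1)).float().mean(dim=1)


def qrpo_loss(logp_policy, logp_ref, q_rewards, beta: float = 0.1):
    # For a Uniform(0, 1) reward, the partition function of the
    # KL-regularized optimum pi* ∝ pi_ref * exp(r / beta) has a closed form:
    #   Z = E_{u ~ U(0,1)}[exp(u / beta)] = beta * (exp(1 / beta) - 1),
    # computed here in log-space for numerical stability.
    log_z = math.log(beta) + 1.0 / beta + math.log1p(-math.exp(-1.0 / beta))
    # Regress the policy/reference log-ratio toward the reward-derived
    # target. No on-policy sampling is needed, so any offline or
    # off-policy completions with rewards can be used.
    target = q_rewards / beta - log_z
    return ((logp_policy - logp_ref) - target).pow(2).mean()
```

In training, logp_policy and logp_ref would be the sequence log-probabilities of the same completion under the current and reference models, and the per-prompt reference samples used for the quantile transform can be precomputed once offline.
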




A new milestone in automatic formalization: We translated an entire graduate math textbook into Lean using 30K LLM agents. Open-source, large-scale multi-agent inference that actually works.
> Blueprint+Lean: faabian.github.io/algebraic-comb…
> Codebase+preprint: github.com/facebookresear…
1/7
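
For readers unfamiliar with the target format, here is a tiny, purely illustrative Lean 4 + Mathlib example of what formalizing a textbook-style combinatorics statement looks like; it is not taken from the project's blueprint.

```lean
import Mathlib

/-- Textbook statement: the entries of row `n` of Pascal's triangle
sum to `2 ^ n`. -/
theorem row_sum_pow_two (n : ℕ) :
    ∑ k ∈ Finset.range (n + 1), n.choose k = 2 ^ n :=
  Nat.sum_range_choose n
```
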





📢 « Partition Generative Modeling (PGM): Masked Modeling without Masks » is out!
🚯 Masked diffusion models waste FLOPs processing countless mask tokens that carry no real information.
⚡ We show how partitioning can replace masking, boosting throughput by >5.3x on text and up to 7.5x on VQ-ImageNet!
📄 Paper: arxiv.org/abs/2505.18883
💻 Code: github.com/jdeschena/pgm
🤗 Models: huggingface.co/jdeschena/pgm
1/9 🧵
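
The throughput claim follows from an input-layout argument, and a toy sketch makes it concrete. This is my own illustration of the contrast the tweet describes, not the paper's code: masked modeling runs the model over the full-length sequence with placeholder MASK tokens, while a partition-style model only encodes the visible partition of positions, so encoder compute scales with the visible subset.

```python
import torch


def masked_modeling_inputs(tokens, mask, mask_id):
    # Standard masked modeling: the model still processes a full-length
    # sequence in which masked positions are replaced by a MASK token,
    # so FLOPs grow with the full sequence length L.
    inputs = tokens.clone()
    inputs[mask] = mask_id
    return inputs  # shape (B, L); mask positions carry no content


def partition_inputs(tokens, visible_idx):
    # Partition-style modeling (toy version): only the visible partition
    # is gathered and fed to the model, which then predicts the hidden
    # partition, so compute scales with the number of visible tokens.
    return tokens.gather(1, visible_idx)  # shape (B, |visible|)
```

If, say, 80% of positions would otherwise be mask tokens, the partitioned input is roughly 5x shorter, which is the kind of headroom behind speedups like those reported above.
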



New Anthropic Fellows research: How does misalignment scale with model intelligence and task complexity? When advanced AI fails, will it do so by pursuing the wrong goals? Or will it fail unpredictably and incoherently, like a "hot mess"?
Read more: alignment.anthropic.com/2026/hot-mess-…



🚨 Postdoc available in my lab @EPFL on safe AI, alignment, LLMs, NLP, mech interp
🎯 Safe AI that truly cares about humans! No "lipstick-on-a-pig alignment" 🐖💄 after pretraining; let's "raise" models aligned from token 1 onward 🍼
👉 Info & app: go.epfl.ch/safe-ai-postdoc






