Skander Moalla

155 posts

@SkanderMoalla

RS Intern @ Meta FAIR | PhD @ EPFL, Caglar Gulcehre Lab for AI Research (CLAIRE) | Reinforcement learning, Large Language Model post-training

Lausanne, Switzerland · Joined January 2017
425 Following · 283 Followers
Skander Moalla retweeted
Charles Arnal@arnal_charles·
(1/9) Experience replay can cut LLM RL training compute by up to ~40% (without hurting final accuracy—and sometimes improving it). Paper: arxiv.org/abs/2604.08706
Skander Moalla retweeted
Giorgia Ramponi@gio_ramponi·
I’m looking for one strong RL intern (4 months, starting in September), with the intention of the position turning into a PhD. The position is based in Zurich, a fantastic research environment with an active ML/AI community. Info here: lnkd.in/dj2J-Z-f
Skander Moalla retweeted
Charles Arnal@arnal_charles·
TL;DR: As a pilot we formalized an entire math textbook in Lean 4 with an agentic pipeline. Only the beginning—next up: more books and more insights!
Fabian Gloeckle@FabianGloeckle

A new milestone in automatic formalization: We translated an entire graduate math textbook into Lean using 30K LLM agents. Open-source, large-scale multi-agent inference that actually works > Blueprint+Lean: faabian.github.io/algebraic-comb… > Codebase+preprint: github.com/facebookresear… 1/7

Johan Ferret@johanferret·
Gemma 4 is out! 💎💎💎💎 Among many things, I am quite excited that Gemma 4 E2B thinking is roughly matching Gemma 3 27B across benchmarks (screenshot below)! More info here: deepmind.google/models/gemma/g…
Skander Moalla retweeted
Xiuying Wei@XiuyingWei966·
Regarding efficient architectures, there have been many works such as state space models, dilated attention, SwinT, etc., that train efficient architectures from scratch. However, we explore another road: training a better-designed dense architecture that enables much more flexible inference-time sparsification. 💪

A single RAT+ 🐭 model is pretrained densely once and then flexibly switched at inference time to different dilated patterns (optionally with local windows) or hybrid layer/head compositions. ✨

Note that sparsifying a pretrained attention-only model to a dilated pattern typically leads to severe accuracy degradation. RAT+ avoids this and also performs better under the top-k block pattern. This is simply done by hierarchically augmenting the heavy attention with a very light recurrence. ⚡

By training both modules well, we can control how much to use the heavy vs. fast module at inference, adapting to different downstream tasks and hardware needs without retraining separate configurations.
Skander Moalla retweeted
Quentin Berthet@qberthet·
🚨 🔬 PhD positions at Google DeepMind in France 🇫🇷 We are advertising Master Level Intern positions at Google DeepMind within our Frontier AI Unit. These could lead to co-advised PhD positions with Google DeepMind and French academic institutions. job-boards.greenhouse.io/deepmind/jobs/…
Skander Moalla@SkanderMoalla·
@jsuarez I'd still report the number of samples collected, as this is meaningful compute, and I'd expect the same if some selection strategy were applied to the collected samples before training. (Total wall-clock time still includes those generated-but-unused samples, for example.)
Joseph Suarez 🐡@jsuarez·
RL Researchers: How should we report training steps/second in Puffer 4? There is a new feature that lets you use replay ratios below 1, meaning some samples collected are never used for training. Should we just use min(samples collected, samples trained)? This also seems unfair
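The reporting question in the thread above comes down to which denominator you divide by when replay ratios drop below 1. A minimal sketch (the `throughput_report` helper is hypothetical, not part of PufferLib) showing the three candidate numbers:

```python
# Hypothetical sketch of the reporting question above: with a replay
# ratio below 1, some collected samples are never trained on, so
# "steps/second" depends on which count you divide the wall-clock by.

def throughput_report(samples_collected: int, samples_trained: int,
                      wall_clock_s: float) -> dict:
    """Report both rates plus the conservative min() variant."""
    return {
        "collect_sps": samples_collected / wall_clock_s,
        "train_sps": samples_trained / wall_clock_s,
        # min() is the option debated in the thread:
        "min_sps": min(samples_collected, samples_trained) / wall_clock_s,
    }

report = throughput_report(samples_collected=1_000_000,
                           samples_trained=600_000,   # replay ratio 0.6
                           wall_clock_s=100.0)
print(report)
# {'collect_sps': 10000.0, 'train_sps': 6000.0, 'min_sps': 6000.0}
```

Reporting all three avoids the fairness problem: collection throughput reflects real compute spent even on unused samples, while the min() figure is the most conservative single number.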
Skander Moalla retweeted
Alex Hägele@haeggee·
The main project of my time as an @AnthropicAI fellow is finally out: The Hot Mess of AI: How Does Misalignment Scale with Model Intelligence and Task Complexity? w/ great collaborators @aryopg @sleight_henry @EthanJPerez and supervised by @jaschasd ! Some personal notes:
Anthropic@AnthropicAI

New Anthropic Fellows research: How does misalignment scale with model intelligence and task complexity? When advanced AI fails, will it do so by pursuing the wrong goals? Or will it fail unpredictably and incoherently—like a "hot mess?" Read more: alignment.anthropic.com/2026/hot-mess-…

Skander Moalla retweeted
yobibyte@y0b1byte·
I am looking for an intern to do a research project on RL post-training of LLMs. If you are a PhD student and would like to work with me for several months pushing the efficiency of RL systems, send me an email with the [efficient_rl_internship] subject. Friends, please retweet.
Skander Moalla@SkanderMoalla·
@jsuarez Nice! It looks like a game-changer for research on plasticity and continual learning, where I had to wait for hours before observing plasticity collapse. This can unlock much faster iteration cycles. How flexible is it to tweak networks, add env transforms, etc., for research?
Joseph Suarez 🐡@jsuarez·
We finally hit the 10m/s training perf target for PufferLib 4.0. Breakout solved in 13 seconds. The forward pass is two custom kernels, matmuls, and zero other operations. RL will be fast!
Skander Moalla@SkanderMoalla·
@NandoDF Maybe we should actually pack sequences in SFT without masking the attention to the previous queries 🧐
Nando de Freitas@NandoDF·
Why is it that with ChatGPT, Gemini, Claude, Copilot and other LLMs we have to always start new chats for them to work well? What is the scientific explanation? What are the hypotheses? What is the evidence for each?
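The packing idea in the reply above contrasts with standard SFT practice, where concatenated examples get a block-diagonal attention mask so tokens cannot attend across example boundaries. A toy illustration (my own, not from the thread; `packed_attention_mask` is a hypothetical helper) of what dropping that mask changes:

```python
# Toy illustration of SFT sequence packing. Standard practice:
# concatenate examples into one packed sequence and use a
# block-diagonal causal mask so tokens cannot attend across example
# boundaries. The suggestion above is to drop the block structure and
# let later examples attend to earlier ones, as in a long chat.

def packed_attention_mask(lengths, cross_example_attention=False):
    """Return a 2D list: mask[i][j] == 1 iff token i may attend to token j."""
    total = sum(lengths)
    # Example id of each position in the packed sequence.
    ids = [e for e, n in enumerate(lengths) for _ in range(n)]
    mask = [[0] * total for _ in range(total)]
    for i in range(total):
        for j in range(i + 1):  # causal: only attend backwards
            if cross_example_attention or ids[i] == ids[j]:
                mask[i][j] = 1
    return mask

masked = packed_attention_mask([2, 2])          # block-diagonal causal
unmasked = packed_attention_mask([2, 2], True)  # plain causal, no blocks
# Token 2 (start of the 2nd example) sees token 0 only without masking:
print(masked[2][0], unmasked[2][0])  # 0 1
```

Both variants stay causal; the only difference is whether the second example's tokens can condition on the first example's tokens, which is exactly what the reply proposes to allow.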
Skander Moalla retweeted
Mikhail Terekhov@MiTerekhov·
Great to see AI safety expanding at EPFL! Come for a great academic community, mountains, and objectively insanely good living standards
Bob West@cervisiarius

🚨 Postdoc available in my lab @EPFL on safe AI, alignment, LLMs, NLP, mech interp 🎯 Safe AI that truly cares about humans! No “lipstick-on-a-pig alignment” 🐖💄 after pretraining—let’s “raise” models aligned from token 1 onward 🍼 👉 Info & app: go.epfl.ch/safe-ai-postdoc

Skander Moalla@SkanderMoalla·
QRPO is at #NeurIPS2025 in San Diego & #EurIPS in Copenhagen! 🚀

🇺🇸 San Diego: Catch me on Friday, 11 AM – 2 PM, Halls C-E, Poster #512. (I'm on the job market for Fall '26 - RL & LLM post-training! 💼)

🇩🇰 Copenhagen: @matrs01 is still on-site! If you missed his poster on Wednesday, DM him to chat. (He's looking for a PhD position for Fall '26! 🎓)

If you’re interested in the challenges of offline/off-policy reinforcement learning for LLMs and how QRPO addresses them, come find us. We’d love to chat about our insights and what’s next! 🙌

Paper: arxiv.org/abs/2507.08068

We also shipped significant infrastructure with the project:
- Truly sandboxed and scalable code execution on SLURM + Podman with no privileged permissions: github.com/CLAIRE-Labo/qu…
- And as always, check out our ML project template: github.com/CLAIRE-Labo/py…
Skander Moalla retweeted
Mikayel Samvelyan@_samvelyan·
I’m hiring a Student Researcher at @GoogleDeepMind. This research role centers on topics of open-ended self-improvement and discovery with LLM agents.
📍 Location: London
🗓️ Duration: 6 months, 100%
🚀 Start date: June or July 2026
Apply now using the links below 👇
Skander Moalla retweeted
vLLM@vllm_project·
🚀 No More Train–Inference Mismatch!

We demonstrate bitwise consistent on-policy RL with TorchTitan (training) + vLLM (inference) — the first open-source run where training and inference numerics match exactly.

It only takes 3 steps:
1️⃣ Make vLLM batch-invariant (same seq → same output regardless of batching)
2️⃣ Ensure forward passes in training use identical kernels as inference
3️⃣ Add custom backward passes in PyTorch

✅ Verified on Qwen3 1.7B + GSM8K:
• batch_inv_ON (bitwise exact) → KL=0.0, faster convergence, higher reward
• batch_inv_OFF → reduced reward, instability

We audited every op, imported vLLM’s fused kernels (SiLU MLPs, RMSNorm+residual), and wrote matching backward passes. Run is fully on-policy, deterministic, and reproducible.

Next:
• Unified model code
• torch.compile support
• Perf tuning (current bitwise RL ≈2.4× slower)
• Broader model + op coverage

🔗 blog.vllm.ai/2025/11/10/bit… #vLLM #TorchTitan #RL #LLM #AIResearch
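"Bitwise consistent" in the announcement above means the training and inference forward passes produce byte-identical logprobs, so the on-policy KL is exactly 0. A minimal sketch of that verification idea (the `bitwise_equal` helper is hypothetical, not from the vLLM codebase), comparing raw float bits rather than using a tolerance:

```python
# Hypothetical sketch of the verification step: two runs are "bitwise
# consistent" only if their logprobs are byte-identical, with no
# floating-point tolerance at all.
import math
import struct

def bitwise_equal(xs, ys):
    """True iff two float sequences are byte-identical (no epsilon)."""
    def pack(vs):
        return b"".join(struct.pack("<d", v) for v in vs)
    return pack(xs) == pack(ys)

train_logprobs = [-0.1, -2.5, -0.03]
infer_logprobs = [-0.1, -2.5, -0.03]
print(bitwise_equal(train_logprobs, infer_logprobs))  # True

# Even a 1-ulp difference (e.g. from a different kernel's reduction
# order) fails the check:
bumped = [-0.1, -2.5, math.nextafter(-0.03, 0.0)]
print(bitwise_equal(train_logprobs, bumped))  # False
```

This is why the recipe above needs batch-invariant kernels and identical forward kernels in training and inference: any change in reduction order shifts the last ulp and breaks byte equality.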
Skander Moalla retweeted
Sundar Pichai@sundarpichai·
Our 7th gen TPU Ironwood is coming to GA! It’s our most powerful TPU yet: 10X peak performance improvement vs. TPU v5p, and more than 4X better performance per chip for both training + inference workloads vs. TPU v6e (Trillium). We use TPUs to train + serve our own frontier models, including Gemini, and we’re excited to make the latest generation available to @googlecloud customers.
Skander Moalla retweeted
Tim Davidson@im_td·
We’ve identified a “Collaboration Gap” in today’s top AI models. Testing 32 leading LMs on our novel maze-solving benchmark, we found that models that excel solo can see their performance *collapse* when required to collaborate – even with an identical copy of themselves. A 🧵
Skander Moalla retweeted
Pablo Samuel Castro@pcastr·
🚨The Formalism-Implementation Gap in RL research🚨 Lots of progress in RL research over the last 10 years, but it's been too performance-driven => overfitting to benchmarks (like the ALE). 1⃣ Let's advance the science of RL 2⃣ Let's be explicit about how benchmarks map to formalism 1/X