

Syeda Nahida Akter
241 posts

@__SyedaAkter
PhD student at @LTIatCMU @SCSatCMU and research intern @NVIDIA. Working on improving Reasoning of Generative Models! (@reasyaay.bsky.social)




RL boosts LLM reasoning—but why stop at math & code? 🤔 Meet Nemotron-CrossThink—a method to scale RL-based self-learning across law, physics, social science & more. 🔥Resulting in a model that reasons broadly, adapts dynamically, & uses 28% fewer tokens for correct answers! 🧵↓





Announcing NVIDIA Nemotron 3 Super! 💚120B-12A Hybrid SSM Latent MoE, designed for Blackwell 💚36 on AAIndex v4 💚up to 2.2X faster than GPT-OSS-120B in FP4 💚Open data, open recipe, open weights Models, Tech report, etc. here: research.nvidia.com/labs/nemotron/… And yes, Ultra is coming!





I am happy to announce that RLP has been accepted to ICLR 2026 ! 🎉 RLP re-imagines the foundations of LLM training by bringing reinforcement learning directly into the pretraining stage. This was a true team effort, and it would not have been possible without the invaluable contributions of our amazing team: @shrimai_ @__SyedaAkter @jankautz @MostofaPatwary @MohammadShoeybi @ctnzr @YejinChoinka Paper: arxiv.org/abs/2510.01265 Code: github.com/NVlabs/RLP






New @nvidia paper shows that teaching reasoning early during pretraining builds abilities that later fine-tuning cannot recover. Doing this early gives a 19% average boost on tough tasks after all post-training. Pretraining is the long first stage where the model learns to predict the next word from lots of text. Supervised fine-tuning is a later stage where it studies step by step answers from labeled examples. Reinforcement learning then rewards better answers so the model improves further. Diversity matters most in pretraining, while high quality matters most in supervised fine-tuning, roughly 11% vs 15% gains. Even doubling supervised fine-tuning on a base that skipped early reasoning could not catch up. Adding lots of mixed-quality supervised fine-tuning data even cut math by about 5%. High quality reasoning added in pretraining looked small at first, then showed up strongly after supervised fine-tuning. Teams should load diverse reasoning into pretraining, use a small high quality set for supervised fine-tuning, then stabilize with rewards. ---- Paper – arxiv. org/abs/2510.03264 Paper Title: "Front-Loading Reasoning: The Synergy between Pretraining and Post-Training Data"

New @nvidia paper shows that teaching reasoning early during pretraining builds abilities that later fine-tuning cannot recover. Doing this early gives a 19% average boost on tough tasks after all post-training. Pretraining is the long first stage where the model learns to predict the next word from lots of text. Supervised fine-tuning is a later stage where it studies step by step answers from labeled examples. Reinforcement learning then rewards better answers so the model improves further. Diversity matters most in pretraining, while high quality matters most in supervised fine-tuning, roughly 11% vs 15% gains. Even doubling supervised fine-tuning on a base that skipped early reasoning could not catch up. Adding lots of mixed-quality supervised fine-tuning data even cut math by about 5%. High quality reasoning added in pretraining looked small at first, then showed up strongly after supervised fine-tuning. Teams should load diverse reasoning into pretraining, use a small high quality set for supervised fine-tuning, then stabilize with rewards. ---- Paper – arxiv. org/abs/2510.03264 Paper Title: "Front-Loading Reasoning: The Synergy between Pretraining and Post-Training Data"





