
Aashu Singh
@iam_aashusingh
ML Engg @Facebook, Alum @GeorgiaTech
341 posts
🚨🔥 CUTLASS 4.0 is released 🔥🚨 pip install nvidia-cutlass-dsl. 4.0 marks a major shift for CUTLASS: towards native GPU programming in Python. docs.nvidia.com/cutlass/media/…



We find training unified multimodal understanding and generation models is so easy, you do not need to tune MLLMs at all. The MLLM's knowledge/reasoning/in-context learning can be transferred from multimodal understanding (text output) to generation (pixel output) even when it is FROZEN!
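The recipe described above (keep the MLLM frozen, train only the pixel-output path) can be sketched in a few lines of PyTorch. This is a hypothetical illustration, not the authors' actual code; `UnifiedModel` and its submodule names are made up for the example.

```python
import torch
import torch.nn as nn

class UnifiedModel(nn.Module):
    """Sketch: a frozen MLLM backbone feeding a trainable generation head."""

    def __init__(self, mllm: nn.Module, gen_head: nn.Module):
        super().__init__()
        self.mllm = mllm
        self.gen_head = gen_head
        # Freeze the MLLM: its knowledge/reasoning is reused, never tuned.
        for p in self.mllm.parameters():
            p.requires_grad = False

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            h = self.mllm(x)       # understanding features, no gradients
        return self.gen_head(h)    # only this path is trained
```

An optimizer would then be built over `model.gen_head.parameters()` only, so gradient updates never touch the frozen backbone.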
Excited to teach Advanced NLP at CMU this semester! Slides will be posted on the course page as the course proceeds: cmu-l3.github.io/anlp-spring202… Lectures will be uploaded to YouTube: youtube.com/playlist?list=…


Two weeks ago, we open-sourced Agent S2 — and the response has been amazing. 🙌 Today, we’re excited to share the technical paper that dives into our agent design and key innovations. Agent S2 blends generalist reasoning with specialist grounding for precise, long-horizon computer use tasks: ⚙️ Mixture-of-Grounding 🧠 Proactive Hierarchical Planning 📈 SOTA on OSWorld, AndroidWorld (✨new), and WindowsAgentArena (✨new) 👉 Tech blog: simular.ai/articles/agent… 👉 Paper: arxiv.org/abs/2410.08164
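The "Mixture-of-Grounding" idea above (a planner emits abstract actions; each action type is routed to a specialist grounder) can be illustrated with a small dispatch table. All names here are invented for the sketch and are not Agent S2's actual API.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Action:
    kind: str    # e.g. "click", "type" -- produced by the high-level planner
    target: str  # abstract description of the UI element

def click_grounder(a: Action) -> str:
    # Specialist: resolves the target to concrete coordinates before clicking.
    return f"locate '{a.target}' via the accessibility tree, then click"

def type_grounder(a: Action) -> str:
    # Specialist: focuses the element, then sends keystrokes.
    return f"focus '{a.target}', then send keystrokes"

GROUNDERS: Dict[str, Callable[[Action], str]] = {
    "click": click_grounder,
    "type": type_grounder,
}

def ground(action: Action) -> str:
    """Route each abstract action to its specialist grounder."""
    return GROUNDERS[action.kind](action)
```

The point of the mixture is that planning stays generalist while each grounder can be precise about its own modality, which matters for long-horizon computer-use tasks.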
why did R1's RL suddenly start working, when previous attempts to do similar things failed? theory: we've basically spent the last few years running a massive acausally distributed chain of thought data annotation program on the pretraining dataset.

deepseek's approach with R1 is a pretty obvious method. they are far from the first lab to try "slap a verifier on it and roll out CoTs." but it didn't used to work that well. all of a sudden, though, it did start working. and reproductions of R1, even using slightly different methods, are just working too--it's not some super-finicky method that deepseek lucked out finding. all of a sudden, the basic, obvious techniques are... just working, much better than they used to.

in the last couple of years, chains of thought have been posted all over the internet (LLM outputs leaking into pretraining like this is usually called "pretraining contamination"). and not just CoTs--outputs posted on the internet are usually accompanied by linguistic markers of whether they're correct or not ("holy shit it's right", "LOL wrong"). this isn't just true for easily verifiable problems like math, but also fuzzy ones like writing.

those CoTs in the V3 training set gave GRPO enough of a starting point to start converging, and furthermore, to generalize from verifiable domains to the non-verifiable ones using the bridge established by the pretraining data contamination.

and now, R1's visible chains of thought are going to lead to *another* massive enrichment of human-labeled reasoning on the internet, but on a far larger scale... the next round of base models post-R1 will be *even better* bases for reasoning models.
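the "slap a verifier on it and roll out CoTs" loop above reduces to: sample several chains of thought per prompt, score each with a verifier, and compute group-relative advantages (the core idea behind GRPO). the sketch below is a toy stand-in with made-up policy/verifier functions, not DeepSeek's implementation.

```python
import statistics
from typing import Callable, List

def rollout(policy: Callable[[str], str], prompt: str, n: int = 4) -> List[str]:
    # Sample n chains of thought for the same prompt.
    return [policy(prompt) for _ in range(n)]

def grpo_advantages(rewards: List[float]) -> List[float]:
    # Group-relative advantage: normalize each rollout's verifier reward
    # against the mean/std of its own group (no learned value function).
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # avoid division by zero
    return [(r - mean) / std for r in rewards]
```

rollouts that beat their group's average get positive advantage and are reinforced; the claim in the thread is that contaminated pretraining data is what finally gives the policy enough correct-looking CoTs for these advantages to point somewhere useful.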