Parameswaran Raman
111 posts

@paramsraman
Research Scientist @ Meta (Superintelligence Labs) | LLM Training Efficiency and Optimizer Design | Large Batch Scaling | Distributed AI Systems

1/10 Are DiLoCo and Schedule-Free actually related? A brief history and unusually late advertisement for our work: Smoothing DiLoCo with Primal Averaging for Faster Training of LLMs (see arxiv.org/abs/2512.17131).

I've been wanting to get to the bottom of this story for so long. #DreamMachine #LumaAI 😂

Math problems with GPT-4o and @khanacademy

Congratulations to Jeff Dean, Greg Corrado, & co-authors of the paper “Distributed Representations of Words and Phrases and their Compositionality”, for winning the #NeurIPS2023 Test of Time Award! This prize recognizes a highly impactful paper published at NeurIPS 10 years ago.

Diffusion models are another type of generative model, besides GANs, VAEs, and flow models. The idea is quite smart and clean. It is flexible enough to model any complex distribution while remaining tractable to evaluate the distribution. lilianweng.github.io/lil-log/2021/0…
