
Willie Neiswanger
@willieneis
Assistant Professor @USC in CS + AI. Previously @Stanford, @SCSatCMU. Machine Learning, Decision Making, AI-for-Science, Generative Models.



Tina: Tiny Reasoning Models via LoRA. LoRA-based RL tuned 1.5B models on curated reasoning data, achieving +20% gains and 43% Pass@1 (AIME24) at a total cost of $9, outperforming full-parameter RL on DeepSeek-R1-Distill-Qwen-1.5B. - LoRA-based RL yields better performance with less compute. - Best checkpoints align with format-reward transitions, not accuracy plateaus. - LoRA efficiently adapts reasoning structure while preserving core model knowledge.
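A minimal sketch of the LoRA idea behind the post (illustrative only, not the Tina codebase): instead of updating a full d×k weight matrix W, train two low-rank factors B (d×r) and A (r×k) and use the effective weight W' = W + (alpha/r)·BA, shrinking trainable parameters from d·k to r·(d+k). All names and dimensions below are hypothetical.

```python
# Illustrative LoRA sketch in plain Python (hypothetical, not the paper's code).

def lora_param_counts(d, k, r):
    """Trainable parameters: full fine-tuning vs. a rank-r LoRA adapter."""
    full = d * k        # every entry of W is trainable
    lora = r * (d + k)  # only B (d x r) and A (r x k) are trainable
    return full, lora

def apply_lora(W, B, A, alpha, r):
    """Effective weight W' = W + (alpha / r) * B @ A, using nested lists."""
    scale = alpha / r
    d, k = len(W), len(W[0])
    delta = [[scale * sum(B[i][t] * A[t][j] for t in range(r))
              for j in range(k)] for i in range(d)]
    return [[W[i][j] + delta[i][j] for j in range(k)] for i in range(d)]

# E.g. a hypothetical 1536 x 1536 projection in a ~1.5B model, adapter rank 16:
full, lora = lora_param_counts(1536, 1536, 16)
print(full, lora, f"{lora / full:.1%}")  # ~2% of the full parameter count
```

The parameter-count gap is the source of the compute savings the post describes: RL gradients and optimizer state are only needed for B and A.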


We now know that LoRA can match full-parameter RL training (from x.com/thinkymachines… and our Tina paper arxiv.org/abs/2504.15777), but what about DoRA, QLoRA, and more? We are releasing a clean LoRA-for-RL repo to explore them all. github.com/shangshang-wan…



New Preprint Alert! 📢 Classical decision theory has helped humans make rational decisions under uncertainty for decades. Can it do the same for Large Language Models? We present DeLLMa (“dilemma”), a Decision-making LLM assistant. 🔗 DeLLMa.github.io 1/🧵
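The classical principle DeLLMa builds on can be shown in a few lines (an illustrative expected-utility sketch, not the DeLLMa implementation; the farming scenario and all numbers are made up): enumerate states, weight each action's utility by the state probabilities, and pick the action with the highest expectation.

```python
# Illustrative expected-utility maximization (hypothetical example, not DeLLMa code).

def expected_utility(probs, utilities):
    """E[U(a)] = sum over states s of P(s) * U(a, s), for one action."""
    return sum(p * u for p, u in zip(probs, utilities))

def best_action(actions):
    """Return the action name maximizing expected utility.

    `actions` maps an action name to (state_probs, utilities)."""
    return max(actions, key=lambda a: expected_utility(*actions[a]))

# Hypothetical dilemma: plant wheat or corn under uncertain weather.
weather = [0.6, 0.4]  # P(rain), P(drought)
actions = {
    "wheat": (weather, [100.0, 40.0]),   # safe either way
    "corn":  (weather, [150.0, -20.0]),  # high upside, loses in a drought
}
print(best_action(actions))  # corn: 0.6*150 + 0.4*(-20) = 82 > 76 for wheat
```

The hard part for an LLM assistant is producing the states, probabilities, and utilities from a natural-language dilemma; once those are elicited, the decision rule itself is this simple maximization.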



We've raised a $64M Series A led by @kleinerperkins to build the platform for real-time voice AI. We'll use this funding to expand our team, and to build the next generation of models, infrastructure, and products for voice, starting with Sonic 2.0, available today. Link below to try it free 👇

We are excited to introduce Mercury, the first commercial-grade diffusion large language model (dLLM)! dLLMs push the frontier of intelligence and speed with parallel, coarse-to-fine text generation.


🔍 Diving deep into LLM reasoning? From OpenAI's o-series to DeepSeek R1, from post-training to test-time compute — we break it down into structured spreadsheets. 🧵