
Willie Neiswanger
@willieneis
Assistant Professor @USC in CS + AI. Previously @Stanford, @SCSatCMU. Machine Learning, Decision Making, AI-for-Science, Generative Models.



Tina: Tiny Reasoning Models via LoRA. LoRA-based RL tuned 1.5B models on curated reasoning data, achieving +20% gains and 43% Pass@1 (AIME24) at a total cost of $9, outperforming full-parameter RL on DeepSeek-R1-Distill-Qwen-1.5B. - LoRA-based RL yields better performance with less compute. - Best checkpoints align with format-reward transitions, not accuracy plateaus. - LoRA efficiently adapts reasoning structure while preserving core model knowledge.
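A minimal sketch of the LoRA idea behind the post (illustrative only, not the Tina codebase): instead of updating a full d×k weight matrix W, train two low-rank factors B (d×r) and A (r×k) and use the effective weight W' = W + (alpha/r)·BA, shrinking trainable parameters from d·k to r·(d+k). All names and dimensions below are hypothetical.

```python
# Illustrative LoRA sketch in plain Python (hypothetical, not the paper's code).

def lora_param_counts(d, k, r):
    """Trainable parameters: full fine-tuning vs. a rank-r LoRA adapter."""
    full = d * k        # every entry of W is trainable
    lora = r * (d + k)  # only B (d x r) and A (r x k) are trainable
    return full, lora

def apply_lora(W, B, A, alpha, r):
    """Effective weight W' = W + (alpha / r) * B @ A, using nested lists."""
    scale = alpha / r
    d, k = len(W), len(W[0])
    delta = [[scale * sum(B[i][t] * A[t][j] for t in range(r))
              for j in range(k)] for i in range(d)]
    return [[W[i][j] + delta[i][j] for j in range(k)] for i in range(d)]

# E.g. a hypothetical 1536 x 1536 projection in a ~1.5B model, adapter rank 16:
full, lora = lora_param_counts(1536, 1536, 16)
print(full, lora, f"{lora / full:.1%}")  # ~2% of the full parameter count
```

The parameter-count gap is the source of the compute savings the post describes: RL gradients and optimizer state are only needed for B and A.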


We now know that LoRA can match full-parameter RL training (from x.com/thinkymachines… and our Tina paper arxiv.org/abs/2504.15777), but what about DoRA, QLoRA, and more? We are releasing a clean LoRA-for-RL repo to explore them all. github.com/shangshang-wan…



New Preprint Alert! 📢 Classical decision theory has helped humans make rational decisions under uncertainty for decades. Can it do the same for Large Language Models? We present DeLLMa (“dilemma”), a Decision-making LLM assistant. 🔗 DeLLMa.github.io 1/🧵
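The classical principle DeLLMa builds on can be shown in a few lines (an illustrative expected-utility sketch, not the DeLLMa implementation; the farming scenario and all numbers are made up): enumerate states, weight each action's utility by the state probabilities, and pick the action with the highest expectation.

```python
# Illustrative expected-utility maximization (hypothetical example, not DeLLMa code).

def expected_utility(probs, utilities):
    """E[U(a)] = sum over states s of P(s) * U(a, s), for one action."""
    return sum(p * u for p, u in zip(probs, utilities))

def best_action(actions):
    """Return the action name maximizing expected utility.

    `actions` maps an action name to (state_probs, utilities)."""
    return max(actions, key=lambda a: expected_utility(*actions[a]))

# Hypothetical dilemma: plant wheat or corn under uncertain weather.
weather = [0.6, 0.4]  # P(rain), P(drought)
actions = {
    "wheat": (weather, [100.0, 40.0]),   # safe either way
    "corn":  (weather, [150.0, -20.0]),  # high upside, loses in a drought
}
print(best_action(actions))  # corn: 0.6*150 + 0.4*(-20) = 82 > 76 for wheat
```

The hard part for an LLM assistant is producing the states, probabilities, and utilities from a natural-language dilemma; once those are elicited, the decision rule itself is this simple maximization.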



We've raised a $64M Series A led by @kleinerperkins to build the platform for real-time voice AI. We'll use this funding to expand our team, and to build the next generation of models, infrastructure, and products for voice, starting with Sonic 2.0, available today. Link below to try it free 👇

We are excited to introduce Mercury, the first commercial-grade diffusion large language model (dLLM)! dLLMs push the frontier of intelligence and speed with parallel, coarse-to-fine text generation.


🔍 Diving deep into LLM reasoning? From OpenAI's o-series to DeepSeek R1, from post-training to test-time compute — we break it down into structured spreadsheets. 🧵