Willie Neiswanger

184 posts


@willieneis

Assistant Professor @USC in CS + AI. Previously @Stanford, @SCSatCMU. Machine Learning, Decision Making, AI-for-Science, Generative Models.

Los Angeles · Joined March 2009
280 Following · 1.5K Followers
Hao Zhang@haozhangml·
Can’t believe I get to say this -- deeply honored to be named a 2026 Sloan Research Fellow: today.ucsd.edu/story/2026-slo…

Early faculty life is… "hyper-intense": teaching, advising, hiring, papers, grants, and trying to build a lab culture you’ll still be proud of years later. There were many weeks where it felt like we were building the plane mid-flight, burning plenty of midnight oil along the way.

Over the past few years, I’ve been incredibly lucky to work with amazing students and collaborators on a chain of OSS projects: Vicuna → Chatbot Arena → vLLM → DistServe → LMGame → FastVideo; each one then pushed much further by people far beyond our lab. This award feels less like a finish line and more like fuel for the lab, for our students, and for the next set of systems we haven’t built yet. A core principle of ours is building "open-source research that ships."

At the same time, it’s hard not to feel a mix of excitement, uncertainty, and anxiety about where CS is heading. Coding agents are improving so fast that I’m feeling the AGI firsthand. I have gone back to builder mode -- more productive than ever -- outside of my faculty admin work. I’ve watched friends and colleagues hit numbers that would’ve sounded like science fiction a year ago (e.g., 100+ commits/day).

So what does it mean to “do great computer science” when baseline productivity keeps jumping? For me, it makes “research that ships” more important, and even raises the bar. The leverage shifts toward taste and problem selection, principled system design, and translating ideas into reliable artifacts. We're excited to keep proving that through real systems people can use!

Deeply grateful to:
- My students and collaborators — for the ideas, execution, and drive.
- @HDSIUCSD, Dean @GuptaUcsd, and my @UCSanDiego colleagues — for building an environment where ambitious work can happen.
- @nvidia and @mbzuai (and other compute sponsors) — for support that helped us move faster and turn ideas into real artifacts. Even as the interface changes, the need for efficient compute and solid infrastructure only grows.

Most of all: credit to the students at @haoailab. You’re the reason any of this is worth doing. Keep building and shipping!
Willie Neiswanger retweeted
𝚐𝔪𝟾𝚡𝚡𝟾@gm8xx8·
Tina proved that LoRA can match or surpass full-parameter RL. Tora builds directly on that result, turning it into a full framework. Built on torchtune, it extends RL post-training to LoRA, QLoRA, DoRA, and QDoRA under one interface with GRPO, FSDP, and compile support. QLoRA and QDoRA enable 4-bit RL with stable rewards, while DoRA-Cache speeds rollouts by 2–4× under the same setup. Tora establishes a clean, scalable baseline for LoRA in RL post-training. ⮕ 𝐥𝐢𝐧𝐤 𝐛𝐞𝐥𝐨𝐰
𝚐𝔪𝟾𝚡𝚡𝟾@gm8xx8

Tina: Tiny Reasoning Models via LoRA

LoRA-RL tuned 1.5B models on curated reasoning data, achieving +20% gains and 43% Pass@1 (AIME24) at $9 total cost. Outperforms full-parameter RL on DeepSeek-R1-Distill-Qwen-1.5B.
- LoRA-based RL yields better performance with less compute.
- Best checkpoints align with format-reward transitions, not accuracy plateaus.
- Efficiently adapts reasoning structure while preserving core model knowledge.
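For readers unfamiliar with the mechanics these posts assume, here is a minimal sketch of the adapter math that Tina (and Tora's variants) build on, in plain PyTorch. The class name and hyperparameters are illustrative choices, not the Tora/torchtune API.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    # Wraps a frozen nn.Linear and learns a low-rank update B @ A on top,
    # so only r * (d_in + d_out) parameters are trained per adapted layer.
    def __init__(self, base: nn.Linear, r: int = 16, alpha: float = 32.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                         # base stays frozen
        d_out, d_in = base.weight.shape
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)  # down-projection
        self.B = nn.Parameter(torch.zeros(d_out, r))        # up-proj, zero init
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = base(x) + scale * x A^T B^T; the delta is exactly zero at init.
        return self.base(x) + self.scale * ((x @ self.A.T) @ self.B.T)
```

Under this framing, DoRA additionally decomposes the merged weight into magnitude and direction components, and the Q-variants quantize the frozen base to 4-bit; in every case only the small adapter is updated by the RL loop.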

Willie Neiswanger retweeted
Johnny Tian-Zheng Wei@johntzwei·
Announcing 🔭✨Hubble, a suite of open-source LLMs to advance the study of memorization! Pretrained models up to 8B params, with controlled insertion of texts (e.g., book passages, biographies, test sets, and more!) designed to emulate key memorization risks 🧵
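Since Hubble's controlled insertions are designed to support exactly this kind of probing, one simple memorization check is to compare a model's per-token loss on an inserted passage against a matched passage it never saw. A rough sketch using Hugging Face transformers; the checkpoint ID below is a placeholder, not an official Hubble model name.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def mean_nll(model, tok, text: str) -> float:
    # Mean per-token negative log-likelihood of `text` under the model;
    # HF shifts the labels internally for the causal LM loss.
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)
    return out.loss.item()

# Placeholder checkpoint ID, not an actual Hubble release name.
tok = AutoTokenizer.from_pretrained("your-org/hubble-8b")
model = AutoModelForCausalLM.from_pretrained("your-org/hubble-8b").eval()

inserted = "...a passage deliberately inserted into pretraining..."
control = "...a matched passage the model never saw..."
print(mean_nll(model, tok, inserted), "vs", mean_nll(model, tok, control))
```

A memorized insertion should show markedly lower loss than its unseen control.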
Willie Neiswanger@willieneis·
It was great to see @thinkymachines LoRA w/o Regret blog, which connects nicely to our work on Tina (LoRA for RL). For wider use, we’re releasing a clean implementation of RL with LoRA, DoRA, QLoRA/QDoRA, plus speedups & more, across models from 1.5B–32B. Nice work @UpupWang!
Shangshang Wang@UpupWang

We now know that LoRA can match full-parameter RL training (from x.com/thinkymachines… and our Tina paper arxiv.org/abs/2504.15777), but what about DoRA, QLoRA, and more? We are releasing a clean LoRA-for-RL repo to explore them all. github.com/shangshang-wan…

Shengjia Zhao@shengjia_zhao·
I am very excited to take up the role of chief scientist for Meta Superintelligence Labs. Looking forward to building ASI and aligning it to empower people with the amazing team here. Let’s build!
Willie Neiswanger retweeted
Shangshang Wang@UpupWang·
Sparse autoencoders (SAEs) can be used to elicit strong reasoning abilities with remarkable efficiency. Using only 1 hour of training at $2 cost without any reasoning traces, we find a way to train 1.5B models via SAEs to score 43.33% Pass@1 on AIME24 and 90% Pass@1 on AMC23.
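As background on the mechanism: a sparse autoencoder learns an overcomplete, sparsity-penalized dictionary over a model's activations. The sketch below shows the generic SAE recipe in PyTorch; it is illustrative, not the authors' released training code.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    # Overcomplete feature dictionary over activations: d_hidden >> d_model.
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.enc = nn.Linear(d_model, d_hidden)
        self.dec = nn.Linear(d_hidden, d_model)

    def forward(self, x: torch.Tensor):
        z = torch.relu(self.enc(x))   # sparse, non-negative feature codes
        return self.dec(z), z

def sae_loss(x, x_hat, z, l1_coeff: float = 1e-3) -> torch.Tensor:
    # Reconstruction error plus an L1 penalty that drives most features to zero.
    return ((x - x_hat) ** 2).mean() + l1_coeff * z.abs().mean()
```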
Willie Neiswanger retweeted
Deqing Fu@DeqingFu·
Textual steering vectors can improve visual understanding in multimodal LLMs! You can extract steering vectors via any interpretability toolkit you like -- SAEs, MeanShift, Probes -- and apply them to image or text tokens (or both) of Multimodal LLMs. And They Steer!
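The MeanShift variant mentioned above is particularly simple: average the activations elicited by a concept, subtract the average for a baseline, and add the difference back into the hidden states at inference. A toy PyTorch sketch; the layer choice and scale are assumptions, not the paper's exact pipeline.

```python
import torch
import torch.nn as nn

def build_steering_vector(acts_pos: torch.Tensor, acts_neg: torch.Tensor) -> torch.Tensor:
    # acts_*: [n_examples, d_model] activations collected at one chosen layer.
    return acts_pos.mean(dim=0) - acts_neg.mean(dim=0)

def add_steering_hook(layer: nn.Module, steer: torch.Tensor, scale: float = 4.0):
    # Forward hook that shifts the layer's hidden states along the steering
    # direction; returns a handle so the hook can later be removed.
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + scale * steer
        return (hidden,) + tuple(output[1:]) if isinstance(output, tuple) else hidden
    return layer.register_forward_hook(hook)
```

Calling `handle.remove()` on the returned handle restores the unsteered model.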
Willie Neiswanger retweeted
Sebastian Raschka@rasbt·
Is LoRA (Low-Rank Adaptation) still relevant in 2025 for reasoning models?

I recently read "Tina: Tiny Reasoning Models via LoRA" (arxiv.org/abs/2504.15777), and it made me pause for a moment: when was the last time I heard someone excitedly talk or write about LoRA?

LoRA was one of the most influential fine-tuning methods of the earlier LLM boom (as you may remember, I wrote about it a lot in recent years). The idea is simple but effective: avoid full model updates and instead inject a small number of trainable parameters for downstream tasks. This drastically reduces memory and compute costs. But in the age of ever-larger instruction-tuned models coupled with well-working distillation techniques (as popularized by DeepSeek-R1 etc.), LoRA seemed to become less relevant recently.

Does LoRA work for developing reasoning models? This paper tackles exactly that question. Instead of the usual supervised fine-tuning or instruction-distillation pipeline, the authors use LoRA with reinforcement learning (RL) to improve reasoning capabilities. Specifically, they fine-tune a 1.5B base model using LoRA adapters while applying RL on reasoning benchmarks. Their baseline model is DeepSeek-R1-Distill-Qwen-1.5B, which is a model already fine-tuned for reasoning tasks. (I wish they had started with the base Qwen-1.5B model; but this way, I guess, they get more comparisons with other methods that further trained DeepSeek-R1-Distill-Qwen-1.5B.)

From there, the authors ran experiments across datasets, learning rates, LoRA ranks, and RL algorithms. Their best-performing model was trained on just 7k examples and cost just $9 to train. Even with hyperparameter sweeps and multiple ablations, the entire study cost just $526.

So, how well does LoRA work? The top half of the results figure (highlighted in blue) compares models trained with LoRA-based RL versus standard RL (i.e., no LoRA). On every benchmark (AIME24, AIME25, AMC23, MATH500, GPQA, Minerva), LoRA outperforms the regular RL baseline when applied to the same starting model.

Insights from the ablations:
1) Surprisingly, the best-performing model came from the smallest dataset: just 7k examples from Open-RS.
2) The classic LoRA rank of 16 emerged as the sweet spot, but ranks 8 and 32 also worked well.
3) It's nice that they included the recent Dr. GRPO (I recently discussed it in my latest Ahead of AI blog). It substantially reduces training time by length-normalizing rewards and addressing issues in GRPO.

Bottom line: Reasoning is certainly an interesting use case, and it's interesting (and a bit surprising) that LoRA does so well here. It might also be the first case where I've seen LoRA coupled with RL, which is another interesting aspect. LoRA's popularity certainly peaked 1-2 years ago, and more people now opt for (more expensive) full-parameter updates (based on anecdotal perception), but there's still a place for LoRA and LoRA-like methods.

Let's not forget that one of the key advantages of LoRA is that it doesn't modify the underlying base model. This is key in applications where you have either lots of specialized use cases or lots of customers. For example, instead of storing 100 1B full-parameter-tuned models, it would be much cheaper to store a 32B model with 100 sets of LoRA weights.
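To make that last comparison concrete, here is a back-of-the-envelope parameter count; the rank, hidden size, and number of adapted matrices are illustrative assumptions, not numbers from the Tina paper.

```python
# Storage comparison for the example above (assumed, illustrative geometry).
full = 100 * 1e9                                    # 100 separate 1B-param models

d_model, rank = 5120, 16                            # assumed 32B-ish dimensions
n_adapted = 256                                     # assumed adapted weight matrices
lora_per_customer = n_adapted * rank * 2 * d_model  # A and B are each r x d
shared = 32e9 + 100 * lora_per_customer             # one 32B base + 100 adapters

print(f"100 full 1B models:       {full / 1e9:.1f}B params")    # 100.0B
print(f"32B base + 100 LoRA sets: {shared / 1e9:.2f}B params")  # ~36.19B
# Roughly 42M adapter parameters per customer instead of 1B per full model.
```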
Willie Neiswanger retweeted
Tanishq Mathew Abraham, Ph.D.@iScienceLuvr·
Tina: Tiny Reasoning Models via LoRA "the best Tina model achieves a >20% reasoning performance increase and 43.33% Pass@1 accuracy on AIME24, at only $9 USD post-training and evaluation cost (i.e., an estimated 260x cost reduction). Our work reveals the surprising effectiveness of efficient RL reasoning via LoRA."
Willie Neiswanger retweeted
Shangshang Wang@UpupWang·
😋 Want strong LLM reasoning without breaking the bank? We explored just how cost-effectively RL can enhance reasoning using LoRA! [1/9] Introducing Tina: A family of tiny reasoning models with strong performance at low cost, providing an accessible testbed for RL reasoning. 🧵
Karan Goel@krandiash·
Announcing our Series A and new model updates. We're hiring!
Cartesia@cartesia

We've raised a $64M Series A led by @kleinerperkins to build the platform for real-time voice AI. We'll use this funding to expand our team, and to build the next generation of models, infrastructure, and products for voice, starting with Sonic 2.0, available today. Link below to try it free 👇

Volodymyr Kuleshov 🇺🇦@volokuleshov·
Excited to announce the first commercial-scale diffusion language model --- Mercury Coder. Mercury runs at 1000 tokens/sec on Nvidia hardware while matching the performance of existing speed-optimized LLMs.

Mercury introduces a new approach to language generation inspired by image and video generation systems like MidJourney and Sora. This approach is significantly more efficient (faster and cheaper) to run than existing LLMs, and reduces the cost of AI inference by 10x.

Mercury Coder also achieves comparable performance to speed-optimized frontier models like Claude Haiku and GPT4o-mini. However, it is much more hardware-efficient because it uses a parallel generation mechanism that takes advantage of GPUs. This makes the model much faster or cheaper to run (more users can be served on the same hardware).

You can try it today in our playground!
Inception@_inception_ai

We are excited to introduce Mercury, the first commercial-grade diffusion large language model (dLLM)! dLLMs push the frontier of intelligence and speed with parallel, coarse-to-fine text generation.
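For intuition about "parallel, coarse-to-fine" decoding: instead of emitting one token per forward pass, a diffusion-style LM can start from a fully masked sequence and commit its most confident positions over a handful of denoising steps. A toy sketch of that loop in the spirit of MaskGIT-style generation; the `model` interface is a stand-in, not Mercury's API.

```python
import torch

def parallel_decode(model, seq_len: int, mask_id: int, steps: int = 8):
    # `model` is a stand-in: a callable mapping [1, seq_len] token IDs to
    # [1, seq_len, vocab] logits over all positions at once.
    tokens = torch.full((1, seq_len), mask_id)        # start fully masked
    for step in range(steps):
        logits = model(tokens)
        probs, preds = logits.softmax(-1).max(-1)     # confidence and argmax
        still_masked = tokens == mask_id
        # Commit an increasing share of the remaining masked positions.
        k = max(1, int(still_masked.sum() / (steps - step)))
        conf = probs.masked_fill(~still_masked, -1.0) # only masked slots compete
        idx = conf.topk(k, dim=-1).indices            # most confident positions
        tokens.scatter_(1, idx, preds.gather(1, idx)) # fill k tokens this step
    return tokens
```

Because every step predicts all positions in parallel, throughput scales with the number of denoising steps rather than the sequence length, which is the source of the speed claims above.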

Aditya Grover@adityagrover_·
A few months ago, we started Inception Labs, a new generative AI startup with a rockstar founding team. At Inception, we are challenging the status quo for language generation. Our first results bring blazing fast speeds at 1000+ tokens/sec while matching the quality of leading speed-optimized frontier LLMs. And all on commodity NVIDIA H100s - an industry first! Our vision is to extend the frontier of speed, quality, and cost for next-generation language models. Join us!
Inception@_inception_ai

We are excited to introduce Mercury, the first commercial-grade diffusion large language model (dLLM)! dLLMs push the frontier of intelligence and speed with parallel, coarse-to-fine text generation.

Willie Neiswanger retweeted
Jiarui Zhang (Jerry)@JiaruiZ58876329·
[1/11] Many recent studies have shown that current multimodal LLMs (MLLMs) struggle with low-level visual perception (LLVP) — the ability to precisely describe the fine-grained/geometric details of an image. How can we do better? Introducing Euclid, our first study on improving MLLMs' LLVP. We show that with proper architecture & training choices, even small MLLMs can learn strong and generalizable LLVP, surpassing the best proprietary models!
Willie Neiswanger@willieneis·
Excited to release METAGENE-1, a 7B parameter metagenomic foundation model, built to aid in pathogen detection & pandemic monitoring. Pretrained on 1.5 trillion base pairs of DNA/RNA sequenced from wastewater. A collab w/ @USC, @PrimeIntellect, & the Nucleic Acid Observatory. 🧵