Shangshang Wang
@UpupWang · 56 posts

PhD @CSatUSC | LLM & RL | Intern @Alibaba_Qwen | Prev. Intern @bespokelabsai @bluelightai

Los Angeles · Joined December 2024
233 Following · 617 Followers

Pinned Tweet

Shangshang Wang @UpupWang:
😋 Want strong LLM reasoning without breaking the bank? We explored just how cost-effectively RL can enhance reasoning using LoRA! [1/9] Introducing Tina: A family of tiny reasoning models with strong performance at low cost, providing an accessible testbed for RL reasoning. 🧵
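
For context on what "RL with LoRA" looks like in practice, here is a minimal, hypothetical sketch using Hugging Face trl's GRPOTrainer with a peft LoraConfig. The base model, reward function, and dataset below are illustrative assumptions, not the exact Tina recipe (see the paper for that).

```python
# Hypothetical sketch of LoRA-based RL post-training; not the exact Tina setup.
from datasets import Dataset
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer

# Tiny stand-in dataset; GRPOTrainer expects a "prompt" column.
train_dataset = Dataset.from_dict({
    "prompt": ["What is 2 + 2?", "What is 3 * 7?"],
    "answer": ["4", "21"],
})

# Illustrative reward: 1.0 if the completion contains the reference answer.
# trl passes extra dataset columns (here "answer") as keyword arguments.
def reward_exact_match(completions, answer, **kwargs):
    return [1.0 if a in c else 0.0 for c, a in zip(completions, answer)]

peft_config = LoraConfig(
    r=16, lora_alpha=32,  # small low-rank adapters; the base model stays frozen
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

trainer = GRPOTrainer(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",  # a tiny base, as in Tina
    reward_funcs=reward_exact_match,
    args=GRPOConfig(output_dir="lora-grpo-sketch"),
    train_dataset=train_dataset,
    peft_config=peft_config,  # only the adapter weights receive gradients
)
trainer.train()
```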

Shangshang Wang @UpupWang:
Many thanks to @thinkymachines for the Tinker grant! Our plan: we know that LoRA matches full-param RL by exploiting a tiny subspace—now we’re testing whether exploration can push it further. Stay tuned.

Shangshang Wang @UpupWang:
@jeremyphoward @cHHillee (Q)DoRA shows significant potential for RL and reasoning, achieving performance comparable to full-parameter training. We confirmed this in our post: x.com/UpupWang/statu…
Quoting Shangshang Wang @UpupWang:
We now know that LoRA can match full-parameter RL training (from x.com/thinkymachines… and our Tina paper arxiv.org/abs/2504.15777), but what about DoRA, QLoRA, and more? We are releasing a clean LoRA-for-RL repo to explore them all. github.com/shangshang-wan…
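
For reference, a hedged sketch of how these variants can be swapped in with Hugging Face peft and bitsandbytes: DoRA is LoraConfig(use_dora=True), and the Q- variants 4-bit-quantize the frozen base weights. The model id and rank are illustrative, not necessarily what the repo uses.

```python
# Sketch: LoRA / DoRA / QLoRA / QDoRA from two switches (illustrative setup).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

def load_variant(model_id: str, use_dora: bool, quantize: bool):
    bnb = BitsAndBytesConfig(
        load_in_4bit=True,                      # NF4 base weights for Q- variants
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    ) if quantize else None
    model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb)
    if quantize:
        model = prepare_model_for_kbit_training(model)
    cfg = LoraConfig(r=16, use_dora=use_dora, task_type="CAUSAL_LM")
    return get_peft_model(model, cfg)           # adapters train, base stays frozen

# e.g. QDoRA on an illustrative base model:
model = load_variant("Qwen/Qwen2.5-1.5B", use_dora=True, quantize=True)
```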

Horace He @cHHillee:
One interesting "fundamental" reason for Tinker today is the rise of MoE. Whereas hackers used to deploy llama3-70B efficiently on one node, modern deployments of MoE models require large multinode deployments for efficiency. The underlying reason? Arithmetic intensity. (1/5)
Quoting Thinking Machines @thinkymachines:
Introducing Tinker: a flexible API for fine-tuning language models. Write training loops in Python on your laptop; we'll run them on distributed GPUs. Private beta starts today. We can't wait to see what researchers and developers build with cutting-edge open models! thinkingmachines.ai/tinker
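
A back-of-envelope version of the arithmetic-intensity point above, with my own illustrative numbers (H100-like specs, decode phase only): an MoE streams all expert weights from memory each step but only spends FLOPs on the active parameters, so it needs a much larger batch to become compute-bound, and batches that large no longer fit on one node.

```python
# Illustrative decode-phase arithmetic-intensity estimate; numbers are rough.
def batch_for_compute_bound(total_params, active_params,
                            flops=989e12,        # ~H100 BF16 peak FLOP/s
                            bandwidth=3.35e12,   # ~H100 HBM bytes/s
                            bytes_per_param=2):  # bf16 weights
    machine_balance = flops / bandwidth           # FLOPs/byte to saturate compute
    flops_per_token = 2 * active_params           # one multiply-add per active weight
    bytes_read = total_params * bytes_per_param   # all weights stream from HBM
    return machine_balance * bytes_read / flops_per_token

print(batch_for_compute_bound(70e9, 70e9))    # dense 70B: ~295 tokens in flight
print(batch_for_compute_bound(671e9, 37e9))   # DeepSeek-V3-like MoE: ~5,350 tokens
# The MoE needs an ~18x larger batch, hence multinode serving for efficiency.
```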

Shangshang Wang @UpupWang:
@ben_burtenshaw Nice reproduction! We also ran one and, surprisingly, found that (Q)LoRA and (Q)DoRA can closely match the performance of full-parameter RL post-training. More details in our post: x.com/UpupWang/statu…
Quoting Shangshang Wang @UpupWang: [the same LoRA-for-RL announcement quoted above]

Shangshang Wang @UpupWang:
@zzlccc Nice work! We found that (Q)LoRA and (Q)DoRA can also match the performance of full-parameter RL post-training. More details in our post: x.com/UpupWang/statu…
Quoting Shangshang Wang @UpupWang: [the same LoRA-for-RL announcement quoted above]

Shangshang Wang @UpupWang:
@Tim_Dettmers Fully agreed with the findings about QLoRA. We have also confirmed that (Q)LoRA and (Q)DoRA match full-parameter RL post-training for reasoning. More details in our post: x.com/UpupWang/statu…
Quoting Shangshang Wang @UpupWang: [the same LoRA-for-RL announcement quoted above]

Tim Dettmers @Tim_Dettmers:
These findings are very similar to what we found in the full experimental suite when developing QLoRA (>1500 experiments). Long story short: LoRA/QLoRA works well, is cheap on low-memory devices, and allows multiple cheap deployments.
Quoting Thinking Machines @thinkymachines:
LoRA makes fine-tuning more accessible, but it's unclear how it compares to full fine-tuning. We find that the performance often matches closely, more often than you might expect. In our latest Connectionism post, we share our experimental results and recommendations for LoRA. thinkingmachines.ai/blog/lora/

Shangshang Wang @UpupWang:
@johnschulman2 Hey John, exciting update! We tested LoRA-based RL training and confirmed it works seamlessly with variants like DoRA, QLoRA, and QDoRA. Check out our post here: x.com/UpupWang/statu…
Quoting Shangshang Wang @UpupWang: [the same LoRA-for-RL announcement quoted above]

John Schulman @johnschulman2:
Even if I've tested a result extensively, it's hard to know how well it'll generalize to different experimental setups and software stacks

John Schulman @johnschulman2:
Really happy to see people reproducing the result that LoRA rank=1 closely matches full fine-tuning on many RL fine-tuning problems. Here are a couple of nice ones: x.com/ben_burtenshaw… x.com/zzlccc/status/…
Quoting Zichen Liu @zzlccc:
Much more convinced after getting my own results: LoRA with rank=1 learns (and generalizes) as well as full-tuning while saving 43% VRAM! This lets me RL bigger models with limited resources 😆 script: github.com/sail-sg/oat/bl…
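
Rough numbers (mine, not the thread's) on why rank=1 is so cheap: a rank-r adapter on a d_out x d_in matrix adds only r*(d_in + d_out) trainable parameters, and since gradient and Adam optimizer-state memory scale with trainable parameters rather than total parameters, almost all of full fine-tuning's optimizer memory disappears.

```python
# Trainable parameters for a rank-r LoRA adapter on one weight matrix.
def lora_params(d_in: int, d_out: int, r: int) -> int:
    return r * (d_in + d_out)       # A is (r x d_in), B is (d_out x r)

full = 1536 * 1536                  # a 1536x1536 projection: ~2.36M params
rank1 = lora_params(1536, 1536, 1)  # 3,072 params, ~0.13% of the full matrix
print(full, rank1)
```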

Shangshang Wang @UpupWang:
(Q)DoRA-with-Cache-based GRPO
The standard DoRA layer recalculates the weight norm and magnitude scale on every forward pass. DoRA-with-Cache optimizes this by caching these expensive computations.
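
A minimal sketch of the caching idea, assuming the win comes from GRPO's rollout phase, where many gradient-free forward passes happen between weight updates; this is hypothetical, not the repo's actual implementation.

```python
# Hypothetical DoRA linear layer that caches the normalized direction matrix
# between weight updates (e.g. across GRPO rollout generation).
import torch
import torch.nn as nn

class CachedDoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 16):
        super().__init__()
        self.base = base                          # frozen pretrained layer
        out_f, in_f = base.weight.shape
        self.A = nn.Parameter(torch.randn(r, in_f) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_f, r))
        # DoRA magnitude vector, initialized to the row-wise norms of W
        self.m = nn.Parameter(base.weight.norm(dim=1, keepdim=True).clone())
        self._cached_direction = None

    def invalidate_cache(self):
        self._cached_direction = None             # call after each optimizer.step()

    def forward(self, x):
        if self.training or self._cached_direction is None:
            adapted = self.base.weight + self.B @ self.A
            direction = adapted / adapted.norm(dim=1, keepdim=True)
            if not self.training:                 # safe to reuse during rollouts
                self._cached_direction = direction
        else:
            direction = self._cached_direction    # skip the norm recomputation
        out = x @ (self.m * direction).t()
        return out if self.base.bias is None else out + self.base.bias
```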

Shangshang Wang @UpupWang:
@teknium Thanks, and yes, this is the plan for our next version. In particular, your NousResearch/Hermes-3-Llama-3.1-8B and deepseek-ai/DeepSeek-R1-Distill-Llama-8B are the first candidates we'd like to try.

Teknium (e/λ) @Teknium:
@UpupWang Can you redo this on Llama and OLMo, since this feels like the Spurious Rewards paper to me?

Shangshang Wang @UpupWang:
Sparse autoencoders (SAEs) can elicit strong reasoning abilities with remarkable efficiency. Using only one hour of training at a cost of about $2, without any reasoning traces, we train 1.5B models via SAEs to score 43.33% Pass@1 on AIME24 and 90% Pass@1 on AMC23.

Tanishq Mathew Abraham, Ph.D. @iScienceLuvr:
Resa: Transparent Reasoning Models via SAEs

"Specifically, SAE-Tuning involves two key stages: First, we use an SAE to probe the internal activations of a source model, identifying and extracting a dictionary of latent features that correspond to its reasoning processes. Second, we freeze this feature-rich SAE and insert it into a target model to guide a SFT process to elicit reasoning abilities in the target model."

"SAE-Tuning retains >97% of its RL-trained counterpart's reasoning performance while reducing training costs by >2000x to roughly $1 and training time by >450x to around 20 minutes."
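
To make the two-stage recipe concrete, here is a toy sparse autoencoder of the kind described above; the dictionary size, sparsity penalty, and training loop are assumptions for illustration, not Resa's actual configuration.

```python
# Toy SAE for the two-stage SAE-Tuning recipe (illustrative, not Resa's code).
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int = 1536, d_dict: int = 16384):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_dict)  # overcomplete feature dictionary
        self.decoder = nn.Linear(d_dict, d_model)

    def forward(self, h):
        feats = torch.relu(self.encoder(h))        # sparse feature activations
        return self.decoder(feats), feats

# Stage 1 (sketch): train the SAE to reconstruct a source model's activations.
sae = SparseAutoencoder()
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)
h = torch.randn(64, 1536)                          # stand-in for cached hidden states
recon, feats = sae(h)
loss = (recon - h).pow(2).mean() + 1e-3 * feats.abs().mean()  # L2 + L1 sparsity
loss.backward()
opt.step()

# Stage 2 (sketch): freeze the SAE, splice it into the target model's forward
# pass at the matching layer, and run SFT so the target learns to drive the
# reasoning features extracted from the source model.
for p in sae.parameters():
    p.requires_grad_(False)
```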