Shangshang Wang
@UpupWang · 56 posts

PhD @CSatUSC | LLM & RL | Intern @Alibaba_Qwen | Prev. Intern @bespokelabsai @bluelightai

Los Angeles · Joined December 2024
233 Following · 617 Followers

Pinned Tweet

Shangshang Wang @UpupWang:
😋 Want strong LLM reasoning without breaking the bank? We explored just how cost-effectively RL can enhance reasoning using LoRA! [1/9] Introducing Tina: A family of tiny reasoning models with strong performance at low cost, providing an accessible testbed for RL reasoning. 🧵
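
For context on what "RL with LoRA" looks like in practice, here is a minimal, hypothetical sketch using Hugging Face trl's GRPOTrainer with a peft LoraConfig. The base model, reward function, and dataset below are illustrative assumptions, not the exact Tina recipe (see the paper for that).

```python
# Hypothetical sketch of LoRA-based RL post-training; not the exact Tina setup.
from datasets import Dataset
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer

# Tiny stand-in dataset; GRPOTrainer expects a "prompt" column.
train_dataset = Dataset.from_dict({
    "prompt": ["What is 2 + 2?", "What is 3 * 7?"],
    "answer": ["4", "21"],
})

# Illustrative reward: 1.0 if the completion contains the reference answer.
# trl passes extra dataset columns (here "answer") as keyword arguments.
def reward_exact_match(completions, answer, **kwargs):
    return [1.0 if a in c else 0.0 for c, a in zip(completions, answer)]

peft_config = LoraConfig(
    r=16, lora_alpha=32,  # small low-rank adapters; the base model stays frozen
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

trainer = GRPOTrainer(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",  # a tiny base, as in Tina
    reward_funcs=reward_exact_match,
    args=GRPOConfig(output_dir="lora-grpo-sketch"),
    train_dataset=train_dataset,
    peft_config=peft_config,  # only the adapter weights receive gradients
)
trainer.train()
```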

Shangshang Wang @UpupWang:
Many thanks to @thinkymachines for the Tinker grant! Our plan: we know that LoRA matches full-param RL by exploiting a tiny subspace—now we’re testing whether exploration can push it further. Stay tuned.

Shangshang Wang @UpupWang:
@jeremyphoward @cHHillee (Q)DoRA shows significant potential for RL and reasoning, achieving performance comparable to full-parameter training. We confirmed this in our post: x.com/UpupWang/statu…
Quoting Shangshang Wang @UpupWang:
We now know that LoRA can match full-parameter RL training (from x.com/thinkymachines… and our Tina paper arxiv.org/abs/2504.15777), but what about DoRA, QLoRA, and more? We are releasing a clean LoRA-for-RL repo to explore them all. github.com/shangshang-wan…
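
For reference, a hedged sketch of how these variants can be swapped in with Hugging Face peft and bitsandbytes: DoRA is LoraConfig(use_dora=True), and the Q- variants 4-bit-quantize the frozen base weights. The model id and rank are illustrative, not necessarily what the repo uses.

```python
# Sketch: LoRA / DoRA / QLoRA / QDoRA from two switches (illustrative setup).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

def load_variant(model_id: str, use_dora: bool, quantize: bool):
    bnb = BitsAndBytesConfig(
        load_in_4bit=True,                      # NF4 base weights for Q- variants
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    ) if quantize else None
    model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb)
    if quantize:
        model = prepare_model_for_kbit_training(model)
    cfg = LoraConfig(r=16, use_dora=use_dora, task_type="CAUSAL_LM")
    return get_peft_model(model, cfg)           # adapters train, base stays frozen

# e.g. QDoRA on an illustrative base model:
model = load_variant("Qwen/Qwen2.5-1.5B", use_dora=True, quantize=True)
```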

Horace He @cHHillee:
One interesting "fundamental" reason for Tinker today is the rise of MoE. Whereas hackers used to deploy llama3-70B efficiently on one node, modern deployments of MoE models require large multinode deployments for efficiency. The underlying reason? Arithmetic intensity. (1/5)
Quoting Thinking Machines @thinkymachines:
Introducing Tinker: a flexible API for fine-tuning language models. Write training loops in Python on your laptop; we'll run them on distributed GPUs. Private beta starts today. We can't wait to see what researchers and developers build with cutting-edge open models! thinkingmachines.ai/tinker
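
A back-of-envelope version of the arithmetic-intensity point above, with my own illustrative numbers (H100-like specs, decode phase only): an MoE streams all expert weights from memory each step but only spends FLOPs on the active parameters, so it needs a much larger batch to become compute-bound, and batches that large no longer fit on one node.

```python
# Illustrative decode-phase arithmetic-intensity estimate; numbers are rough.
def batch_for_compute_bound(total_params, active_params,
                            flops=989e12,        # ~H100 BF16 peak FLOP/s
                            bandwidth=3.35e12,   # ~H100 HBM bytes/s
                            bytes_per_param=2):  # bf16 weights
    machine_balance = flops / bandwidth           # FLOPs/byte to saturate compute
    flops_per_token = 2 * active_params           # one multiply-add per active weight
    bytes_read = total_params * bytes_per_param   # all weights stream from HBM
    return machine_balance * bytes_read / flops_per_token

print(batch_for_compute_bound(70e9, 70e9))    # dense 70B: ~295 tokens in flight
print(batch_for_compute_bound(671e9, 37e9))   # DeepSeek-V3-like MoE: ~5,350 tokens
# The MoE needs an ~18x larger batch, hence multinode serving for efficiency.
```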

Shangshang Wang @UpupWang:
@ben_burtenshaw Nice reproduction! We also ran one and, surprisingly, found that (Q)LoRA and (Q)DoRA can closely match the performance of full-parameter RL post-training. More details in our post: x.com/UpupWang/statu…
Quoting Shangshang Wang @UpupWang: [the same LoRA-for-RL announcement quoted above]

Shangshang Wang @UpupWang:
@zzlccc Nice work! We found that (Q)LoRA and (Q)DoRA can also match the performance of full-parameter RL post-training. More details in our post: x.com/UpupWang/statu…
Quoting Shangshang Wang @UpupWang: [the same LoRA-for-RL announcement quoted above]

Shangshang Wang @UpupWang:
@Tim_Dettmers Fully agreed with the findings about QLoRA. We have also confirmed that (Q)LoRA and (Q)DoRA match full-parameter RL post-training for reasoning. More details in our post: x.com/UpupWang/statu…
Quoting Shangshang Wang @UpupWang: [the same LoRA-for-RL announcement quoted above]

Tim Dettmers @Tim_Dettmers:
These findings are very similar to what we found in the full experimental suite when developing QLoRA (>1500 experiments). Long story short: LoRA/QLoRA works well, is cheap on low-memory devices, and allows multiple cheap deployments.
Quoting Thinking Machines @thinkymachines:
LoRA makes fine-tuning more accessible, but it's unclear how it compares to full fine-tuning. We find that the performance often matches closely, more often than you might expect. In our latest Connectionism post, we share our experimental results and recommendations for LoRA. thinkingmachines.ai/blog/lora/

Shangshang Wang @UpupWang:
@johnschulman2 Hey John, exciting update! We tested LoRA-based RL training and confirmed it works seamlessly with variants like DoRA, QLoRA, and QDoRA. Check out our post here: x.com/UpupWang/statu…
Quoting Shangshang Wang @UpupWang: [the same LoRA-for-RL announcement quoted above]

John Schulman @johnschulman2:
Even if I've tested a result extensively, it's hard to know how well it'll generalize to different experimental setups and software stacks

John Schulman @johnschulman2:
Really happy to see people reproducing the result that LoRA rank=1 closely matches full fine-tuning on many RL fine-tuning problems. Here are a couple of nice ones: x.com/ben_burtenshaw… x.com/zzlccc/status/…
Quoting Zichen Liu @zzlccc:
Much more convinced after getting my own results: LoRA with rank=1 learns (and generalizes) as well as full-tuning while saving 43% VRAM! This lets me RL bigger models with limited resources 😆 script: github.com/sail-sg/oat/bl…
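
Rough numbers (mine, not the thread's) on why rank=1 is so cheap: a rank-r adapter on a d_out x d_in matrix adds only r*(d_in + d_out) trainable parameters, and since gradient and Adam optimizer-state memory scale with trainable parameters rather than total parameters, almost all of full fine-tuning's optimizer memory disappears.

```python
# Trainable parameters for a rank-r LoRA adapter on one weight matrix.
def lora_params(d_in: int, d_out: int, r: int) -> int:
    return r * (d_in + d_out)       # A is (r x d_in), B is (d_out x r)

full = 1536 * 1536                  # a 1536x1536 projection: ~2.36M params
rank1 = lora_params(1536, 1536, 1)  # 3,072 params, ~0.13% of the full matrix
print(full, rank1)
```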

Shangshang Wang @UpupWang:
(Q)DoRA-with-Cache-based GRPO
The standard DoRA layer recalculates the weight norm and magnitude scale on every forward pass. DoRA-with-Cache optimizes this by caching these expensive computations.
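
A minimal sketch of the caching idea, assuming the win comes from GRPO's rollout phase, where many gradient-free forward passes happen between weight updates; this is hypothetical, not the repo's actual implementation.

```python
# Hypothetical DoRA linear layer that caches the normalized direction matrix
# between weight updates (e.g. across GRPO rollout generation).
import torch
import torch.nn as nn

class CachedDoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 16):
        super().__init__()
        self.base = base                          # frozen pretrained layer
        out_f, in_f = base.weight.shape
        self.A = nn.Parameter(torch.randn(r, in_f) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_f, r))
        # DoRA magnitude vector, initialized to the row-wise norms of W
        self.m = nn.Parameter(base.weight.norm(dim=1, keepdim=True).clone())
        self._cached_direction = None

    def invalidate_cache(self):
        self._cached_direction = None             # call after each optimizer.step()

    def forward(self, x):
        if self.training or self._cached_direction is None:
            adapted = self.base.weight + self.B @ self.A
            direction = adapted / adapted.norm(dim=1, keepdim=True)
            if not self.training:                 # safe to reuse during rollouts
                self._cached_direction = direction
        else:
            direction = self._cached_direction    # skip the norm recomputation
        out = x @ (self.m * direction).t()
        return out if self.base.bias is None else out + self.base.bias
```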

Shangshang Wang @UpupWang:
@teknium Thanks, and yes, this is the plan for our next version. In particular, your NousResearch/Hermes-3-Llama-3.1-8B and deepseek-ai/DeepSeek-R1-Distill-Llama-8B are the first candidates we'd like to try.

Teknium (e/λ) @Teknium:
@UpupWang Can you redo this on Llama and OLMo, since this feels like the Spurious Rewards paper to me?

Shangshang Wang @UpupWang:
Sparse autoencoders (SAEs) can elicit strong reasoning abilities with remarkable efficiency. Using only one hour of training at a cost of about $2, without any reasoning traces, we train 1.5B models via SAEs to score 43.33% Pass@1 on AIME24 and 90% Pass@1 on AMC23.

Tanishq Mathew Abraham, Ph.D. @iScienceLuvr:
Resa: Transparent Reasoning Models via SAEs

"Specifically, SAE-Tuning involves two key stages: First, we use an SAE to probe the internal activations of a source model, identifying and extracting a dictionary of latent features that correspond to its reasoning processes. Second, we freeze this feature-rich SAE and insert it into a target model to guide a SFT process to elicit reasoning abilities in the target model."

"SAE-Tuning retains >97% of its RL-trained counterpart's reasoning performance while reducing training costs by >2000x to roughly $1 and training time by >450x to around 20 minutes."
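
To make the two-stage recipe concrete, here is a toy sparse autoencoder of the kind described above; the dictionary size, sparsity penalty, and training loop are assumptions for illustration, not Resa's actual configuration.

```python
# Toy SAE for the two-stage SAE-Tuning recipe (illustrative, not Resa's code).
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int = 1536, d_dict: int = 16384):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_dict)  # overcomplete feature dictionary
        self.decoder = nn.Linear(d_dict, d_model)

    def forward(self, h):
        feats = torch.relu(self.encoder(h))        # sparse feature activations
        return self.decoder(feats), feats

# Stage 1 (sketch): train the SAE to reconstruct a source model's activations.
sae = SparseAutoencoder()
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)
h = torch.randn(64, 1536)                          # stand-in for cached hidden states
recon, feats = sae(h)
loss = (recon - h).pow(2).mean() + 1e-3 * feats.abs().mean()  # L2 + L1 sparsity
loss.backward()
opt.step()

# Stage 2 (sketch): freeze the SAE, splice it into the target model's forward
# pass at the matching layer, and run SFT so the target learns to drive the
# reasoning features extracted from the source model.
for p in sae.parameters():
    p.requires_grad_(False)
```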