Kathy Sacks
10.2K posts

@kathysacks
Helping ambitious women go from doubting to doing. Lifelong learner. Conscious capitalist. Practicing patience. Focused on presence. https://t.co/ZxDHdUN0eD





Stop using LoRA for RLVR!!! New paper released 👉 Evaluating Parameter Efficient Methods for RLVR
📖 Alphaxiv: alphaxiv.org/abs/2512.23165
💻 Github: github.com/MikaStars39/Pe…

Is standard LoRA truly the optimal choice for Reinforcement Learning? We present the first large-scale evaluation of over 12 PEFT methodologies using the DeepSeek-R1-Distill family on complex mathematical reasoning benchmarks.

Key Finding: Standard LoRA is suboptimal. Structural variants such as DoRA, AdaLoRA, and MiSS consistently outperform standard LoRA. Notably, DoRA (46.6% avg. accuracy) even surpasses full-parameter fine-tuning (44.9%) across multiple benchmarks.

The failure of SVD-based initialization. Strategies like PiSSA and MiLoRA experience significant performance degradation or total training collapse. This is due to a fundamental "spectral misalignment": these methods force updates on principal components, while RLVR intrinsically operates in the off-principal regime.

The Expressivity Floor. While RLVR can tolerate moderate parameter reduction, extreme compression (e.g., VeRA, IA³, or Rank-1 adapters) creates an information bottleneck. Reasoning tasks require a minimum threshold of trainable capacity to successfully reorient policy circuits.

Recommendations for the community:
a. Move beyond the default adoption of standard LoRA.
b. Prioritize geometry-aware adapters like DoRA that decouple magnitude and direction.
c. Avoid SVD-informed initializations for RL tasks.
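For readers who want to see what "decouple magnitude and direction" can look like in practice, here is a rough sketch of a DoRA-style weight merge in plain Python. It is not taken from the paper or its repo. It assumes the commonly described DoRA form, where the merged weight is a learned per-column magnitude `m` times the column-normalized direction of `W0 + B·A`. The helper names (`matmul`, `column_norms`, `dora_merge`) and the tiny 2x2 matrices are made up for illustration.

```python
import math

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def column_norms(w):
    """L2 norm of each column of w."""
    return [math.sqrt(sum(row[j] ** 2 for row in w)) for j in range(len(w[0]))]

def dora_merge(w0, b, a, m):
    """Assumed DoRA-style merge: m[j] * (W0 + B A)[:, j] / ||(W0 + B A)[:, j]||.

    The low-rank product B A only steers the direction of each column;
    the separate vector m sets each column's length.
    Note: a column of W0 + B A with zero norm would divide by zero here.
    """
    delta = matmul(b, a)
    v = [[w0[i][j] + delta[i][j] for j in range(len(w0[0]))]
         for i in range(len(w0))]
    norms = column_norms(v)
    return [[m[j] * v[i][j] / norms[j] for j in range(len(v[0]))]
            for i in range(len(v))]

# Toy frozen weight (hypothetical values), rank-1 adapter.
w0 = [[3.0, 0.0],
      [4.0, 2.0]]
m = column_norms(w0)        # magnitude starts at the column norms of W0
a = [[0.5, -0.5]]           # r x d_in
b_zero = [[0.0], [0.0]]     # d_out x r, zero init as in standard LoRA

# With B = 0 the merged weight reproduces W0.
merged_at_init = dora_merge(w0, b_zero, a, m)

# After a (pretend) update to B, directions change but column lengths stay at m.
b_trained = [[1.0], [0.0]]
merged_after = dora_merge(w0, b_trained, a, m)
norms_after = column_norms(merged_after)
```

In this toy setup the adapter update rotates the columns while their lengths stay pinned to `m`, so magnitude can only change through `m` itself. With Hugging Face PEFT installed, the usual route is `LoraConfig(use_dora=True)` rather than hand-rolled code like this.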