Safal Shrestha

38 posts

@saffffal

ML Research Assistant @ Deep Learning Lab | Computer Science 2024, New York University Abu Dhabi 🇦🇪🇳🇵

Abu Dhabi, United Arab Emirates · Joined March 2022
330 Following · 27 Followers
Pinned Tweet
Safal Shrestha @saffffal
📄 New paper: On the Limits of Layer Pruning for Generative Reasoning in LLMs TL;DR: You can prune entire layers and keep classification accuracy — but generative reasoning breaks, often irreversibly. arXiv: arxiv.org/abs/2602.01997 Code + models below 👇
1
1
3
233
Safal Shrestha @saffffal
🔍 Takeaway: Layer pruning is fine if you care about classification or summarization. But if generative reasoning matters, aggressive depth reduction is risky — and often irreversible under realistic training budgets.
1
0
0
26
Safal Shrestha retweeted
Minwu Kim @MinwuKim3
🚀 New paper: Training Reasoning Models on Saturated Problems via Failure-Prefix Conditioning
📄 arxiv.org/pdf/2601.20829
RLVR works great, until it doesn't. Training stalls when problems saturate. We propose a way to extend learning from these problems. Details in the 🧵
1
1
1
112
Violet Peng @VioletNPeng
It's a wrap! I hope y'all enjoyed #EMNLP25 as much as I did! Big shoutout to the photography team! All the photos in this post are taken from their website. To all attendees: you should check it out if you haven't already!!
8
4
127
8K
Safal Shrestha @saffffal
We found some very interesting behaviors in these models. Do check it out!
Ravid Shwartz Ziv @ziv_ravid

🚨 New paper! "Layer Importance for Mathematical Reasoning is Forged in Pre-Training and Invariant after Post-Training"
We found something surprising about how LLMs get better at math: the critical layers for mathematical reasoning are forged during pre-training and stay remarkably stable afterward, no matter what post-training you do.
We studied base models and their instruction-tuned/RL/distilled variants using layer-wise ablation. Question: Do math gains come from major architectural changes or subtle adjustments that preserve the original structure?
We found that math reasoning relies on a small set of critical layers. Ablating these layers results in an 80% drop in math accuracy, while factual recall tasks show much smaller drops.
These critical layers remain invariant across post-training methods. The layers that matter for math are identified during pre-training and stay locked in place. It seems that post-training just tunes them and doesn't restructure them.
We also measured what happens to token representations near these critical layers using NMI. Tokens drift from syntactic clusters toward representations that are more semantically useful for downstream mathematical tasks.
Shoutout to all the great people who did this project! @NepalAadim, who is the lead author and is on the PhD job market (he's great, you should hire him!), will present it at the BlackboxNLP EMNLP workshop on Sunday!
Paper: arxiv.org/abs/2506.22638
@jalalnaghiyev06, @MinwuKim3, Anubhav Shrestha, @saffffal, Keith Ross

0
0
1
98
Safal Shrestha retweeted
Violet Peng @VioletNPeng
Excited to kick-start #EMNLP25 with my awesome co-chairs @c_christodoulop, Carolyn Rose, and @Tanmoy_Chak in Suzhou for the 30th anniversary! Especially happy to have hosted one of my favorite researchers, @hengjinlp, for a super inspiring keynote talk with a full house!
uclanlp @uclanlp

@VioletNPeng served as the Program Co-Chair for #EMNLP25, one of the largest NLP conferences, which received over 8,000 submissions and drew 6,000 participants.

1
9
102
21K
Safal Shrestha @saffffal
We are here at EMNLP in Suzhou! Please come by and say hi if you are interested in talking about reasoning in LLMs! We have a poster session on Thursday 10:30-12:00 and a paper in the BlackboxNLP Workshop. All of us are on the lookout for PhD positions! @MinwuKim3 @NepalAadim
0
0
0
357
atulit @atulit_gaur
Fun question to ask in an ML interview: "Why do embedding dimensions come in neat sizes like 768 or 1024, but never 739?" If they can't answer it, that's fine, but if they can, you've stumbled upon a real gem.
140
85
4.6K
932.3K
Safal Shrestha @saffffal
🎉 Excited to share that our recent work has been accepted to the EMNLP 2025 Main Conference. See you in Suzhou in November!
4
1
8
862