Safal Shrestha

38 posts

@saffffal

ML Research Assistant @ Deep Learning Lab | Computer Science 2024, New York University Abu Dhabi 🇦🇪🇳🇵

Abu Dhabi, United Arab Emirates · Joined March 2022

330 Following · 27 Followers

Pinned Tweet

Safal Shrestha @saffffal · Feb 3

📄 New paper: On the Limits of Layer Pruning for Generative Reasoning in LLMs TL;DR: You can prune entire layers and keep classification accuracy — but generative reasoning breaks, often irreversibly. arXiv: arxiv.org/abs/2602.01997 Code + models below 👇

Safal Shrestha @saffffal · Feb 3

📎 Paper: arxiv.org/abs/2602.01997 💻 Code: github.com/safal312/on-th… 🤗 Models & data: huggingface.co/collections/sa… Shoutout to all the amazing collaborators in this paper! If you have any questions, do let us know!

Safal Shrestha @saffffal · Feb 3

🔍 Takeaway: Layer pruning is fine if you care about classification or summarization. But if generative reasoning matters, aggressive depth reduction is risky — and often irreversible under realistic training budgets.

Safal Shrestha retweeted

Minwu Kim @MinwuKim3 · Jan 29

🚀 New paper: Training Reasoning Models on Saturated Problems via Failure-Prefix Conditioning 📄 arxiv.org/pdf/2601.20829 RLVR works great, until it doesn’t. Training stalls when problems saturate. We propose a way to extend learning from these problems. Details in the 🧵

Safal Shrestha @saffffal · Nov 11

Had a great time in Suzhou at my first-ever conference, presenting two of our works (aclanthology.org/2025.emnlp-mai… and arxiv.org/pdf/2506.22638). Thank you to all the amazing people I met at the event who made it even more worthwhile. Hope we meet again soon!

Safal Shrestha @saffffal · Nov 11

@VioletNPeng Had a great time, thank you so much for organizing!

Violet Peng @VioletNPeng · Nov 11

It’s a wrap! I hope y’all enjoyed #EMNLP25 as much as I did! Big shoutout to the photography team! All my photos in this post are taken from their website. To all attendees: you should check it out if you haven’t already!!

Safal Shrestha @saffffal · Nov 7

We found some very interesting behaviors in these models. Do check it out!

Ravid Shwartz Ziv @ziv_ravid

🚨New paper! "Layer Importance for Mathematical Reasoning is Forged in Pre-Training and Invariant after Post-Training"

We found something surprising about how LLMs get better at math: the critical layers for mathematical reasoning are forged during pre-training and stay remarkably stable afterward, no matter what post-training you do.

We studied base models and their instruction-tuned/RL/distilled variants using layer-wise ablation. Question: Do math gains come from major architectural changes or subtle adjustments that preserve the original structure?

We found that math reasoning relies on a small set of critical layers. Ablating these layers results in an 80% drop in math accuracy, while factual recall tasks show much smaller drops.

These critical layers remain invariant across post-training methods. The layers that matter for math are identified during pre-training and stay locked in place. It seems that post-training just tunes them and doesn't restructure them.

We also measured what happens to token representations near these critical layers using NMI. Tokens drift from syntactic clusters toward representations that are more semantically useful for downstream mathematical tasks.

Shoutout to all the great people who did this project! @NepalAadim, who is the lead author, and on the PhD job market (he's great, you should hire him!), will present it at the BlackboxNLP EMNLP workshop on Sunday!

Paper: arxiv.org/abs/2506.22638

@jalalnaghiyev06 @MinwuKim3, Anubhav Shrestha, @saffffal, Keith Ross

Safal Shrestha retweeted

Violet Peng @VioletNPeng · Nov 5

Excited to kick-start #EMNLP25 with my awesome co-chairs @c_christodoulop, Carolyn Rose, and @Tanmoy_Chak in Suzhou for the 30th anniversary! Especially happy to have hosted one of my favorite researchers, @hengjinlp, for a super inspiring keynote talk with a full house!

uclanlp @uclanlp

@VioletNPeng served as the Program Co-Chair for #EMNLP25, one of the largest NLP conferences, which received over 8,000 submissions and drew 6,000 participants.

Safal Shrestha @saffffal · Nov 4

We are here at EMNLP in Suzhou! Please come by and say hi if you are interested in talking about reasoning in LLMs! We have a poster session on Thursday 10:30-12:00 and a paper in the BlackboxNLP Workshop. All of us are on the lookout for PhD positions! @MinwuKim3 @NepalAadim

Safal Shrestha @saffffal · Aug 29

@NepalAadim @atulit_gaur I am not too sure either. My guess is that powers of 2 matter for GPU utilization.

atulit @atulit_gaur · Aug 28

Fun question to ask in an ML interview: “Why do embedding dimensions come in neat sizes like 768 or 1024, but never 739?” If they can't answer it, it's fine, but if they can, you've stumbled upon a real gem.
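
The guess in the reply above can be checked with a little arithmetic. The sketch below is an illustration only, not the answer the original poster had in mind. It assumes two commonly cited explanations: a hidden size has to split evenly across attention heads, and GPU kernels tend to favor dimensions with a large power-of-two factor. The helper names and the candidate head counts (8, 12, 16) are hypothetical choices made for this example.

```python
def largest_power_of_two_divisor(n: int) -> int:
    """Return the largest power of two that divides n (n must be positive)."""
    # n & -n isolates the lowest set bit, which is that power of two.
    return n & -n


def head_splits(dim: int, head_counts=(8, 12, 16)) -> dict:
    """Map each head count to the per-head size, or None if dim does not split evenly."""
    return {h: (dim // h if dim % h == 0 else None) for h in head_counts}


for dim in (768, 1024, 739):
    print(dim, largest_power_of_two_divisor(dim), head_splits(dim))
# 768 256 {8: 96, 12: 64, 16: 48}
# 1024 1024 {8: 128, 12: None, 16: 64}
# 739 1 {8: None, 12: None, 16: None}
```

Note that 768 is not itself a power of two (768 = 3 × 256). Under the assumptions above, what the "neat" sizes share is divisibility by the head count and by a large power of two, while 739 is prime and offers neither.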

Safal Shrestha @saffffal · Aug 21

🎉 Excited to share that our recent work has been accepted to the EMNLP 2025 Main Conference. See you in Suzhou in November!
