Simran Kaur

23 posts

@kaur_simran25

PhD Student @PrincetonCS @PrincetonPLI. Previously @acmi_lab and undergrad @SCSatCMU.

Joined May 2022
397 Following · 280 Followers
Simran Kaur @kaur_simran25
If you’re at NeurIPS, come check out our poster at the Efficient Reasoning (Spotlight) and MATH-AI workshops! 👇
Simon Park @parksimon0808:
How does RL improve OOD reasoning? How can we distinguish compositional generalization from length generalization? What makes a composition more learnable? Check out our #neurips2025 workshop poster tomorrow!
🗓️ Sat, 12/6, 8am-5pm
Efficient Reasoning 📍 Exhibit Hall F (Spotlight)
MATH-AI 📍 Upper Level Ballroom 6A
🔗 arxiv.org/abs/2512.01775
Joint work with @kaur_simran25 @prfsanjeevarora

Simran Kaur @kaur_simran25
I’m at NeurIPS 12/4-7! Excited to see old friends + meet new ones — DM if you’d like to grab coffee☕️ These days, I'm excited about synthetic data, distillation, and anything post-training! I’m also looking for a Summer 2026 internship, so reach out if you think I’d be a good fit
Simran Kaur retweeted
Abhishek Panigrahi @Abhishek_034
🎉 Excited to present 2 papers at #ICLR2025 in Singapore!
🧠 Progressive distillation induces an implicit curriculum
📢 Oral: Sat, 4:30–4:42pm @ Garnet 216–218
🖼️ Poster: Sat, 10:00am–12:30pm (#632)
⚙️ Efficient stagewise pretraining via progressive subnetworks
🖼️ Poster: Thurs, 3:00–5:30pm (#584)
Happy to chat about distillation, curricula, and efficient pretraining!
Simran Kaur retweeted
Xingyu Zhu @XingyuZhu_
Kids use open textbooks for homework. Can LLM training benefit from "helpful textbooks" in context with no gradients computed on these tokens? We call this Context-Enhanced Learning – it can exponentially accelerate training while avoiding verbatim memorization of “textbooks”! A thread 🧵1/N
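The "no gradients computed on these tokens" part is the mechanism worth unpacking. A minimal sketch of one way that idea could be implemented in a standard next-token training loop (my illustration, not the paper's code; the tensors and shapes are stand-ins):

```python
# Illustrative sketch: "textbook" context tokens condition the model but are
# masked out of the next-token loss, so no loss is computed on those positions.
# Shapes assume an unbatched (seq_len, vocab) logits tensor for simplicity.
import torch
import torch.nn.functional as F

IGNORE = -100  # label value excluded from cross-entropy

def build_inputs(context_ids: torch.Tensor, target_ids: torch.Tensor):
    """Concatenate context + target; supervise only the target positions."""
    input_ids = torch.cat([context_ids, target_ids])
    labels = torch.cat([torch.full_like(context_ids, IGNORE), target_ids])
    return input_ids, labels

def next_token_loss(logits: torch.Tensor, labels: torch.Tensor):
    # Shift so position t predicts token t+1; masked positions contribute nothing.
    return F.cross_entropy(
        logits[:-1].reshape(-1, logits.size(-1)),
        labels[1:].reshape(-1),
        ignore_index=IGNORE,
    )
```

Under this reading, the reference text still shapes every prediction through attention, but the supervised signal comes only from the target tokens.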
Simran Kaur retweeted
Sanjeev Arora @prfsanjeevarora
1/ New instruction-following dataset INSTRUCT-SKILLMIX! Supervised fine-tuning (SFT) with just 2K-4K (query, answer) pairs gives small “base LLMs” Mistral v0.2 7B and LLaMA3 8B performance rivalling some frontier models (AlpacaEval 2.0 score). No RL, no expensive human data. “Secret sauce”? Leveraging LLM metacognition!
Simran Kaur @kaur_simran25
Additionally, we perform a preliminary exploration of difficulties in naive instruction-tuning. Replacing 20% of SFT data with “poor quality” data (i.e., deliberately sloppy and unhelpful) leads to super-proportional harm to the models. [7/n]
Simran Kaur @kaur_simran25
Excited to share Instruct-SkillMix, a pipeline for generating high quality, diverse synthetic SFT data. SFT on just 4K examples can boost LLaMA-3-8B-Base over LLaMA-3-8B-Instruct, yielding 42.76% LC win rate on AlpacaEval. Paper: arxiv.org/abs/2408.14774
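As a rough illustration of the kind of pipeline described here (the prompts, helper names, and `chat()` client below are stand-ins, not the authors' actual implementation): first ask a strong model to enumerate instruction-following skills, then sample small random skill combinations and have it write a (query, answer) pair exercising each combination.

```python
# Hypothetical sketch of a SkillMix-style SFT data generator. `chat()` is a
# placeholder for whatever LLM client you use; prompts are illustrative only.
import json
import random

def chat(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

def extract_skills(n: int = 100) -> list[str]:
    # "Metacognition" step: ask the model to name relevant skills.
    out = chat(f"List {n} distinct skills involved in good instruction-following, "
               "as a JSON array of short skill names.")
    return json.loads(out)

def make_example(skills: list[str], k: int = 2) -> dict:
    # Sample a random k-subset of skills, then generate a (query, answer) pair.
    combo = random.sample(skills, k)
    query = chat("Write one challenging user query whose ideal answer must "
                 f"exercise these skills: {', '.join(combo)}. Return only the query.")
    answer = chat(f"Answer the following query as helpfully as possible:\n{query}")
    return {"skills": combo, "query": query, "answer": answer}

# skills = extract_skills()
# dataset = [make_example(skills, k=2) for _ in range(4000)]  # ~4K SFT examples
```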
Simran Kaur retweeted
Sadhika Malladi @SadhikaMalladi
Blog post about how to scale training runs to highly distributed settings (i.e., large batch sizes)! Empirical insights from my long-ago work on stochastic differential equations (SDEs). Written to be accessible - give it a shot! cs.princeton.edu/~smalladi/blog…
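The headline rules of thumb from that SDE line of work, as I recall them (my paraphrase; verify against the blog post itself): when the batch size grows by a factor κ, scale the SGD learning rate by κ and the Adam learning rate by roughly √κ.

```python
# SDE-motivated hyperparameter rescaling when the batch size grows by `kappa`.
# These rules are a paraphrase of that line of work -- check the blog post.
import math

def scale_lr(lr: float, kappa: float, optimizer: str) -> float:
    if optimizer == "sgd":
        return lr * kappa             # linear scaling rule
    if optimizer == "adam":
        return lr * math.sqrt(kappa)  # square-root scaling rule
    raise ValueError(f"no rule of thumb for {optimizer!r}")

# Example: batch size 256 -> 4096 (kappa = 16):
#   SGD  lr 0.1  -> 1.6
#   Adam lr 3e-4 -> 1.2e-3
```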
Simran Kaur @kaur_simran25
Excited to share our latest work: Skill-Mix, a new take on LLM evaluation that tests a model's ability to combine basic language skills! Check out the Skill-Mix demo here: huggingface.co/spaces/dingliy…
Dingli Yu @dingli_yu:
Does high rank on LLM leaderboards mean anything? Or is it just a game of "dataset contamination" and "Stochastic Parrots?" Find answers via Skill-Mix, our evaluation of LLMs’ capacity to combine skills! Paper: arxiv.org/abs/2310.17567

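One reason a skill-combination evaluation is hard to game through memorization is combinatorial: with N named skills and prompts requiring k of them at once, there are C(N, k) possible combinations, far more than could plausibly all appear in training data. A quick back-of-the-envelope check (the numbers are illustrative; see the paper for the actual N and k):

```python
# Back-of-the-envelope count of k-skill combinations drawn from N skills
# (illustrative numbers; the paper's actual skill list may differ).
from math import comb

N = 100  # named language skills
for k in (2, 3, 4, 5):
    print(f"k={k}: {comb(N, k):,} combinations")
# k=2: 4,950    k=3: 161,700    k=4: 3,921,225    k=5: 75,287,520
```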
Simran Kaur retweeted
Zachary Novack @zacknovack
Our work on understanding the mechanisms behind implicit regularization in SGD was just accepted to #ICLR2023 ‼️ Huge thanks to my collaborators @kaur_simran25 @__tm__157 @saurabh_garg67 @zacharylipton 🙂 Check out the thread below for more info:
Zachary Novack @zacknovack:
1/n ‼️ Our spotlight (and now BEST POSTER!) work from the Higher Order Optimization workshop at #NeurIPS2022 is now on arxiv! Paper 📖: arxiv.org/abs/2211.15853 w/ @kaur_simran25 @__tm__157 @saurabh_garg67 @zacharylipton

Simran Kaur @kaur_simran25
5/ We hope to inspire future efforts aimed at understanding the relationship between the max Hessian eigenvalue and generalization, and to spark conversation regarding whether this quantity should be treated as a generalization metric at all.
Simran Kaur @kaur_simran25
4/ While methods motivated by flatness produce useful tools, the max Hessian eigenvalue does not provide a scientific explanation for improvements in generalization. There is evidently a deeper story behind why flatness seems to be a fruitful intuition.
Simran Kaur @kaur_simran25
Is flatness indicative of generalization? Not necessarily. Our experimental study calls the relationship between flatness (as measured by the max Hessian eigenvalue) and generalization into question. arxiv.org/abs/2206.10654
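For readers unfamiliar with the metric: the flatness measure in question is the largest eigenvalue of the loss Hessian, typically estimated without materializing the Hessian by running power iteration on Hessian-vector products. A minimal PyTorch sketch of that estimator (illustrative, not the paper's code):

```python
# Estimate the top Hessian eigenvalue of a scalar loss w.r.t. model parameters
# via power iteration on Hessian-vector products (illustrative sketch).
import torch

def max_hessian_eigenvalue(loss, parameters, iters: int = 50) -> float:
    params = [p for p in parameters if p.requires_grad]
    grads = torch.autograd.grad(loss, params, create_graph=True)
    v = [torch.randn_like(p) for p in params]
    eig = 0.0
    for _ in range(iters):
        norm = torch.sqrt(sum((x * x).sum() for x in v))
        v = [x / norm for x in v]                                 # normalize iterate
        gv = sum((g * x).sum() for g, x in zip(grads, v))
        hv = torch.autograd.grad(gv, params, retain_graph=True)   # Hessian-vector product
        eig = sum((h * x).sum() for h, x in zip(hv, v)).item()    # Rayleigh quotient
        v = [h.detach() for h in hv]
    return eig

# Usage sketch:
#   loss = criterion(model(x), y)
#   sharpness = max_hessian_eigenvalue(loss, model.parameters())
```

Power iteration converges to the eigenvalue of largest magnitude, which near a minimum is usually the quantity people mean by "sharpness."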