Simran Kaur

23 posts

@kaur_simran25

PhD Student @PrincetonCS @PrincetonPLI. Previously @acmi_lab and undergrad @SCSatCMU.

Joined May 2022
397 Following · 280 Followers
Simran Kaur @kaur_simran25
If you’re at NeurIPS, come check out our poster at the Efficient Reasoning (Spotlight) and MATH-AI workshops! 👇
Simon Park @parksimon0808:
How does RL improve OOD reasoning? How can we distinguish compositional generalization from length generalization? What makes a composition more learnable? Check out our #neurips2025 workshop poster tomorrow!
🗓️ Sat, 12/6, 8am-5pm
Efficient Reasoning 📍 Exhibit Hall F (Spotlight)
MATH-AI 📍 Upper Level Ballroom 6A
🔗 arxiv.org/abs/2512.01775
Joint work with @kaur_simran25 @prfsanjeevarora

Simran Kaur @kaur_simran25
I’m at NeurIPS 12/4-7! Excited to see old friends + meet new ones — DM if you’d like to grab coffee☕️ These days, I'm excited about synthetic data, distillation, and anything post-training! I’m also looking for a Summer 2026 internship, so reach out if you think I’d be a good fit
Simran Kaur retweeted
Abhishek Panigrahi @Abhishek_034
🎉 Excited to present 2 papers at #ICLR2025 in Singapore!
🧠 Progressive distillation induces an implicit curriculum
📢 Oral: Sat, 4:30–4:42pm @ Garnet 216–218
🖼️ Poster: Sat, 10:00am–12:30pm (#632)
⚙️ Efficient stagewise pretraining via progressive subnetworks
🖼️ Poster: Thurs, 3:00–5:30pm (#584)
Happy to chat about distillation, curricula, and efficient pretraining!
Simran Kaur retweeted
Xingyu Zhu @XingyuZhu_
Kids use open textbooks for homework. Can LLM training benefit from "helpful textbooks" in context with no gradients computed on these tokens? We call this Context-Enhanced Learning – it can exponentially accelerate training while avoiding verbatim memorization of “textbooks”! A thread 🧵1/N
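The "no gradients computed on these tokens" part is the mechanism worth unpacking. A minimal sketch of one way that idea could be implemented in a standard next-token training loop (my illustration, not the paper's code; the tensors and shapes are stand-ins):

```python
# Illustrative sketch: "textbook" context tokens condition the model but are
# masked out of the next-token loss, so no loss is computed on those positions.
# Shapes assume an unbatched (seq_len, vocab) logits tensor for simplicity.
import torch
import torch.nn.functional as F

IGNORE = -100  # label value excluded from cross-entropy

def build_inputs(context_ids: torch.Tensor, target_ids: torch.Tensor):
    """Concatenate context + target; supervise only the target positions."""
    input_ids = torch.cat([context_ids, target_ids])
    labels = torch.cat([torch.full_like(context_ids, IGNORE), target_ids])
    return input_ids, labels

def next_token_loss(logits: torch.Tensor, labels: torch.Tensor):
    # Shift so position t predicts token t+1; masked positions contribute nothing.
    return F.cross_entropy(
        logits[:-1].reshape(-1, logits.size(-1)),
        labels[1:].reshape(-1),
        ignore_index=IGNORE,
    )
```

Under this reading, the reference text still shapes every prediction through attention, but the supervised signal comes only from the target tokens.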
Simran Kaur retweeted
Sanjeev Arora @prfsanjeevarora
1/ New instruction-following dataset INSTRUCT-SKILLMIX! Supervised fine-tuning (SFT) with just 2K-4K (query, answer) pairs gives small “base LLMs” Mistral v0.2 7B and LLaMA3 8B performance rivalling some frontier models (AlpacaEval 2.0 score). No RL, no expensive human data. “Secret sauce”? Leveraging LLM metacognition!
Simran Kaur @kaur_simran25
Additionally, we perform a preliminary exploration of difficulties in naive instruction-tuning. Replacing 20% of SFT data with “poor quality” data (i.e., deliberately sloppy and unhelpful) leads to super-proportional harm to the models. [7/n]
Simran Kaur @kaur_simran25
Excited to share Instruct-SkillMix, a pipeline for generating high quality, diverse synthetic SFT data. SFT on just 4K examples can boost LLaMA-3-8B-Base over LLaMA-3-8B-Instruct, yielding 42.76% LC win rate on AlpacaEval. Paper: arxiv.org/abs/2408.14774
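As a rough illustration of the kind of pipeline described here (the prompts, helper names, and `chat()` client below are stand-ins, not the authors' actual implementation): first ask a strong model to enumerate instruction-following skills, then sample small random skill combinations and have it write a (query, answer) pair exercising each combination.

```python
# Hypothetical sketch of a SkillMix-style SFT data generator. `chat()` is a
# placeholder for whatever LLM client you use; prompts are illustrative only.
import json
import random

def chat(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

def extract_skills(n: int = 100) -> list[str]:
    # "Metacognition" step: ask the model to name relevant skills.
    out = chat(f"List {n} distinct skills involved in good instruction-following, "
               "as a JSON array of short skill names.")
    return json.loads(out)

def make_example(skills: list[str], k: int = 2) -> dict:
    # Sample a random k-subset of skills, then generate a (query, answer) pair.
    combo = random.sample(skills, k)
    query = chat("Write one challenging user query whose ideal answer must "
                 f"exercise these skills: {', '.join(combo)}. Return only the query.")
    answer = chat(f"Answer the following query as helpfully as possible:\n{query}")
    return {"skills": combo, "query": query, "answer": answer}

# skills = extract_skills()
# dataset = [make_example(skills, k=2) for _ in range(4000)]  # ~4K SFT examples
```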
Simran Kaur retweeted
Sadhika Malladi @SadhikaMalladi
Blog post about how to scale training runs to highly distributed settings (i.e., large batch sizes)! Empirical insights from my long-ago work on stochastic differential equations (SDEs). Written to be accessible - give it a shot! cs.princeton.edu/~smalladi/blog…
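The headline rules of thumb from that SDE line of work, as I recall them (my paraphrase; verify against the blog post itself): when the batch size grows by a factor κ, scale the SGD learning rate by κ and the Adam learning rate by roughly √κ.

```python
# SDE-motivated hyperparameter rescaling when the batch size grows by `kappa`.
# These rules are a paraphrase of that line of work -- check the blog post.
import math

def scale_lr(lr: float, kappa: float, optimizer: str) -> float:
    if optimizer == "sgd":
        return lr * kappa             # linear scaling rule
    if optimizer == "adam":
        return lr * math.sqrt(kappa)  # square-root scaling rule
    raise ValueError(f"no rule of thumb for {optimizer!r}")

# Example: batch size 256 -> 4096 (kappa = 16):
#   SGD  lr 0.1  -> 1.6
#   Adam lr 3e-4 -> 1.2e-3
```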
Simran Kaur @kaur_simran25
Excited to share our latest work: Skill-Mix, a new take on LLM evaluation that tests a model's ability to combine basic language skills! Check out the Skill-Mix demo here: huggingface.co/spaces/dingliy…
Dingli Yu @dingli_yu:
Does high rank on LLM leaderboards mean anything? Or is it just a game of "dataset contamination" and "Stochastic Parrots?" Find answers via Skill-Mix, our evaluation of LLMs’ capacity to combine skills! Paper: arxiv.org/abs/2310.17567

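One reason a skill-combination evaluation is hard to game through memorization is combinatorial: with N named skills and prompts requiring k of them at once, there are C(N, k) possible combinations, far more than could plausibly all appear in training data. A quick back-of-the-envelope check (the numbers are illustrative; see the paper for the actual N and k):

```python
# Back-of-the-envelope count of k-skill combinations drawn from N skills
# (illustrative numbers; the paper's actual skill list may differ).
from math import comb

N = 100  # named language skills
for k in (2, 3, 4, 5):
    print(f"k={k}: {comb(N, k):,} combinations")
# k=2: 4,950    k=3: 161,700    k=4: 3,921,225    k=5: 75,287,520
```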
Simran Kaur retweeted
Zachary Novack @zacknovack
Our work on understanding the mechanisms behind implicit regularization in SGD was just accepted to #ICLR2023 ‼️ Huge thanks to my collaborators @kaur_simran25 @__tm__157 @saurabh_garg67 @zacharylipton 🙂 Check out the thread below for more info:
Zachary Novack @zacknovack:
1/n ‼️ Our spotlight (and now BEST POSTER!) work from the Higher Order Optimization workshop at #NeurIPS2022 is now on arxiv! Paper 📖: arxiv.org/abs/2211.15853 w/ @kaur_simran25 @__tm__157 @saurabh_garg67 @zacharylipton

Simran Kaur @kaur_simran25
5/ We hope to inspire future efforts aimed at understanding the relationship between the max Hessian eigenvalue and generalization, and to spark conversation regarding whether this quantity should be treated as a generalization metric at all.
Simran Kaur @kaur_simran25
4/ While methods motivated by flatness produce useful tools, the max Hessian eigenvalue does not provide a scientific explanation for improvements in generalization. There is evidently a deeper story behind why flatness seems to be a fruitful intuition.
Simran Kaur @kaur_simran25
Is flatness indicative of generalization? Not necessarily. Our experimental study calls the relationship between flatness (as measured by the max Hessian eigenvalue) and generalization into question. arxiv.org/abs/2206.10654
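For readers unfamiliar with the metric: the flatness measure in question is the largest eigenvalue of the loss Hessian, typically estimated without materializing the Hessian by running power iteration on Hessian-vector products. A minimal PyTorch sketch of that estimator (illustrative, not the paper's code):

```python
# Estimate the top Hessian eigenvalue of a scalar loss w.r.t. model parameters
# via power iteration on Hessian-vector products (illustrative sketch).
import torch

def max_hessian_eigenvalue(loss, parameters, iters: int = 50) -> float:
    params = [p for p in parameters if p.requires_grad]
    grads = torch.autograd.grad(loss, params, create_graph=True)
    v = [torch.randn_like(p) for p in params]
    eig = 0.0
    for _ in range(iters):
        norm = torch.sqrt(sum((x * x).sum() for x in v))
        v = [x / norm for x in v]                                 # normalize iterate
        gv = sum((g * x).sum() for g, x in zip(grads, v))
        hv = torch.autograd.grad(gv, params, retain_graph=True)   # Hessian-vector product
        eig = sum((h * x).sum() for h, x in zip(hv, v)).item()    # Rayleigh quotient
        v = [h.detach() for h in hv]
    return eig

# Usage sketch:
#   loss = criterion(model(x), y)
#   sharpness = max_hessian_eigenvalue(loss, model.parameters())
```

Power iteration converges to the eigenvalue of largest magnitude, which near a minimum is usually the quantity people mean by "sharpness."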