Narutatsu Ri
@narutatsuri

PhD Student @PrincetonPLI | BS @Columbia '24

20 posts
Joined March 2017
331 Following · 452 Followers
Narutatsu Ri retweeted
Yinghui He @yinghui_he_
RLVR gives sparse supervision; On-Policy Self-Distillation often requires high-quality demonstrations. Our new method, ✨SD-Zero✨, gets the best of both worlds – we use the model's self-revision to turn binary rewards into dense token-level supervision. No external teacher. No curated demonstrations.

🚨 Introducing Self-Distillation Zero (SD-Zero), which trains one model to play two roles: (1) a "Generator" that makes attempts, and (2) a "Reviser" that conditions on the generator's failed/successful attempt + binary reward to produce a better answer. ‼️Even WRONG attempts can become the training signal.‼️

🔗 Paper: arxiv.org/abs/2604.12002

🏆 SD-Zero brings 10%+ improvement over base models (Qwen3-4B, Olmo3-7B) on math & code reasoning, beating GRPO and vanilla On-Policy Self-Distillation under the same training budget. SD-Zero also enables iterative self-evolution.
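The core mechanic described in the tweet can be sketched in miniature: the reviser's output distribution, conditioned on an attempt and its binary reward, acts as a per-token teacher for the generator. The sketch below is an illustrative assumption, not the paper's implementation; `self_distill_loss` and the toy logit lists are made up for exposition.

```python
import math


def _softmax(logits):
    # Numerically stable softmax over one token's vocabulary logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def self_distill_loss(generator_logits, reviser_logits):
    """Mean per-token KL(reviser || generator).

    Each token position contributes its own gradient signal, which is
    how a single binary reward can be turned into dense token-level
    supervision: the reviser (same model, different conditioning) plays
    the teacher role.
    """
    total = 0.0
    for g, r in zip(generator_logits, reviser_logits):
        pg, pr = _softmax(g), _softmax(r)
        total += sum(q * math.log(q / p) for q, p in zip(pr, pg))
    return total / len(generator_logits)
```

When the generator already matches the reviser the loss is zero; any disagreement yields a positive per-token penalty, so even a failed attempt produces a usable training signal once the reviser improves on it.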
Jack Jingyu Zhang @jackjingyuzhang
I’m super thrilled and honored to be named an Amazon AI PhD Fellow 💫 Huge thanks to @AmazonScience for generously supporting our research at JHU! We’ll be advancing AI alignment in collaboration with folks at Amazon.
Rohit Prasad@RohitPrasadAI

Excited to announce @amazon's new AI PhD Fellowship Program supporting 100+ students across 9 universities like Carnegie Mellon, MIT & Stanford. Fellows will be paired with senior scientists working in related fields, plus receive financial support and AWS credits for research. Learn more: amazon.science/news/amazon-la…

Ekdeep Singh Lubana @EkdeepL
Super excited to be joining @GoodfireAI! I'll be scaling up the line of work our group started at Harvard: making predictive accounts of model representations by assuming a model behaves optimally (i.e., good old rational analysis from cogsci!)
Goodfire@GoodfireAI

Thrilled to welcome @EkdeepL to the team! Ekdeep is working on a new research agenda on “cognitive interpretability”, aimed at adapting and improving theories of human cognition to design tools for explaining model cognition.

Narutatsu Ri retweeted
ACL 2026 @aclmeeting
🕊️ Lifetime Achievement Award at #ACL2025NLP
A standing ovation for Prof. Kathy McKeown, recipient of the ACL 2025 Lifetime Achievement Award! 🌟
Narutatsu Ri retweeted
Rohan Paul @rohanpaul_ai
LLMs often generate biased or unfaithful summaries, and current evaluation metrics poorly assess their quality in perspective summarization. This paper identifies reliable evaluation metrics and proposes reranking-based generation, further enhanced by preference tuning on synthetic data, to produce unbiased, high-quality perspective summaries.

Methods 🔧:
- A human-annotated test set validated metrics; Language Model-based Coverage (LLM-Coverage) (0.707 Spearman correlation) and ALIGNSCORE (0.650 Spearman correlation) proved strong evaluators over traditional ROUGE and BERTSCORE.
- Reranking generates nine candidate summaries from an untrained backbone, then selects the best using LLM-Coverage and ALIGNSCORE.
- Direct Preference Optimization with Reranking (DPO+RR) iteratively trains the backbone by preferring higher-scoring, reranked summaries, boosting coverage by 0.590 and faithfulness by 0.081.

📌 LLM-based metrics like ALIGNSCORE offer superior, quantifiable summary evaluation.
📌 Reranking with preference tuning using synthetic data significantly boosts summarization quality.
📌 Complex prompting alone often underperforms diverse generation combined with preference optimization.

Paper: arxiv.org/abs/2506.15925
Paper Title: "Reranking-based Generation for Unbiased Perspective Summarization"
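The reranking step described above reduces to: generate several candidates, score each with the chosen metrics, keep the best, and pair it with a low scorer for preference tuning. A minimal sketch, assuming `coverage_fn` and `faithfulness_fn` stand in for LLM-Coverage and ALIGNSCORE (the real metrics are model-based; these callables are illustrative placeholders, and the paper uses nine candidates):

```python
def rerank(candidates, coverage_fn, faithfulness_fn):
    """Score each candidate summary and pick the best.

    Returns the chosen summary plus a (chosen, rejected) pair of the
    kind a DPO-style preference update could train on, mirroring the
    DPO+RR loop: higher-scoring reranked summaries are preferred.
    """
    scored = sorted(
        candidates,
        key=lambda c: coverage_fn(c) + faithfulness_fn(c),
        reverse=True,
    )
    chosen, rejected = scored[0], scored[-1]
    return chosen, (chosen, rejected)
```

The design choice here is that the metrics double as both the selection criterion at inference time and the preference labeler for synthetic training data, so no human-written reference summaries are needed.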
Narutatsu Ri @narutatsuri
[7/7] We demonstrate the often-overlooked gap between current jailbreak research and real-world use. This underscores the potential need for future work to strengthen safety for more realistic user settings.
Narutatsu Ri @narutatsuri
[6/7] Analysis/ablation studies: We find that increasing the number of decomposition steps most effectively boosts harmfulness, while multilinguality also contributes a smaller but significant increase in ASR and HarmScore.
Narutatsu Ri @narutatsuri
【Life Update】 I’m happy to share that I will be starting a CS PhD at @PrincetonPLI under Prof. Sanjeev Arora and supported by a Gordon Wu Fellowship. I'm forever indebted to my advisors (Prof. Kathy McKeown, Daniel Hsu, Nakul Verma) and collaborators. Excited for the fall!
Narutatsu Ri @narutatsuri
@yibophd @PrincetonPLI Thanks so much Yibo, especially for taking the time to chat about research early on! Looking forward to meeting in person at conferences!
Narutatsu Ri retweeted
Yanda Chen @yanda_chen_
[1/9] Large Language Models (LLMs) can mimic humans to explain human decisions. But can they explain THEMSELVES? How do we evaluate explanations along this axis? Check out our work "Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations"!