Narutatsu Ri
@narutatsuri

PhD Student @PrincetonPLI | BS @Columbia '24

20 posts
Joined March 2017
331 Following · 452 Followers
Narutatsu Ri retweeted
Yinghui He @yinghui_he_
RLVR gives sparse supervision; On-Policy Self-Distillation often requires high-quality demonstrations. Our new method, ✨SD-Zero✨, gets the best of both worlds – we use the model's self-revision to turn binary rewards into dense token-level supervision. No external teacher. No curated demonstrations.

🚨 Introducing Self-Distillation Zero (SD-Zero), which trains one model to play two roles: (1) a "Generator" that makes attempts, and (2) a "Reviser" that conditions on the generator's failed/successful attempt + binary reward to produce a better answer. ‼️Even WRONG attempts can become the training signal.‼️

🔗 Paper: arxiv.org/abs/2604.12002

🏆 SD-Zero brings 10%+ improvement over base models (Qwen3-4B, Olmo3-7B) on math & code reasoning, beating GRPO and vanilla On-Policy Self-Distillation under the same training budget. SD-Zero also enables iterative self-evolution.
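The core mechanic described in the tweet can be sketched in miniature: the reviser's output distribution, conditioned on an attempt and its binary reward, acts as a per-token teacher for the generator. The sketch below is an illustrative assumption, not the paper's implementation; `self_distill_loss` and the toy logit lists are made up for exposition.

```python
import math


def _softmax(logits):
    # Numerically stable softmax over one token's vocabulary logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def self_distill_loss(generator_logits, reviser_logits):
    """Mean per-token KL(reviser || generator).

    Each token position contributes its own gradient signal, which is
    how a single binary reward can be turned into dense token-level
    supervision: the reviser (same model, different conditioning) plays
    the teacher role.
    """
    total = 0.0
    for g, r in zip(generator_logits, reviser_logits):
        pg, pr = _softmax(g), _softmax(r)
        total += sum(q * math.log(q / p) for q, p in zip(pr, pg))
    return total / len(generator_logits)
```

When the generator already matches the reviser the loss is zero; any disagreement yields a positive per-token penalty, so even a failed attempt produces a usable training signal once the reviser improves on it.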
Jack Jingyu Zhang @jackjingyuzhang
I’m super thrilled and honored to be named an Amazon AI PhD Fellow 💫 Huge thanks to @AmazonScience for generously supporting our research at JHU! We’ll be advancing AI alignment in collaboration with folks at Amazon.
Rohit Prasad@RohitPrasadAI

Excited to announce @amazon's new AI PhD Fellowship Program supporting 100+ students across 9 universities like Carnegie Mellon, MIT & Stanford. Fellows will be paired with senior scientists working in related fields, plus receive financial support and AWS credits for research. Learn more: amazon.science/news/amazon-la…

Ekdeep Singh Lubana @EkdeepL
Super excited to be joining @GoodfireAI! I'll be scaling up the line of work our group started at Harvard: making predictive accounts of model representations by assuming a model behaves optimally (i.e., good old rational analysis from cogsci!)
Goodfire@GoodfireAI

Thrilled to welcome @EkdeepL to the team! Ekdeep is working on a new research agenda on “cognitive interpretability”, aimed at adapting and improving theories of human cognition to design tools for explaining model cognition.

Narutatsu Ri retweeted
ACL 2026 @aclmeeting
🕊️ Lifetime Achievement Award at #ACL2025NLP
A standing ovation for Prof. Kathy McKeown, recipient of the ACL 2025 Lifetime Achievement Award! 🌟
Narutatsu Ri retweeted
Rohan Paul @rohanpaul_ai
LLMs often generate biased or unfaithful summaries, and current evaluation metrics poorly assess their quality in perspective summarization. This paper identifies reliable evaluation metrics and proposes reranking-based generation, further enhanced by preference tuning on synthetic data, to produce unbiased, high-quality perspective summaries.

Methods 🔧:
- A human-annotated test set validated metrics; Language Model-based Coverage (LLM-Coverage) (0.707 Spearman correlation) and ALIGNSCORE (0.650 Spearman correlation) proved strong evaluators over traditional ROUGE and BERTSCORE.
- Reranking generates nine candidate summaries from an untrained backbone, then selects the best using LLM-Coverage and ALIGNSCORE.
- Direct Preference Optimization with Reranking (DPO+RR) iteratively trains the backbone by preferring higher-scoring, reranked summaries, boosting coverage by 0.590 and faithfulness by 0.081.

📌 LLM-based metrics like ALIGNSCORE offer superior, quantifiable summary evaluation.
📌 Reranking with preference tuning using synthetic data significantly boosts summarization quality.
📌 Complex prompting alone often underperforms diverse generation combined with preference optimization.

Paper: arxiv.org/abs/2506.15925
Paper Title: "Reranking-based Generation for Unbiased Perspective Summarization"
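The reranking step described above reduces to: generate several candidates, score each with the chosen metrics, keep the best, and pair it with a low scorer for preference tuning. A minimal sketch, assuming `coverage_fn` and `faithfulness_fn` stand in for LLM-Coverage and ALIGNSCORE (the real metrics are model-based; these callables are illustrative placeholders, and the paper uses nine candidates):

```python
def rerank(candidates, coverage_fn, faithfulness_fn):
    """Score each candidate summary and pick the best.

    Returns the chosen summary plus a (chosen, rejected) pair of the
    kind a DPO-style preference update could train on, mirroring the
    DPO+RR loop: higher-scoring reranked summaries are preferred.
    """
    scored = sorted(
        candidates,
        key=lambda c: coverage_fn(c) + faithfulness_fn(c),
        reverse=True,
    )
    chosen, rejected = scored[0], scored[-1]
    return chosen, (chosen, rejected)
```

The design choice here is that the metrics double as both the selection criterion at inference time and the preference labeler for synthetic training data, so no human-written reference summaries are needed.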
Narutatsu Ri @narutatsuri
[7/7] We demonstrate the often-overlooked gap between current jailbreak research and real-world use. This underscores the potential need for future work to strengthen safety for more realistic user settings.
Narutatsu Ri @narutatsuri
[6/7] Analysis/ablation studies: We find that increasing the number of decomposition steps most effectively boosts harmfulness, while multilinguality also contributes a smaller but significant increase in ASR and HarmScore.
Narutatsu Ri @narutatsuri
【Life Update】 I’m happy to share that I will be starting a CS PhD at @PrincetonPLI under Prof. Sanjeev Arora and supported by a Gordon Wu Fellowship. I'm forever indebted to my advisors (Prof. Kathy McKeown, Daniel Hsu, Nakul Verma) and collaborators. Excited for the fall!
Narutatsu Ri @narutatsuri
@yibophd @PrincetonPLI Thanks so much Yibo, especially for taking the time to chat about research early on! Looking forward to meeting in person at conferences!
Narutatsu Ri retweeted
Yanda Chen @yanda_chen_
[1/9] Large Language Models (LLMs) can mimic humans to explain human decisions. But can they explain THEMSELVES? How do we evaluate explanations along this axis? Check out our work "Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations"!