Yi (Joshua) Ren

68 posts

@JoshuaRenyi

Postdoc @OATML_Oxford, Prev. Ph.D @UBC_CS, MSc @EdinburghNLP. Working on ML (learning dynamics, continual learning, iterated learning, LLM)

Oxford, UK · Joined October 2021
196 Following · 361 Followers
Pinned Tweet
Yi (Joshua) Ren @JoshuaRenyi
📢Curious why your LLM behaves strangely after long SFT or DPO? We offer a fresh perspective—consider doing a "force analysis" on your model’s behavior. Check out our #ICLR2025 Oral paper: Learning Dynamics of LLM Finetuning! (0/12)
[image]
6 replies · 116 reposts · 798 likes · 87.6K views
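For context on the pinned thread's key idea: the paper decomposes each finetuning step into how an update on one example (x_u, y_u) moves the model's prediction on another prompt x_o, and reads the factors as "forces". A minimal sketch of that decomposition, with notation paraphrased from the paper (treat the exact conventions as assumptions):

```latex
% One-step learning dynamics (sketch). After a gradient step of size \eta on the
% update example (x_u, y_u), the change in the log-prediction at a probe prompt
% x_o factors into three terms:
\Delta \log \pi_t(y \mid x_o) \;\approx\;
    -\eta \, \mathcal{A}_t(x_o)\, \mathcal{K}_t(x_o, x_u)\, \mathcal{G}_t(x_u, y_u)
    \;+\; O(\eta^2)
% \mathcal{A}: softmax geometry at x_o (how logit changes become log-prob changes)
% \mathcal{K}: empirical-NTK-style similarity between x_o and x_u (how far the push travels)
% \mathcal{G}: residual/gradient of the loss at the update example (direction and size of the push)
```

Reading each factor as a "force" on π(y | x_o) is what makes effects like off-target probability drops during DPO easier to reason about.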
DailyPapers @HuggingPapers
Fine-tuning increases hallucinations.
New research shows SFT causes factual errors by interfering with pre-trained knowledge. The authors propose self-distillation to learn new facts without forgetting, plus selective parameter freezing to reduce hallucinations while preserving performance.
[image]
4 replies · 35 reposts · 165 likes · 9.3K views
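Of the two mitigations the tweet names, selective parameter freezing is the simpler one to picture in code. A minimal PyTorch sketch of freezing by parameter name; the model and the choice of which parameters stay trainable are placeholders, since the tweet does not state the paper's actual selection criterion:

```python
# Minimal sketch of selective parameter freezing (illustrative only; the
# selection rule below is a placeholder, not the paper's criterion).
import torch.nn as nn
from transformers import AutoModelForCausalLM

def freeze_except(model: nn.Module, trainable_substrings: list[str]) -> None:
    """Freeze every parameter whose name matches none of the substrings."""
    for name, param in model.named_parameters():
        param.requires_grad = any(s in name for s in trainable_substrings)

model = AutoModelForCausalLM.from_pretrained("gpt2")  # stand-in for any causal LM
# Hypothetical choice: let only the last two blocks and the LM head adapt,
# leaving earlier layers (and the knowledge stored there) untouched.
freeze_except(model, ["h.10.", "h.11.", "lm_head"])

n_trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable parameters: {n_trainable:,}")
```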
Yi (Joshua) Ren @JoshuaRenyi
@learning_mech Hey Jamie, thanks for the great work! These physics-style theoretical directions for machine learning are really cool!
0 replies · 0 reposts · 1 like · 43 views
Jamie Simon @learning_mech
1/ Deep learning is going to have a scientific theory. We can see the pieces starting to come together, and it's looking a lot like physics! We're releasing a paper pulling together these emerging threads and giving them a name: learning mechanics. 🔨 arxiv.org/pdf/2604.21691 🔧
[image]
52 replies · 290 reposts · 1.5K likes · 291.4K views
Yi (Joshua) Ren reposted
Feng Liu @AlexFengLiu1
Excited to share our ICML 2026 Hypothesis Testing Workshop in Seoul, this July! @icmlconf 🎉 This workshop aims to bring together researchers developing modern hypothesis testing methodology and applying it to machine learning problems such as robustness, distribution shift, security, medicine, and LLM evaluation. In other words, if you care about how we make ML claims rigorous, this workshop is for you.
We now have four confirmed speakers: Arthur Gretton @ArthurGretton, Yao Xie @yaoxie21851119, Bo Li @uiuc_aisecure, and Yisong Yue @yisongyue. The organizing team includes Xiuyuan Cheng (Duke), Feng Liu @AlexFengLiu1, Lester Mackey @LesterMackey, Shayak Sen @shayaksen, Danica J. Sutherland @d_j_sutherland, and Nathaniel Xu (UBC).
📌 Submission deadline: 10 May 2026
📌 Notification: 26 May 2026
📌 Camera-ready: 17 June 2026
📌 Workshop date: July 10 or 11, 2026 (TBA)
🚩 Check more information below!
🔗 Website: testing.ml
🔗 Submission Portal: openreview.net/group?id=ICML.…
We’re also recruiting PC members/reviewers.
🔗 Reviewer interest form: docs.google.com/forms/d/e/1FAI…
🏁 Please feel free to share this with colleagues, collaborators, and students who may be interested. #ICML #ICML26
[image]
1 reply · 10 reposts · 55 likes · 15.2K views
Yi (Joshua) Ren reposted
Danica Sutherland @d_j_sutherland
coming back to x, the everything app, to say: Submit work, sign up to review, come to the workshop! This is a great chance to bring together a lot of really cool work that's been happening, but not all as connected (nor as easy to publish) as it should be! testing.ml
[Quoting Feng Liu @AlexFengLiu1's workshop announcement above.]
0 replies · 2 reposts · 3 likes · 372 views
Omar Rivasplata @OmarRivasplata
An ICML submission under review received this rebuttal acknowledgement saying: "Thanks for your response, I maintain my positive score." The positive score maintained is 4: weak accept 🤷‍♂️
[image]
11 replies · 5 reposts · 96 likes · 38K views
Yongyuan Liang @cheryyun_l
Seeing everyone on my timeline declining their NeurIPS AC/Reviewer invites... meanwhile I just checked my inbox and realized they didn't even invite me this year lmao. 4 years of NeurIPS reviewer & author experience for what😆 @NeurIPSConf
7 replies · 1 repost · 131 likes · 22.4K views
Yi (Joshua) Ren @JoshuaRenyi
@sreejan_kumar Hi Sreejan, very cool work. This reminds me of iterated learning in cognitive science, which applies Bayesian updates to predict how agents’ beliefs evolve across generations of interaction. Here is a quite related one: arxiv.org/pdf/2404.04286
0 replies · 0 reposts · 4 likes · 638 views
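The iterated-learning setup mentioned in the reply treats each generation as a Bayesian learner: infer a hypothesis from the previous agent's output, then produce data for the next agent. A minimal sketch of the classic chain (an illustration of the general cog-sci model, not the linked paper's setup; all numbers are made up):

```python
# Minimal Bayesian iterated-learning chain: two hypotheses, binary signals.
# Illustrative sketch of the classic model, not the linked paper's setup.
import random

PRIOR = {"h0": 0.7, "h1": 0.3}        # learners' shared prior over hypotheses
LIKELIHOOD = {"h0": 0.2, "h1": 0.8}   # P(signal = 1 | hypothesis)

def posterior(signal: int, prior: dict) -> dict:
    """One Bayesian update after observing a single binary signal."""
    unnorm = {h: prior[h] * (LIKELIHOOD[h] if signal else 1 - LIKELIHOOD[h])
              for h in prior}
    z = sum(unnorm.values())
    return {h: p / z for h, p in unnorm.items()}

random.seed(0)
belief = "h1"                                           # first agent's hypothesis
for gen in range(15):
    signal = int(random.random() < LIKELIHOOD[belief])  # produce data for the next agent
    post = posterior(signal, PRIOR)                     # next agent learns from it
    belief = max(post, key=post.get)                    # MAP learner
    print(f"gen {gen:2d}: signal={signal} -> belief={belief}")
```

Run long enough, chains like this tend to drift toward the learners' shared prior, which is the kind of cross-generation belief dynamics the reply points at.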
Sreejan Kumar @sreejan_kumar
In 2022, I won the NeurIPS Outstanding Paper Award. In 2026, I've realized this paper, ahead of its time, accidentally predicted the trajectory of AI development over the past few years. A thread using this to explain how AI has developed 2018->2026:
[image]
4 replies · 25 reposts · 338 likes · 30.1K views
Yi (Joshua) Ren @JoshuaRenyi
@IdanShenfeld Very cool work! Just wondering whether SPIN (a very popular self-play method using SFT data; arxiv.org/abs/2401.01335), which is also an on-policy method, can mitigate the forgetting issue.
0 replies · 0 reposts · 0 likes · 100 views
idan shenfeld @IdanShenfeld
People keep saying 2026 will be the year of continual learning. But there are still major technical challenges to making it a reality. Today we take the next step towards that goal — a new on-policy learning algorithm, suitable for continual learning! (1/n)
[image]
49 replies · 221 reposts · 1.5K likes · 236.9K views
Yi (Joshua) Ren reposted
Yarin @yaringal
I’m excited to share that we are launching a public safeguards competition next month in partnership with @AISecurityInst, @GraySwanAI, @OATML_Oxford, Sequrity.ai, @OpenAI, @AnthropicAI, and @amazon. This is a red-versus-blue competition focused on building new agent safeguards, and breaking these safeguards. Should be a lot of fun, and there are prizes as well for open-source submissions! Please help to share this opportunity! Registration is open now: app.grayswan.ai/arena/challeng…

More details: Oxford (OATML) and UK AISI have teamed up with Gray Swan and Sequrity.ai, as well as OpenAI, Anthropic, and Amazon, to run a public competition where blue teams build defenses against real red-teaming attacks, and we'd like to invite you to participate.

What is the Safeguards Challenge?
Gray Swan runs the Arena, a platform where security researchers ("red teams") attempt to elicit harmful behaviors from AI systems. Challenges have been supported by UK AISI, US CAISI, OpenAI, Anthropic, Amazon, Google DeepMind, and Meta, and have surfaced real vulnerabilities that help developers improve their models. The Safeguards Challenge is the Arena’s first red-versus-blue competition. Instead of just measuring attacks, we're measuring defenses. Blue teams will submit safeguards (system prompts, classifiers, or containerized solutions) that attempt to block red teamers and adversarial inputs while allowing legitimate requests through. Red teams will then try to break those defenses, and the cycle repeats.

The target environment
Blue teams will defend a multi-agent customer support system with an orchestrator agent, specialized sub-agents, and integrated tools. The system handles realistic customer interactions, and red teams will attempt to trigger harmful behaviors: fraudulent transactions, data exfiltration, unauthorized tool use, and policy-violating responses. Your safeguards will be scored on how well they block attacks from red teamers versus how well they allow benign requests from a holdout test set. The leaderboard uses a combined metric based on false positive and false negative rates.

What you can submit
* System prompt configurations for monitor models
* Input/output classifiers (any framework)
* Containerized solutions with custom logic
For prize eligibility, solutions must be open source or open weights. Proprietary solutions can compete on a separate unprized leaderboard for benchmarking purposes. Solutions must be registered a week before the first or second defense phase starts and submitted a day beforehand. The submission interface will be available by early February.

Timeline for blue teams
* January 2026: Preliminary challenge details shared with registered blue teams
* February 11-25: Red teams attack baseline defenses and early defense submissions
* February 25 - March 25 (First Defense Phase): You receive the attack dataset from Waves 0-1. Build and iterate your safeguards in our test environment. Submit your defense by the end of this phase.
* Approximately March 25 - April 1 (Wave 2): Red teams attack your submitted safeguards. You see what breaks. Exact dates TBA.
* Approximately April 1 - April 29 (Second Defense Phase): Iterate based on Wave 2 results. Final submissions due before Wave 3. Exact dates TBA.
* Approximately April 29 - May 6 (Wave 3): Final attack wave. Leaderboard locks. Exact dates TBA.

Prizes
$70,000 in prizes for blue teams:
* First Defense Phase: $10,000 (top 10 teams, first place $2,000)
* Second Defense Phase: $60,000 (top 15 teams, first place $15,000)
Blue team entries are per organization. Only open-source/open-weights solutions are prize-eligible. Participants from judging organizations cannot submit; participants from sponsor organizations cannot win prizes. (Other Oxford groups unrelated to OATML are eligible.)

Co-sponsors and judges
Judging is handled by UK AISI and US CAISI.

Why participate?
* Test your defenses against real adaptive attacks from skilled red teamers
* Benchmark against other research groups and commercial solutions
* Contribute to open research on AI safeguards (prize-eligible solutions are published)
* Cash prizes for top performers
8 replies · 25 reposts · 140 likes · 13.6K views
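The post says only that the leaderboard "uses a combined metric based on false positive and false negative rates", without giving the formula. As one purely hypothetical reading, a balanced-error combination would look like the sketch below; the official scoring rule may well differ:

```python
# Hypothetical leaderboard score. The competition states only that the metric
# combines false positive and false negative rates; the balanced error rate
# below is an assumption for illustration, not the official formula.
def balanced_error_rate(blocked_attacks: int, total_attacks: int,
                        blocked_benign: int, total_benign: int) -> float:
    fnr = 1 - blocked_attacks / total_attacks  # attacks that got through
    fpr = blocked_benign / total_benign        # benign requests wrongly blocked
    return 0.5 * (fpr + fnr)                   # lower is better

# Example: a safeguard blocks 90/100 attacks but also blocks 5/200 benign requests.
print(balanced_error_rate(90, 100, 5, 200))    # 0.5 * (0.025 + 0.1) = 0.0625
```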
MKI @mki028
Best read in 2025 (well-deserved outstanding paper), with clear theoretical proofs and a principled way of analysis:
> Negative gradient + rich-get-richer promoted by softmax in DPO -> model collapse
> Hallucination in SFT is driven by spurious correlation, explained by the framework
[Quoting Yi (Joshua) Ren @JoshuaRenyi's pinned tweet above.]
1 reply · 0 reposts · 1 like · 115 views
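The "negative gradient + rich gets richer" mechanism is easy to see in a toy example: a gradient step that pushes *down* the log-probability of a rejected token raises every other logit in proportion to its current probability, so the already-dominant token absorbs most of the freed mass. A self-contained numpy sketch (toy numbers, not the paper's experiments):

```python
# Toy demo of the softmax "squeezing effect" under a negative gradient.
# Descending on L = log p(rejected) gives dL/dz_k = 1[k = rejected] - p_k,
# so the update z <- z - eta * dL/dz adds eta * p_k to every other logit:
# the most probable token gains the most ("rich gets richer").
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

z = np.array([3.0, 2.0, 1.0, 0.0])  # toy 4-token vocabulary
rejected, eta = 2, 1.0

for step in range(5):
    p = softmax(z)
    grad = -p
    grad[rejected] += 1.0            # dL/dz for L = log softmax(z)[rejected]
    z = z - eta * grad               # gradient descent pushes p(rejected) down
    print(f"step {step}: p = {np.round(softmax(z), 3)}")
# p(rejected) collapses, and nearly all of its mass flows to the argmax token
# instead of being shared across the vocabulary.
```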
Hattie Zhou @oh_that_hat
Super proud to share that after a long sabbatical, I have finally defended my PhD 🥳🥹 Growing up, I never thought I’d pursue a PhD, and I complained a lot during it, but looking back now it was a very special and precious experience. Special thanks to @hugo_larochelle @AaronCourville @HanieSedghi @bneyshabur @PreetumNakkiran @jasonyo and many MILA friends for advising me through it all 🌺
[image]
43 replies · 14 reposts · 425 likes · 49.8K views
Yi (Joshua) Ren @JoshuaRenyi
@hanqi_xiao @Besteuler Thanks, that's a good point. I guess tracking the change of the top-1/2/3 probabilities might provide some useful information on that. Maybe we should also show the training curves of some non-RL methods that are good at pass@K. Any suggestions?
0 replies · 0 reposts · 1 like · 37 views
Hanqi Xiao @hanqi_xiao
@Besteuler I wonder if this relates to (or fixes) RL failing to beat the pass@k of non-RL models at large k!
1 reply · 0 reposts · 0 likes · 74 views
Yi (Joshua) Ren reposted
Weiyang Liu @Besteuler
🚀 Glad to introduce SimKO (Simple Pass@K Optimization).
Current GRPO-based methods overfit to safe responses -- great Pass@1, poor Pass@K.
🔍 We find this stems from probability over-concentration: the model collapses onto its top-1 token, losing exploration. This appears to be a more accurate observation metric than the commonly used entropy.
✨ SimKO fixes this with probability redistribution:
✅ Encourage top-K candidates for high-entropy tokens in correct responses
❌ Penalize over-confident top-1s for incorrect responses
🧮 Improves Pass@K across math & logic benchmarks -- simple, stable, effective.
📄 Paper: arxiv.org/abs/2510.14807…
🌐 Project: spherelab.ai/simko
#LLM #ReinforcementLearning #Reasoning #RLVR #AI
[image]
5 replies · 21 reposts · 158 likes · 10.5K views
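A toy rendering of the redistribution rule exactly as the tweet states it: at high-entropy positions of correct responses, encourage the top-K candidates; on incorrect responses, penalize the over-confident top-1. The threshold and step size below are invented for illustration; SimKO's actual objective is in the paper:

```python
# Toy sketch of the pass@K redistribution idea described above. The update is
# a direct logit nudge with made-up constants, not SimKO's actual loss.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def entropy(p):
    return float(-(p * np.log(p + 1e-12)).sum())

def redistribute(logits, correct, k=3, entropy_threshold=1.0, eta=0.5):
    p = softmax(logits)
    out = logits.copy()
    if correct and entropy(p) > entropy_threshold:
        out[np.argsort(p)[-k:]] += eta   # encourage top-K candidates
    elif not correct:
        out[np.argmax(p)] -= eta         # penalize the over-confident top-1
    return out

logits = np.array([2.0, 1.8, 1.5, 0.2, 0.1])
print(np.round(softmax(redistribute(logits, correct=True)), 3))   # top-3 all gain mass
print(np.round(softmax(redistribute(logits, correct=False)), 3))  # top-1 mass shrinks
```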
Yi (Joshua) Ren @JoshuaRenyi
@razdaibi @Besteuler Thanks. It is chosen heuristically, just like the other hyperparameters. Figure 8 and Table 3 in the paper give an ablation on that. But in most of the experiments we ran, 2~5 is a reasonable choice.
1 reply · 0 reposts · 0 likes · 26 views
Abhilasha Ravichander @lasha_nlp
Life update: I’m excited to share that I’ll be starting as faculty at the Max Planck Institute for Software Systems (@mpi_sws_) this Fall! 🎉 I’ll be recruiting PhD students in the upcoming cycle, as well as research interns throughout the year: lasharavichander.github.io/contact.html
[image]
83 replies · 46 reposts · 593 likes · 62.7K views
Yi (Joshua) Ren @JoshuaRenyi
@StellaLisy Super cool work! We also have a similar finding from the perspective of negative rewards. Those 0-reward responses might also contain many correct parts or reasoning steps, and reversing their rewards to positive might make the model do a similar (correct) reasoning.
[image]
0 replies · 0 reposts · 2 likes · 304 views
Stella Li @StellaLisy
🤯 We cracked RLVR with... Random Rewards?!
Training Qwen2.5-Math-7B with our Spurious Rewards improved MATH-500 by:
- Random rewards: +21%
- Incorrect rewards: +25%
- (FYI) Ground-truth rewards: +28.8%
How could this even work⁉️ Here's why: 🧵
Blogpost: tinyurl.com/spurious-rewar…
[image]
72 replies · 344 reposts · 1.8K likes · 700K views
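One frequently noted piece of context for how random rewards can move a policy at all (a general observation about GRPO-style training, not necessarily the blog's full explanation): group-normalized advantages are nonzero whenever rewards vary within a sampled group, so some responses get reinforced regardless of what the reward encodes, and which behaviors that amplifies then depends on the model's own priors.

```python
# GRPO-style group-normalized advantages under random rewards (illustrative).
import numpy as np

rng = np.random.default_rng(0)
group_rewards = rng.integers(0, 2, size=8).astype(float)  # random 0/1 rewards

adv = (group_rewards - group_rewards.mean()) / (group_rewards.std() + 1e-8)
print("rewards:   ", group_rewards)
print("advantages:", np.round(adv, 2))
# Nonzero advantages -> a nonzero policy gradient, even though the reward
# carries no information about the task.
```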
Yi (Joshua) Ren @JoshuaRenyi
Congrats on your great work, and I learned a lot from it! I am just wondering whether the type-B error correlates with the increase of p[Q1; A2] mentioned in our recent ICLR paper (Learning Dynamics of LLM Finetuning), i.e., the model mismatching facts and questions.
Quoting Abhilasha Ravichander @lasha_nlp:

We are launching HALoGEN💡, a way to systematically study *when* and *why* LLMs still hallucinate. New work w/ @shrusti_ghela* @davidjwadden @YejinChoinka 💫 🧵 [1/n]

1 reply · 0 reposts · 1 like · 709 views
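The p[Q1; A2] quantity (the probability mass the model puts on another question's answer) can be tracked with a standard sequence log-probability routine during finetuning. A minimal Hugging Face sketch with placeholder model and strings (the paper may compute it differently):

```python
# Sketch: monitor the cross-match probability p[Q1; A2], i.e. the log-prob of
# question 2's answer given question 1. Model and strings are placeholders.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

@torch.no_grad()
def answer_logprob(question: str, answer: str) -> float:
    """Sum of log p(answer token | question + previous answer tokens)."""
    q_ids = tok(question, return_tensors="pt").input_ids
    a_ids = tok(answer, return_tensors="pt").input_ids
    ids = torch.cat([q_ids, a_ids], dim=1)
    logp = F.log_softmax(model(ids).logits[:, :-1], dim=-1)
    token_lp = logp.gather(-1, ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    return token_lp[:, q_ids.shape[1] - 1:].sum().item()  # answer positions only

# A mismatched question/answer pair: if this rises as SFT proceeds, the model
# is mixing up which answer belongs to which question.
print(answer_logprob("Q: Where was Einstein born?", " In 1879."))
```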
Yi (Joshua) Ren @JoshuaRenyi
@noamrazin Thanks Noam. Very glad to have discussed this with you last year (also added you to the acknowledgments 😀). Yeah, it is true that a linear model with both + and - parts could cancel the squeezing effect. I am considering using a two-gram toy model to fix this. Glad to talk more about it.
0 replies · 0 reposts · 1 like · 92 views
Noam Razin @noamrazin
Excited that Yi's work received an outstanding paper award! I had the pleasure of discussing with him a few months ago how our likelihood displacement paper (also at ICLR 2025: arxiv.org/abs/2410.08847) relates to theirs. I believe each nicely fills gaps left open by the other.
[Quoting Yi (Joshua) Ren @JoshuaRenyi's pinned tweet above.]
2 replies · 0 reposts · 8 likes · 683 views
Yi (Joshua) Ren @JoshuaRenyi
@ShangminGuo Thanks for the summary and for your efforts in these works. It has been a great pleasure working together.
0 replies · 0 reposts · 1 like · 38 views
Shangmin Guo @ShangminGuo
Huge congrats to Yi @JoshuaRenyi 🎉🎉🎉 Well-deserved! Yi’s research is really “old-school” (in a very good sense), rigorous, and deep, always a breath of fresh air! Years of collaboration with him have been incredibly rewarding 🫡
Quoting ICLR @iclr_conf:

Outstanding Papers:
- Safety Alignment Should be Made More Than Just a Few Tokens Deep. Xiangyu Qi, et al.
- Learning Dynamics of LLM Finetuning. Yi Ren and Danica J. Sutherland.
- AlphaEdit: Null-Space Constrained Model Editing for Language Models. Junfeng Fang, et al.
1 reply · 0 reposts · 2 likes · 238 views