Yi (Joshua) Ren

68 posts

@JoshuaRenyi

Postdoc @OATML_Oxford, Prev. Ph.D @UBC_CS, MSc @EdinburghNLP. Working on ML (learning dynamics, continual learning, iterated learning, LLM)

Oxford, UK · Joined October 2021
196 Following · 361 Followers
Pinned Tweet
Yi (Joshua) Ren @JoshuaRenyi
📢Curious why your LLM behaves strangely after long SFT or DPO? We offer a fresh perspective—consider doing a "force analysis" on your model’s behavior. Check out our #ICLR2025 Oral paper: Learning Dynamics of LLM Finetuning! (0/12)
[image]
6 replies · 116 reposts · 798 likes · 87.6K views
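For context on the pinned thread's key idea: the paper decomposes each finetuning step into how an update on one example (x_u, y_u) moves the model's prediction on another prompt x_o, and reads the factors as "forces". A minimal sketch of that decomposition, with notation paraphrased from the paper (treat the exact conventions as assumptions):

```latex
% One-step learning dynamics (sketch). After a gradient step of size \eta on the
% update example (x_u, y_u), the change in the log-prediction at a probe prompt
% x_o factors into three terms:
\Delta \log \pi_t(y \mid x_o) \;\approx\;
    -\eta \, \mathcal{A}_t(x_o)\, \mathcal{K}_t(x_o, x_u)\, \mathcal{G}_t(x_u, y_u)
    \;+\; O(\eta^2)
% \mathcal{A}: softmax geometry at x_o (how logit changes become log-prob changes)
% \mathcal{K}: empirical-NTK-style similarity between x_o and x_u (how far the push travels)
% \mathcal{G}: residual/gradient of the loss at the update example (direction and size of the push)
```

Reading each factor as a "force" on π(y | x_o) is what makes effects like off-target probability drops during DPO easier to reason about.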
DailyPapers @HuggingPapers
Fine-tuning increases hallucinations.
New research shows SFT causes factual errors by interfering with pre-trained knowledge. The authors propose self-distillation to learn new facts without forgetting, plus selective parameter freezing to reduce hallucinations while preserving performance.
[image]
4 replies · 35 reposts · 165 likes · 9.3K views
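Of the two mitigations the tweet names, selective parameter freezing is the simpler one to picture in code. A minimal PyTorch sketch of freezing by parameter name; the model and the choice of which parameters stay trainable are placeholders, since the tweet does not state the paper's actual selection criterion:

```python
# Minimal sketch of selective parameter freezing (illustrative only; the
# selection rule below is a placeholder, not the paper's criterion).
import torch.nn as nn
from transformers import AutoModelForCausalLM

def freeze_except(model: nn.Module, trainable_substrings: list[str]) -> None:
    """Freeze every parameter whose name matches none of the substrings."""
    for name, param in model.named_parameters():
        param.requires_grad = any(s in name for s in trainable_substrings)

model = AutoModelForCausalLM.from_pretrained("gpt2")  # stand-in for any causal LM
# Hypothetical choice: let only the last two blocks and the LM head adapt,
# leaving earlier layers (and the knowledge stored there) untouched.
freeze_except(model, ["h.10.", "h.11.", "lm_head"])

n_trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable parameters: {n_trainable:,}")
```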
Yi (Joshua) Ren @JoshuaRenyi
@learning_mech Hey Jamie, thanks for the great work! These physics-style theoretical directions for machine learning are really cool!
0 replies · 0 reposts · 1 like · 43 views
Jamie Simon @learning_mech
1/ Deep learning is going to have a scientific theory. We can see the pieces starting to come together, and it's looking a lot like physics! We're releasing a paper pulling together these emerging threads and giving them a name: learning mechanics. 🔨 arxiv.org/pdf/2604.21691 🔧
[image]
52 replies · 290 reposts · 1.5K likes · 291.4K views
Yi (Joshua) Ren reposted
Feng Liu @AlexFengLiu1
Excited to share our ICML 2026 Hypothesis Testing Workshop in Seoul, this July! @icmlconf 🎉 This workshop aims to bring together researchers developing modern hypothesis testing methodology and applying it to machine learning problems such as robustness, distribution shift, security, medicine, and LLM evaluation. In other words, if you care about how we make ML claims rigorous, this workshop is for you.
We now have four confirmed speakers: Arthur Gretton @ArthurGretton, Yao Xie @yaoxie21851119, Bo Li @uiuc_aisecure, and Yisong Yue @yisongyue. The organizing team includes Xiuyuan Cheng (Duke), Feng Liu @AlexFengLiu1, Lester Mackey @LesterMackey, Shayak Sen @shayaksen, Danica J. Sutherland @d_j_sutherland, and Nathaniel Xu (UBC).
📌 Submission deadline: 10 May 2026
📌 Notification: 26 May 2026
📌 Camera-ready: 17 June 2026
📌 Workshop date: July 10 or 11, 2026 (TBA)
🚩 Check more information below!
🔗 Website: testing.ml
🔗 Submission Portal: openreview.net/group?id=ICML.…
We’re also recruiting PC members/reviewers.
🔗 Reviewer interest form: docs.google.com/forms/d/e/1FAI…
🏁 Please feel free to share this with colleagues, collaborators, and students who may be interested. #ICML #ICML26
[image]
1 reply · 10 reposts · 55 likes · 15.2K views
Yi (Joshua) Ren reposted
Danica Sutherland @d_j_sutherland
coming back to x, the everything app, to say: Submit work, sign up to review, come to the workshop! This is a great chance to bring together a lot of really cool work that's been happening, but not all as connected (nor as easy to publish) as it should be! testing.ml
[Quoting Feng Liu @AlexFengLiu1's workshop announcement above.]
0 replies · 2 reposts · 3 likes · 372 views
Omar Rivasplata @OmarRivasplata
An ICML submission under review received this rebuttal acknowledgement saying: "Thanks for your response, I maintain my positive score." The positive score maintained is 4: weak accept 🤷‍♂️
[image]
11 replies · 5 reposts · 96 likes · 38K views
Yongyuan Liang @cheryyun_l
Seeing everyone on my timeline declining their NeurIPS AC/Reviewer invites... meanwhile I just checked my inbox and realized they didn't even invite me this year lmao. 4 years of NeurIPS reviewer & author experience for what😆 @NeurIPSConf
7 replies · 1 repost · 131 likes · 22.4K views
Yi (Joshua) Ren @JoshuaRenyi
@sreejan_kumar Hi Sreejan, very cool work. This reminds me of iterated learning in cognitive science, which applies Bayesian updates to predict how agents’ beliefs evolve across generations of interaction. Here is a quite related one: arxiv.org/pdf/2404.04286
0 replies · 0 reposts · 4 likes · 638 views
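The iterated-learning setup mentioned in the reply treats each generation as a Bayesian learner: infer a hypothesis from the previous agent's output, then produce data for the next agent. A minimal sketch of the classic chain (an illustration of the general cog-sci model, not the linked paper's setup; all numbers are made up):

```python
# Minimal Bayesian iterated-learning chain: two hypotheses, binary signals.
# Illustrative sketch of the classic model, not the linked paper's setup.
import random

PRIOR = {"h0": 0.7, "h1": 0.3}        # learners' shared prior over hypotheses
LIKELIHOOD = {"h0": 0.2, "h1": 0.8}   # P(signal = 1 | hypothesis)

def posterior(signal: int, prior: dict) -> dict:
    """One Bayesian update after observing a single binary signal."""
    unnorm = {h: prior[h] * (LIKELIHOOD[h] if signal else 1 - LIKELIHOOD[h])
              for h in prior}
    z = sum(unnorm.values())
    return {h: p / z for h, p in unnorm.items()}

random.seed(0)
belief = "h1"                                           # first agent's hypothesis
for gen in range(15):
    signal = int(random.random() < LIKELIHOOD[belief])  # produce data for the next agent
    post = posterior(signal, PRIOR)                     # next agent learns from it
    belief = max(post, key=post.get)                    # MAP learner
    print(f"gen {gen:2d}: signal={signal} -> belief={belief}")
```

Run long enough, chains like this tend to drift toward the learners' shared prior, which is the kind of cross-generation belief dynamics the reply points at.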
Sreejan Kumar @sreejan_kumar
In 2022, I won the NeurIPS Outstanding Paper Award. In 2026, I've realized this paper, ahead of its time, accidentally predicted the trajectory of AI development over the past few years. A thread using this to explain how AI has developed 2018->2026:
[image]
4 replies · 25 reposts · 338 likes · 30.1K views
Yi (Joshua) Ren @JoshuaRenyi
@IdanShenfeld Very cool work! Just wondering whether SPIN (a very popular self-play method using SFT data; arxiv.org/abs/2401.01335), which is also an on-policy method, can mitigate the forgetting issue.
0 replies · 0 reposts · 0 likes · 100 views
idan shenfeld @IdanShenfeld
People keep saying 2026 will be the year of continual learning. But there are still major technical challenges to making it a reality. Today we take the next step towards that goal — a new on-policy learning algorithm, suitable for continual learning! (1/n)
[image]
49 replies · 221 reposts · 1.5K likes · 236.9K views
Yi (Joshua) Ren reposted
Yarin @yaringal
I’m excited to share that we are launching a public safeguards competition next month in partnership with @AISecurityInst, @GraySwanAI, @OATML_Oxford, Sequrity.ai, @OpenAI, @AnthropicAI, and @amazon. This is a red-versus-blue competition focused on building new agent safeguards, and breaking these safeguards. Should be a lot of fun, and there are prizes as well for open-source submissions! Please help to share this opportunity! Registration is open now: app.grayswan.ai/arena/challeng…

More details: Oxford (OATML) and UK AISI have teamed up with Gray Swan and Sequrity.ai, as well as OpenAI, Anthropic, and Amazon, to run a public competition where blue teams build defenses against real red-teaming attacks, and we'd like to invite you to participate.

What is the Safeguards Challenge?
Gray Swan runs the Arena, a platform where security researchers ("red teams") attempt to elicit harmful behaviors from AI systems. Challenges have been supported by UK AISI, US CAISI, OpenAI, Anthropic, Amazon, Google DeepMind, and Meta, and have surfaced real vulnerabilities that help developers improve their models. The Safeguards Challenge is the Arena’s first red-versus-blue competition. Instead of just measuring attacks, we're measuring defenses. Blue teams will submit safeguards (system prompts, classifiers, or containerized solutions) that attempt to block red teamers and adversarial inputs while allowing legitimate requests through. Red teams will then try to break those defenses, and the cycle repeats.

The target environment
Blue teams will defend a multi-agent customer support system with an orchestrator agent, specialized sub-agents, and integrated tools. The system handles realistic customer interactions, and red teams will attempt to trigger harmful behaviors: fraudulent transactions, data exfiltration, unauthorized tool use, and policy-violating responses. Your safeguards will be scored on how well they block attacks from red teamers versus how well they allow benign requests from a holdout test set. The leaderboard uses a combined metric based on false positive and false negative rates.

What you can submit
* System prompt configurations for monitor models
* Input/output classifiers (any framework)
* Containerized solutions with custom logic
For prize eligibility, solutions must be open source or open weights. Proprietary solutions can compete on a separate unprized leaderboard for benchmarking purposes. Solutions must be registered a week before the first or second defense phase starts and submitted a day beforehand. The submission interface will be available by early February.

Timeline for blue teams
* January 2026: Preliminary challenge details shared with registered blue teams
* February 11-25: Red teams attack baseline defenses and early defense submissions
* February 25 - March 25 (First Defense Phase): You receive the attack dataset from Waves 0-1. Build and iterate your safeguards in our test environment. Submit your defense by the end of this phase.
* Approximately March 25 - April 1 (Wave 2): Red teams attack your submitted safeguards. You see what breaks. Exact dates TBA.
* Approximately April 1 - April 29 (Second Defense Phase): Iterate based on Wave 2 results. Final submissions due before Wave 3. Exact dates TBA.
* Approximately April 29 - May 6 (Wave 3): Final attack wave. Leaderboard locks. Exact dates TBA.

Prizes
$70,000 in prizes for blue teams:
* First Defense Phase: $10,000 (top 10 teams, first place $2,000)
* Second Defense Phase: $60,000 (top 15 teams, first place $15,000)
Blue team entries are per organization. Only open-source/open-weights solutions are prize-eligible. Participants from judging organizations cannot submit; participants from sponsor organizations cannot win prizes. (Other Oxford groups unrelated to OATML are eligible.)

Co-sponsors and judges
Judging is handled by UK AISI and US CAISI.

Why participate?
* Test your defenses against real adaptive attacks from skilled red teamers
* Benchmark against other research groups and commercial solutions
* Contribute to open research on AI safeguards (prize-eligible solutions are published)
* Cash prizes for top performers
8 replies · 25 reposts · 140 likes · 13.6K views
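The post says only that the leaderboard "uses a combined metric based on false positive and false negative rates", without giving the formula. As one purely hypothetical reading, a balanced-error combination would look like the sketch below; the official scoring rule may well differ:

```python
# Hypothetical leaderboard score. The competition states only that the metric
# combines false positive and false negative rates; the balanced error rate
# below is an assumption for illustration, not the official formula.
def balanced_error_rate(blocked_attacks: int, total_attacks: int,
                        blocked_benign: int, total_benign: int) -> float:
    fnr = 1 - blocked_attacks / total_attacks  # attacks that got through
    fpr = blocked_benign / total_benign        # benign requests wrongly blocked
    return 0.5 * (fpr + fnr)                   # lower is better

# Example: a safeguard blocks 90/100 attacks but also blocks 5/200 benign requests.
print(balanced_error_rate(90, 100, 5, 200))    # 0.5 * (0.025 + 0.1) = 0.0625
```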
MKI @mki028
Best read in 2025 (well-deserved outstanding paper), with clear theoretical proofs and a principled way of analysis:
> Negative gradient + rich-get-richer promoted by softmax in DPO -> model collapse
> Hallucination in SFT is driven by spurious correlation, explained by the framework
[Quoting Yi (Joshua) Ren @JoshuaRenyi's pinned tweet above.]
1 reply · 0 reposts · 1 like · 115 views
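The "negative gradient + rich gets richer" mechanism is easy to see in a toy example: a gradient step that pushes *down* the log-probability of a rejected token raises every other logit in proportion to its current probability, so the already-dominant token absorbs most of the freed mass. A self-contained numpy sketch (toy numbers, not the paper's experiments):

```python
# Toy demo of the softmax "squeezing effect" under a negative gradient.
# Descending on L = log p(rejected) gives dL/dz_k = 1[k = rejected] - p_k,
# so the update z <- z - eta * dL/dz adds eta * p_k to every other logit:
# the most probable token gains the most ("rich gets richer").
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

z = np.array([3.0, 2.0, 1.0, 0.0])  # toy 4-token vocabulary
rejected, eta = 2, 1.0

for step in range(5):
    p = softmax(z)
    grad = -p
    grad[rejected] += 1.0            # dL/dz for L = log softmax(z)[rejected]
    z = z - eta * grad               # gradient descent pushes p(rejected) down
    print(f"step {step}: p = {np.round(softmax(z), 3)}")
# p(rejected) collapses, and nearly all of its mass flows to the argmax token
# instead of being shared across the vocabulary.
```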
Hattie Zhou @oh_that_hat
Super proud to share that after a long sabbatical, I have finally defended my PhD 🥳🥹 Growing up, I never thought I’d pursue a PhD, and I complained a lot during it, but looking back now it was a very special and precious experience. Special thanks to @hugo_larochelle @AaronCourville @HanieSedghi @bneyshabur @PreetumNakkiran @jasonyo and many MILA friends for advising me through it all 🌺
[image]
43 replies · 14 reposts · 425 likes · 49.8K views
Yi (Joshua) Ren @JoshuaRenyi
@hanqi_xiao @Besteuler Thanks, that's a good point. I guess tracking the change of the top-1/2/3 probabilities might provide some useful information on that. Maybe we should also show the training curves of some non-RL methods that are good at pass@K. Any suggestions?
0 replies · 0 reposts · 1 like · 37 views
Hanqi Xiao @hanqi_xiao
@Besteuler I wonder if this relates to (or fixes) RL failing to beat the pass@k of non-RL models at large k!
1 reply · 0 reposts · 0 likes · 74 views
Yi (Joshua) Ren reposted
Weiyang Liu @Besteuler
🚀 Glad to introduce SimKO (Simple Pass@K Optimization).
Current GRPO-based methods overfit to safe responses -- great Pass@1, poor Pass@K.
🔍 We find this stems from probability over-concentration: the model collapses onto its top-1 token, losing exploration. This appears to be a more accurate observation metric than the commonly used entropy.
✨ SimKO fixes this with probability redistribution:
✅ Encourage top-K candidates for high-entropy tokens in correct responses
❌ Penalize over-confident top-1s for incorrect responses
🧮 Improves Pass@K across math & logic benchmarks -- simple, stable, effective.
📄 Paper: arxiv.org/abs/2510.14807…
🌐 Project: spherelab.ai/simko
#LLM #ReinforcementLearning #Reasoning #RLVR #AI
[image]
5 replies · 21 reposts · 158 likes · 10.5K views
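A toy rendering of the redistribution rule exactly as the tweet states it: at high-entropy positions of correct responses, encourage the top-K candidates; on incorrect responses, penalize the over-confident top-1. The threshold and step size below are invented for illustration; SimKO's actual objective is in the paper:

```python
# Toy sketch of the pass@K redistribution idea described above. The update is
# a direct logit nudge with made-up constants, not SimKO's actual loss.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def entropy(p):
    return float(-(p * np.log(p + 1e-12)).sum())

def redistribute(logits, correct, k=3, entropy_threshold=1.0, eta=0.5):
    p = softmax(logits)
    out = logits.copy()
    if correct and entropy(p) > entropy_threshold:
        out[np.argsort(p)[-k:]] += eta   # encourage top-K candidates
    elif not correct:
        out[np.argmax(p)] -= eta         # penalize the over-confident top-1
    return out

logits = np.array([2.0, 1.8, 1.5, 0.2, 0.1])
print(np.round(softmax(redistribute(logits, correct=True)), 3))   # top-3 all gain mass
print(np.round(softmax(redistribute(logits, correct=False)), 3))  # top-1 mass shrinks
```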
Yi (Joshua) Ren @JoshuaRenyi
@razdaibi @Besteuler Thanks. It is chosen heuristically, just like the other hyperparameters. Figure 8 and Table 3 in the paper give an ablation on that. But in most of the experiments we ran, 2~5 is a reasonable choice.
1 reply · 0 reposts · 0 likes · 26 views
Abhilasha Ravichander @lasha_nlp
Life update: I’m excited to share that I’ll be starting as faculty at the Max Planck Institute for Software Systems (@mpi_sws_) this Fall! 🎉 I’ll be recruiting PhD students in the upcoming cycle, as well as research interns throughout the year: lasharavichander.github.io/contact.html
[image]
83 replies · 46 reposts · 593 likes · 62.7K views
Yi (Joshua) Ren @JoshuaRenyi
@StellaLisy Super cool work! We also have a similar finding from the perspective of negative rewards. Those 0-reward responses might also contain many correct parts or reasoning steps, and reversing their rewards to positive might make the model do a similar (correct) reasoning.
[image]
0 replies · 0 reposts · 2 likes · 304 views
Stella Li @StellaLisy
🤯 We cracked RLVR with... Random Rewards?!
Training Qwen2.5-Math-7B with our Spurious Rewards improved MATH-500 by:
- Random rewards: +21%
- Incorrect rewards: +25%
- (FYI) Ground-truth rewards: +28.8%
How could this even work⁉️ Here's why: 🧵
Blogpost: tinyurl.com/spurious-rewar…
[image]
72 replies · 344 reposts · 1.8K likes · 700K views
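One frequently noted piece of context for how random rewards can move a policy at all (a general observation about GRPO-style training, not necessarily the blog's full explanation): group-normalized advantages are nonzero whenever rewards vary within a sampled group, so some responses get reinforced regardless of what the reward encodes, and which behaviors that amplifies then depends on the model's own priors.

```python
# GRPO-style group-normalized advantages under random rewards (illustrative).
import numpy as np

rng = np.random.default_rng(0)
group_rewards = rng.integers(0, 2, size=8).astype(float)  # random 0/1 rewards

adv = (group_rewards - group_rewards.mean()) / (group_rewards.std() + 1e-8)
print("rewards:   ", group_rewards)
print("advantages:", np.round(adv, 2))
# Nonzero advantages -> a nonzero policy gradient, even though the reward
# carries no information about the task.
```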
Yi (Joshua) Ren @JoshuaRenyi
Congrats on your great work, and I learned a lot from it! I am just wondering whether the type-B error correlates with the increase of p[Q1; A2] mentioned in our recent ICLR paper (Learning Dynamics of LLM Finetuning), i.e., the model mismatching facts and questions.
Quoting Abhilasha Ravichander @lasha_nlp:

We are launching HALoGEN💡, a way to systematically study *when* and *why* LLMs still hallucinate. New work w/ @shrusti_ghela* @davidjwadden @YejinChoinka 💫 🧵 [1/n]

1 reply · 0 reposts · 1 like · 709 views
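The p[Q1; A2] quantity (the probability mass the model puts on another question's answer) can be tracked with a standard sequence log-probability routine during finetuning. A minimal Hugging Face sketch with placeholder model and strings (the paper may compute it differently):

```python
# Sketch: monitor the cross-match probability p[Q1; A2], i.e. the log-prob of
# question 2's answer given question 1. Model and strings are placeholders.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

@torch.no_grad()
def answer_logprob(question: str, answer: str) -> float:
    """Sum of log p(answer token | question + previous answer tokens)."""
    q_ids = tok(question, return_tensors="pt").input_ids
    a_ids = tok(answer, return_tensors="pt").input_ids
    ids = torch.cat([q_ids, a_ids], dim=1)
    logp = F.log_softmax(model(ids).logits[:, :-1], dim=-1)
    token_lp = logp.gather(-1, ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    return token_lp[:, q_ids.shape[1] - 1:].sum().item()  # answer positions only

# A mismatched question/answer pair: if this rises as SFT proceeds, the model
# is mixing up which answer belongs to which question.
print(answer_logprob("Q: Where was Einstein born?", " In 1879."))
```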
Yi (Joshua) Ren @JoshuaRenyi
@noamrazin Thanks Noam. Very glad to have discussed this with you last year (also added you to the acknowledgments 😀). Yeah, it is true that a linear model with both + and - parts could cancel the squeezing effect. I am considering using a two-gram toy model to fix this. Glad to talk more about it.
0 replies · 0 reposts · 1 like · 92 views
Noam Razin @noamrazin
Excited that Yi's work received an outstanding paper award! I had the pleasure of discussing with him a few months ago how our likelihood displacement paper (also at ICLR 2025: arxiv.org/abs/2410.08847) relates to theirs. I believe each nicely fills gaps left open by the other.
[Quoting Yi (Joshua) Ren @JoshuaRenyi's pinned tweet above.]
2 replies · 0 reposts · 8 likes · 683 views
Yi (Joshua) Ren @JoshuaRenyi
@ShangminGuo Thanks for the summary and for your efforts in these works. It has been a great pleasure working together.
0 replies · 0 reposts · 1 like · 38 views
Shangmin Guo @ShangminGuo
Huge congrats to Yi @JoshuaRenyi 🎉🎉🎉 Well-deserved! Yi’s research is really “old-school” (in a very good sense), rigorous, and deep, always a breath of fresh air! Years of collaboration with him have been incredibly rewarding 🫡
Quoting ICLR @iclr_conf:

Outstanding Papers:
- Safety Alignment Should be Made More Than Just a Few Tokens Deep. Xiangyu Qi, et al.
- Learning Dynamics of LLM Finetuning. Yi Ren and Danica J. Sutherland.
- AlphaEdit: Null-Space Constrained Model Editing for Language Models. Junfeng Fang, et al.
1 reply · 0 reposts · 2 likes · 238 views