Changyu Chen

205 posts

@Cameron_Chann

PhD student @sgSMU. RL x LLMs. Previously @NTUsg, @ZJU_China

Singapore 🇸🇬 Joined May 2020

300 Following · 386 Followers

Pinned Tweet

Changyu Chen@Cameron_Chann·21 Mar

(1/3) My favorite figure from the paper. Nearly all open-source RL frameworks introduce an unintentional bias when computing the masked mean 😮. The fix? Just replace mask.sum with a constant.

178

39.6K
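The pinned tweet does not spell out the mechanics, so the following is only a minimal sketch of one plausible reading: dividing a masked sum by `mask.sum` makes each token's weight depend on the response length, while dividing by a fixed constant does not. The function names, the toy data, and the constant `MAX_TOKENS` are all assumptions made for illustration and are not taken from the paper or from any framework.

```python
from typing import List


def masked_mean(values: List[float], mask: List[int]) -> float:
    """Masked sum divided by mask.sum (the number of unmasked tokens)."""
    total = sum(v * m for v, m in zip(values, mask))
    return total / sum(mask)


def masked_mean_const(values: List[float], mask: List[int], norm: float) -> float:
    """Same masked sum, divided by a fixed constant instead of mask.sum."""
    total = sum(v * m for v, m in zip(values, mask))
    return total / norm


# Two padded responses; every real token carries the same per-token loss of 1.0.
short_vals, short_mask = [1.0, 1.0, 0.0, 0.0], [1, 1, 0, 0]
long_vals, long_mask = [1.0, 1.0, 1.0, 1.0], [1, 1, 1, 1]

MAX_TOKENS = 4  # hypothetical constant, e.g. a fixed generation budget

# With mask.sum, a token in the short response is weighted 1/2 and a token
# in the long response 1/4, so the weight per token varies with length.
print(masked_mean(short_vals, short_mask), masked_mean(long_vals, long_mask))

# With a constant, every token is weighted 1/MAX_TOKENS regardless of length.
print(
    masked_mean_const(short_vals, short_mask, MAX_TOKENS),
    masked_mean_const(long_vals, long_mask, MAX_TOKENS),
)
```

In this toy setup both responses get the same value under `mask.sum` even though one has twice as many tokens, whereas the constant normalizer keeps each token's contribution equal across responses.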

Changyu Chen@Cameron_Chann·3d

A key post-training paradigm shift from @mimo_labs to DeepSeek is the move to multi-teacher on-policy distillation - building the generalist from a diverse pool of 10+ domain experts. Again surprised by their RL infra that supports full-vocabulary OPD with unbounded (??) number of teachers.

DeepSeek@deepseek_ai

🚀 DeepSeek-V4 Preview is officially live & open-sourced! Welcome to the era of cost-effective 1M context length. 🔹 DeepSeek-V4-Pro: 1.6T total / 49B active params. Performance rivaling the world's top closed-source models. 🔹 DeepSeek-V4-Flash: 284B total / 13B active params. Your fast, efficient, and economical choice. Try it now at chat.deepseek.com via Expert Mode / Instant Mode. API is updated & available today! 📄 Tech Report: huggingface.co/deepseek-ai/De… 🤗 Open Weights: huggingface.co/collections/de… 1/n

323

Changyu Chen@Cameron_Chann·4d

super cool work on self-play algos. feels very aligned with @CarinaLHong's “spiral progression” vision for AI in mathematics from a recent podcast: youtube.com/watch?v=78Vyy_…

Luke Bailey@LukeBailey181

Self-play led to superhuman Go performance, why hasn’t it for LLMs? In practice, long run self-play plateaus like RL. We study why this happens, and build a self-play algorithm that scales better. It solves as many problems with a 7B model as the pass @4 of a model 100x bigger.

323

Changyu Chen retweeted

Diyi Yang@Diyi_Yang·4d

Many of us are here #ICLR2026 presenting work around human-AI collaboration, evaluation and risks🤩 Come talk to us during poster sessions: @michaelryan207 @StevenyzZhang @ChengleiSi

111

15.1K

Changyu Chen retweeted

Amber Liu@JIACHENLIU8·2 Apr

We're living in the BEST era for doing research. 💪 After I graduated from my PhD, the rise of AI-native research gave me a new chance to revisit my research experience. Lately, doing research feels incredibly rewarding to me. I get to experience the pure joy of curiosity-driven science because I no longer have to worry about the lower-level implementations or getting bogged down by infrastructure 🚀 (I'll be sharing some of my own recent research driven by this very soon!) But today, let me introduce the New Orchestra 🎻. We wanted to ship a product that absorbs the friction and brings science back to the curiosity.

467

53.2K

Changyu Chen retweeted

Yijia Shao@EchoShao8899·1 Apr

New episode of the AM Podcast (@augmind_fm) is live!📺 In EP3, we are honored to invite Woosuk Kwon (@woosuk_k) to share about LLM inference from a brand new perspective! Woosuk is a co-founder & CTO of @inferact and creator of @vllm_project, who has a lot of experience in this space and also great insights on the next frontier of the AI infra.

In this conversation, we cover:
- How his early projects shaped his taste for infra work
- How vLLM started and what made it take off
- How emerging apps are reshaping AI infra
- What's next: streaming requests, continual learning with RL, on-device inference, and more

This conversation really answered a lot of questions I personally have. Hopefully, it can offer something new to those working on the higher end of user-facing applications as well as the lower end of AI infrastructure!

Augmented Mind Podcast@augmind_fm

"Actually, we (vllm) get more users from the simple UX than vllm performance" For our third guest, we welcome @woosuk_k, co-founder & CTO of @inferact and creator of @vllm_project. To us, Woosuk is a unique guest, and we are amazed by the user-centric perspective on LLM inference he shared — from what makes the vLLM project successful, to new application scenarios to tailor inference to, and to how to support continual learning from user signals, and more. 0:00 - Prelude: Introducing Woosuk and Inferact 3:00 - Woosuk’s First PhD Project 6:00 - How the vLLM Project Got Started 9:18 - AI Infra Needs More Than Just Efficiency 14:08 - How AI Infra and Human-centered AI Are Connected 15:01 - How to Prioritize Feature Requests for Popular AI Infra 18:18 - Streaming Requests and Realtime API 24:05 - Multi-turn, Agentic, Proactive LLMs 27:03 - How to Design AI Infra in a Principled Way 29:13 - How to Design an AI Inference Engine for Continue Learning with RL 35:05 - Would LoRA Training Affect RL Infra Design? 37:28 - Why Start an AI Inference Infra Startup? 40:46 - What Effortless Inference with Open-source Models Means for Developers 43:46 - A Vision for On-device AI Inference 46:19 - Can Today’s Coding Agents Create vLLM?

1.8K

Changyu Chen@Cameron_Chann·17 Mar

kudos to the team for the awesome work! as an RLer, I don’t see this as an alternative to RL. Instead, I’m excited about the potential it brings to tackling some core RL challenges.

Phillip Isola@phillip_isola

A few clarifications to common q's about our thickets paper:
1. Is this just ensembling? Seed averaging? Bagging? ...
2. Is this just Qwen?
3. Is it K times slower inference?
4. RL is dead? Post-training is dead?

180

Changyu Chen retweeted

Zichen Liu@zzlccc·17 Mar

🦎🦎 Happy to see two of our works (DrGRPO & DPPO) are highlighted here! I don’t think changing a few terms is worth a new branding, so we respectfully kept predecessors’ name while highlighting the correction/improvement on top of them. Hopefully they inspire RL algo designs.

Alex Weers@a_weers

Finally finished! If you're interested in an overview of recent methods in reinforcement learning for reasoning LLMs, check out this blog post: aweers.de/blog/2026/rl-f… It summarizes ten methods, tries to highlight differences and trends, and has a collection of open problems

3.8K

Changyu Chen retweeted

Diyi Yang@Diyi_Yang·11 Mar

🚨Postdoc opening: We are looking for a postdoc researcher with expertise in NLP, RL, and/or ML to develop AI-powered clinical support tools for mental health counseling in the Global South. Working with @EmmaBrunskill & @Diyi_Yang at Stanford. Apply by April 15, 2026 via tinyurl.com/ai4mentalhealt… 🧵👇

280

46.8K

Changyu Chen retweeted

CLS ✈️ ICLR'26@ChengleiSi·9 Mar

Great to see autoresearch blowing up becoz of the legendary Karpathy sensei. This year will ofc be an exciting year for automated AI research. For all of you guys excited to jump onto it, hopefully our papers will be some helpful references:
- automated feedback loop for research agents to optimize LLM pre-training and post-training stacks: x.com/ChengleiSi/sta…
- generating novel research ideas with LLMs, along with a comparison against human experts: x.com/ChengleiSi/sta…
- evaluating the effectiveness of LLM-generated ideas through experiment execution: x.com/ChengleiSi/sta…
- finetuning LLMs to directly predict the effectiveness of research ideas: x.com/jiaxinwen22/st…

Andrej Karpathy@karpathy

I packaged up the "autoresearch" project into a new self-contained minimal repo if people would like to play over the weekend. It's basically nanochat LLM training core stripped down to a single-GPU, one file version of ~630 lines of code, then:
- the human iterates on the prompt (.md)
- the AI agent iterates on the training code (.py)

The goal is to engineer your agents to make the fastest research progress indefinitely and without any of your own involvement. In the image, every dot is a complete LLM training run that lasts exactly 5 minutes. The agent works in an autonomous loop on a git feature branch and accumulates git commits to the training script as it finds better settings (of lower validation loss by the end) of the neural network architecture, the optimizer, all the hyperparameters, etc. You can imagine comparing the research progress of different prompts, different agents, etc.

github.com/karpathy/autor…

Part code, part sci-fi, and a pinch of psychosis :)

344

50.3K

Changyu Chen@Cameron_Chann·10 Mar

@eliebakouch thank you for the amazing work! all the best elie!

elie@eliebakouch·9 Mar

today is my last day at hugging face

feeling really grateful to have worked with such an amazing team and learned so much along the way. i’m proud of what we accomplished together, especially the smollm series. building that project from scratch, putting so much into it, and getting to iterate on a model and training recipe that pushed the frontier for its size was really rewarding

i hope i was able to play a part in making model training more accessible and in pushing the open model ecosystem forward. i’m also very thankful to hf for giving me the chance to share my passion for llm research, especially here, and to connect with so many awesome people

things can get quite intense in this field, but i’m still very excited about the next challenges and about the good this technology can do

but first, taking a few weeks break :)

116

746

33.1K

Changyu Chen retweeted

Min Lin@mavenlin·10 Mar

The most exciting breakthroughs in intelligence are yet to come. I’m super excited to start this journey with mes amis to make them happen together.

AMI Labs@amilabs

Advanced Machine Intelligence (AMI) is building a new breed of AI systems that understand the world, have persistent memory, can reason and plan, and are controllable and safe. We’ve raised a $1.03B (~€890M) round from global investors who believe in our vision of universally intelligent systems centered on world models. This round is co-led by Cathay Innovation, Greycroft, Hiro Capital, HV Capital, and Bezos Expeditions, along with other investors and angels across the world. We are a growing team of researchers and builders, operating in Paris, New York, Montreal and Singapore from day one. Read more: amilabs.xyz AMI - Real world. Real intelligence.

406

53.6K

Changyu Chen@Cameron_Chann·4 Mar

@JustinLin610 Thank you for everything you've done for the open model world, and all the best, Junyang

631

Junyang Lin@JustinLin610·3 Mar

me stepping down. bye my beloved qwen.

1.7K

730

13.6K

6.6M

Changyu Chen@Cameron_Chann·3 Mar

@NiJinjie @GoogleDeepMind @YiTayML @quocleix huge congrats!!

175

Jinjie Ni@NiJinjie·2 Mar

Life update: I’ve joined @GoogleDeepMind as a research scientist to work on ✨gemini scaling and RL, under the leadership of Yi Tay (@YiTayML) and Quoc Le (@quocleix). I feel extremely fortunate to be on the critical path towards AGI and can't wait to help push the frontier of gemini capabilities! 🚀

1.2K

90.4K

Changyu Chen retweeted

Diyi Yang@Diyi_Yang·10 Feb

Two amazing postdocs from our lab are on the academic job market this year. I've learned a lot from their wonderful research -- you should definitely reach out and hire them!

141

41.6K

Changyu Chen@Cameron_Chann·9 Feb

@zzlccc @GoogleDeepMind @YiTayML @quocleix congrats!!

330

Zichen Liu@zzlccc·9 Feb

Thrilled to share that I’ve joined @GoogleDeepMind to work on Gemini post-training! I feel incredibly fortunate to be cooking on this sunny island under @YiTayML's leadership, within @quocleix's broader organization. Looking forward to enjoying RL research and pushing the frontiers of Gemini alongside such a brilliant team!

279

44.8K

Changyu Chen retweeted

Hao Zhu@_Hao_Zhu·28 Jan

Introducing the curse of coordination. Agents perform 50% worse in teams than working alone. People building human-AI collaboration today don't realize why current LLMs fail to be good teammates. We built CooperBench to study this.

For humans, we recognize that teamwork isn't just the sum of individual capability. Communication and coordination often outweigh raw skill. But for AI? We're only hill-climbing benchmarks that evaluate solo technical abilities.

CooperBench

A benchmark to evaluate agent cooperation in realistic software teamwork tasks. The setup is intuitive: two agents, two tasks, two VMs, one chat channel (agents can send over arbitrary text, even the entire patch they wrote). We evaluate whether the merged solution from both agents passes the requirements of both tasks.

The curse of coordination

The most striking result: agents perform 50% worse in teams (black line) than working alone (blue line).

Why is this happening?

Is it because they can't use the communication tool? No. They spent 20% of their time sending messages. The problem? Those messages were repetitive, vague, ignored questions, or straight-up hallucinated. But bad communication is only part of the story. We found two deeper failures:

Commitment: Agents don't do what they promised.
Expectations: Agents don't expect others to keep promises either.

Without these, cooperation collapses.

However, there is a silver lining

We also find emergent coordination behaviors, e.g. role division, resource division, and negotiation, which gives us hope that we can use reinforcement learning to improve coordination.

What's next?

It is true that highly-engineered multi-agent orchestration could largely sidestep the coordination problem. However, we care more about the AI's capability: if we truly want AI to be our teammates, we need them to be natively capable of effective communicating and coordinating. Two agents on software tasks is just the beginning.

The real goal: agents that can cooperate with us well enough to actually empower us. CooperBench is our first step. If you're working on this too, let's talk.

235

80.3K

Changyu Chen retweeted

Jason Weston@jaseweston·22 Jan

Our team in FAIR at Meta is hiring a (full-time) researcher! We work on the topics of Reasoning, Alignment and Memory/architectures (RAM) for self-improvement & co-improvement.

Apply here: metacareers.com/profile/job_de…
Location: NY, Seattle or Menlo Park.

Some of our recent work to give flavor:
- Co-Improvement (position): arxiv.org/abs/2512.05356
- SPICE (Self-Play in Corpus Environments): arxiv.org/abs/2510.24684
- Self-Challenging Agents: arxiv.org/abs/2506.01716
- RL from Human Interaction: arxiv.org/abs/2509.25137
- AggLM (parallel aggregation): arxiv.org/abs/2509.06870
- StepWiser (CoT-PRM RL): arxiv.org/abs/2508.19229
- DARLING (diversity-trained RL): arxiv.org/abs/2509.02534
- J1 (RL-trained LLM-as-Judge): arxiv.org/abs/2505.10320
- CoT-Self-Instruct: arxiv.org/abs/2507.23751
- Multi-Token Attention: arxiv.org/abs/2504.00927

354

57.2K

Changyu Chen retweeted

Yijia Shao@EchoShao8899·21 Jan

To kick off the new year, I am super excited to launch The Augmented Mind Podcast (@augmind_fm) with @shannonzshen and @michaelryan207 to share technical human-centered AI work! 🎙️

Since I started my PhD working on human-agent collaboration, I've always noticed this missing channel:
- There are many channels sharing AI work, but you seldom see human-centered AI work there.
- There are papers out there, but many careful thoughts are buried and we just have too many papers these days.
- There are some interviews, but they often focus more on high-level visions. Those careful designs or technical innovations that align AI with human needs, values, etc. remain unseen.

In The AM Podcast, we plan to share compelling research, infrastructure, and systems through long-form monthly episodes. We want to show examples of how to develop AI that collaborates and augments rather than purely automates or replaces.

In EP0, we share who we are, why we started the podcast, and what we're looking forward to. Our first episode will drop this week!

Augmented Mind Podcast@augmind_fm

AI used to be a distant promise; now it permeates our lives. AI is getting better, but is it making us better? We are promised that AI will augment our minds, but how?

We--@EchoShao8899, @shannonzshen, and @michaelryan207--are excited to launch the Augmented Mind Podcast (The AM Podcast), a podcast about technical human-centered AI work. We'll share compelling research, infrastructure, and systems through monthly episodes, featuring interviews with the pioneering minds behind them.

We release EP0 today to share who we are, why we started this podcast, and what we're looking forward to.

0:00 - Prelude: the problems we care about
1:48 - Host introduction
2:03 - Why we started the AM Podcast
2:31 - Hot takes on human-centered AI
10:45 - Format of our podcast
11:28 - Unique technical challenges in human-centered AI
16:45 - Let the journey begin!

12.8K

Changyu Chen@Cameron_Chann·12 Dec

skills as the new tooling for professional tasks 👇

jian@jianxliao

GPT-5.2 new spreadsheet and presentation capabilities are just skills. And I extracted them in a github repo.

Here’s what’s in `/home/oai/skills`:
- docs/
- pdfs/
- spreadsheets/

Skills:
docs/render_docx.py
docs/skill.md
pdfs/skill.md
spreadsheets/artifact_tool_spreadsheet_formulas.md
spreadsheets/artifact_tool_spreadsheets_api.md
spreadsheets/skill.md
spreadsheets/spreadsheet.md

---

I am skill-pilled. Don't build agents, build skills.

426

Changyu Chen@Cameron_Chann·8 Dec

@gy910210 @SEAWorkshop @zzlccc @anyaasims @Keyu_Duan @simon_ycl @Benjamin_eecs @_Hao_Zhu @shi_weiyan @Diyi_Yang @WeeSunLee @mavenlin Thank you Yu!

Yu Gong@gy910210·8 Dec

@Cameron_Chann @SEAWorkshop @zzlccc @anyaasims @Keyu_Duan @simon_ycl @Benjamin_eecs @_Hao_Zhu @shi_weiyan @Diyi_Yang @WeeSunLee @mavenlin Congrats!!

105

Changyu Chen@Cameron_Chann·8 Dec

🏆 Excited to share that GEM received the Outstanding Paper Award at the @SEAWorkshop of #NeurIPS2025. What a great way to wrap up this amazing NeurIPS journey! Huge thanks to the workshop committee and organizers for the recognition. Grateful for our incredible collaborators and advisors who made this project possible. Thanks to everyone involved! 🎉

SEA Workshop@SEAWorkshop

Congrats to the following paper authors attaining Outstanding Paper Awards at @SEAWorkshop! GEM: A Gym for Agentic LLMs Zichen Liu, Anya Sims, Keyu Duan, Changyu Chen, Haotian Xu, Simon Yu, Chenmien Tan, Shaopan Xiong, Weixun Wang, Bo Liu, Hao Zhu, Weiyan Shi, Diyi Yang, Wee Sun Lee, Min Lin

9.1K
