Changyu Chen

205 posts


@Cameron_Chann

PhD student @sgSMU. RL x LLMs. Previously @NTUsg, @ZJU_China

Singapore 🇸🇬 · Joined May 2020
300 Following · 386 Followers
Pinned Tweet
Changyu Chen @Cameron_Chann
(1/3) My favorite figure from the paper. Nearly all open-source RL frameworks introduce an unintentional bias when computing the masked mean 😮. The fix? Just replace mask.sum with a constant.
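The bias is easy to see in a few lines of plain Python (a minimal sketch with illustrative names, not any particular framework's code): dividing the masked loss sum by mask.sum normalizes per sequence, so each token in a short response carries more weight than one in a long response, while dividing by a fixed constant weights every token equally.

```python
def masked_mean_biased(loss, mask):
    # Common pattern in RL frameworks: normalize by the number of valid
    # tokens in each sequence (mask.sum). Tokens in short responses then
    # carry more per-token weight than tokens in long responses.
    return sum(l * m for l, m in zip(loss, mask)) / sum(mask)

def masked_mean_fixed(loss, mask, max_len):
    # The fix: normalize by a constant (e.g. the maximum generation
    # length), so every token contributes equally regardless of length.
    return sum(l * m for l, m in zip(loss, mask)) / max_len

# Identical per-token loss of 1.0; one 2-token and one 4-token response.
short_biased = masked_mean_biased([1.0, 1.0, 0.0, 0.0], [1, 1, 0, 0])     # 1.0
long_biased  = masked_mean_biased([1.0, 1.0, 1.0, 1.0], [1, 1, 1, 1])     # 1.0
short_fixed  = masked_mean_fixed([1.0, 1.0, 0.0, 0.0], [1, 1, 0, 0], 4)   # 0.5
long_fixed   = masked_mean_fixed([1.0, 1.0, 1.0, 1.0], [1, 1, 1, 1], 4)   # 1.0
```

With mask.sum, both responses contribute equally to the objective despite their different lengths, implicitly up-weighting every short-response token; the constant normalizer removes that length dependence.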
Changyu Chen @Cameron_Chann
A key post-training paradigm shift from @mimo_labs to DeepSeek is the move to multi-teacher on-policy distillation: building the generalist from a diverse pool of 10+ domain experts. Again surprised by their RL infra that supports full-vocabulary OPD with an unbounded (??) number of teachers.
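For readers new to the idea: in on-policy distillation the student samples its own rollouts and is trained to match a teacher's token-level distribution on those samples, typically via reverse KL over the full vocabulary. A toy sketch of that divergence and one simple per-prompt teacher-routing scheme (all names are illustrative; this is not DeepSeek's actual implementation):

```python
import math

def reverse_kl(student, teacher):
    # KL(student || teacher) over the full vocabulary at one token
    # position. "Full-vocabulary" OPD means the teacher exposes its
    # whole next-token distribution, not just the sampled token's logprob.
    return sum(s * math.log(s / t) for s, t in zip(student, teacher) if s > 0)

# Toy 3-token vocabulary: on this (hypothetical) math prompt, the
# student's distribution is closest to the math expert's.
student      = [0.7, 0.2, 0.1]
math_teacher = [0.6, 0.3, 0.1]
code_teacher = [0.1, 0.3, 0.6]

# One simple multi-teacher routing scheme (illustrative): distill each
# rollout toward the domain expert chosen for its prompt.
teachers = {"math": math_teacher, "code": code_teacher}
loss = reverse_kl(student, teachers["math"])  # small: distributions agree
```

The infra challenge the tweet points at is that every teacher must serve full-vocabulary logits for the student's rollouts, so the cost grows with the number of teachers.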
DeepSeek@deepseek_ai

🚀 DeepSeek-V4 Preview is officially live & open-sourced! Welcome to the era of cost-effective 1M context length.
🔹 DeepSeek-V4-Pro: 1.6T total / 49B active params. Performance rivaling the world's top closed-source models.
🔹 DeepSeek-V4-Flash: 284B total / 13B active params. Your fast, efficient, and economical choice.
Try it now at chat.deepseek.com via Expert Mode / Instant Mode. API is updated & available today!
📄 Tech Report: huggingface.co/deepseek-ai/De…
🤗 Open Weights: huggingface.co/collections/de…
1/n

Changyu Chen reposted
Amber Liu @JIACHENLIU8
We're living in the BEST era for doing research. 💪 After I graduated from my PhD, the rise of AI-native research gave me a new chance to revisit my research experience. Lately, doing research feels incredibly rewarding to me: I get to experience the pure joy of curiosity-driven science because I no longer have to worry about low-level implementations or getting bogged down by infrastructure 🚀 (I'll be sharing some of my own recent research driven by this very soon!)

But today, let me introduce the New Orchestra 🎻. We wanted to ship a product that absorbs the friction and brings science back to curiosity.
Changyu Chen reposted
Yijia Shao @EchoShao8899
New episode of the AM Podcast (@augmind_fm) is live! 📺 In EP3, we are honored to host Woosuk Kwon (@woosuk_k), who shares LLM inference from a brand-new perspective! Woosuk is a co-founder & CTO of @inferact and the creator of @vllm_project; he has deep experience in this space and great insights on the next frontier of AI infra.

In this conversation, we cover:
- How his early projects shaped his taste for infra work
- How vLLM started and what made it take off
- How emerging apps are reshaping AI infra
- What's next: streaming requests, continual learning with RL, on-device inference, and more

This conversation answered a lot of questions I personally had. Hopefully it can offer something new both to those working on the higher end of user-facing applications and to those at the lower end of AI infrastructure!
Augmented Mind Podcast@augmind_fm

"Actually, we (vLLM) get more users from the simple UX than from vLLM performance."

For our third guest, we welcome @woosuk_k, co-founder & CTO of @inferact and creator of @vllm_project. To us, Woosuk is a unique guest, and we are amazed by the user-centric perspective on LLM inference he shared: from what makes the vLLM project successful, to new application scenarios to tailor inference to, to how to support continual learning from user signals, and more.

0:00 - Prelude: Introducing Woosuk and Inferact
3:00 - Woosuk's First PhD Project
6:00 - How the vLLM Project Got Started
9:18 - AI Infra Needs More Than Just Efficiency
14:08 - How AI Infra and Human-centered AI Are Connected
15:01 - How to Prioritize Feature Requests for Popular AI Infra
18:18 - Streaming Requests and Realtime API
24:05 - Multi-turn, Agentic, Proactive LLMs
27:03 - How to Design AI Infra in a Principled Way
29:13 - How to Design an AI Inference Engine for Continual Learning with RL
35:05 - Would LoRA Training Affect RL Infra Design?
37:28 - Why Start an AI Inference Infra Startup?
40:46 - What Effortless Inference with Open-source Models Means for Developers
43:46 - A Vision for On-device AI Inference
46:19 - Can Today's Coding Agents Create vLLM?

Changyu Chen reposted
Zichen Liu @zzlccc
🦎🦎 Happy to see two of our works (DrGRPO & DPPO) highlighted here! I don't think changing a few terms is worth new branding, so we respectfully kept the predecessors' names while highlighting the corrections/improvements on top of them. Hopefully they inspire RL algorithm designs.
Alex Weers@a_weers

Finally finished! If you're interested in an overview of recent methods in reinforcement learning for reasoning LLMs, check out this blog post: aweers.de/blog/2026/rl-f… It summarizes ten methods, tries to highlight differences and trends, and includes a collection of open problems.

Changyu Chen reposted
Diyi Yang @Diyi_Yang
🚨Postdoc opening: We are looking for a postdoc researcher with expertise in NLP, RL, and/or ML to develop AI-powered clinical support tools for mental health counseling in the Global South. Working with @EmmaBrunskill & @Diyi_Yang at Stanford. Apply by April 15, 2026 via tinyurl.com/ai4mentalhealt… 🧵👇
Changyu Chen reposted
CLS ✈️ ICLR'26 @ChengleiSi
Great to see autoresearch blowing up because of the legendary Karpathy sensei. This will of course be an exciting year for automated AI research. For those of you excited to jump in, hopefully our papers will be helpful references:
- automated feedback loop for research agents to optimize LLM pre-training and post-training stacks: x.com/ChengleiSi/sta…
- generating novel research ideas with LLMs, along with a comparison against human experts: x.com/ChengleiSi/sta…
- evaluating the effectiveness of LLM-generated ideas through experiment execution: x.com/ChengleiSi/sta…
- finetuning LLMs to directly predict the effectiveness of research ideas: x.com/jiaxinwen22/st…
Andrej Karpathy@karpathy

I packaged up the "autoresearch" project into a new self-contained minimal repo if people would like to play over the weekend. It's basically the nanochat LLM training core stripped down to a single-GPU, one-file version of ~630 lines of code, then:
- the human iterates on the prompt (.md)
- the AI agent iterates on the training code (.py)

The goal is to engineer your agents to make the fastest research progress indefinitely and without any of your own involvement. In the image, every dot is a complete LLM training run that lasts exactly 5 minutes. The agent works in an autonomous loop on a git feature branch and accumulates git commits to the training script as it finds better settings (lower validation loss by the end) of the neural network architecture, the optimizer, all the hyperparameters, etc. You can imagine comparing the research progress of different prompts, different agents, etc.

github.com/karpathy/autor…

Part code, part sci-fi, and a pinch of psychosis :)

elie @eliebakouch
today is my last day at hugging face

feeling really grateful to have worked with such an amazing team and learned so much along the way. i’m proud of what we accomplished together, especially the smollm series. building that project from scratch, putting so much into it, and getting to iterate on a model and training recipe that pushed the frontier for its size was really rewarding

i hope i was able to play a part in making model training more accessible and in pushing the open model ecosystem forward. i’m also very thankful to hf for giving me the chance to share my passion for llm research, especially here, and to connect with so many awesome people

things can get quite intense in this field, but i’m still very excited about the next challenges and about the good this technology can do

but first, taking a few weeks break :)
Changyu Chen reposted
Changyu Chen @Cameron_Chann
@JustinLin610 Thank you for everything you've done for the open model world, and all the best, Junyang.
Junyang Lin @JustinLin610
me stepping down. bye my beloved qwen.
Jinjie Ni @NiJinjie
Life update: I’ve joined @GoogleDeepMind as a research scientist to work on ✨gemini scaling and RL, under the leadership of Yi Tay (@YiTayML) and Quoc Le (@quocleix). I feel extremely fortunate to be on the critical path towards AGI and can't wait to help push the frontier of gemini capabilities! 🚀
Changyu Chen reposted
Diyi Yang @Diyi_Yang
Two amazing postdocs from our lab are on the academic job market this year. I've learned a lot from their wonderful research -- you should definitely reach out and hire them!
Zichen Liu @zzlccc
Thrilled to share that I’ve joined @GoogleDeepMind to work on Gemini post-training! I feel incredibly fortunate to be cooking on this sunny island under @YiTayML's leadership, within @quocleix's broader organization. Looking forward to enjoying RL research and pushing the frontiers of Gemini alongside such a brilliant team!
Changyu Chen reposted
Hao Zhu @_Hao_Zhu
Introducing the curse of coordination: agents perform 50% worse in teams than working alone. People building human-AI collaboration today don't realize why current LLMs fail to be good teammates. We built CooperBench to study this.

For humans, we recognize that teamwork isn't just the sum of individual capability; communication and coordination often outweigh raw skill. But for AI? We're only hill-climbing benchmarks that evaluate solo technical abilities.

CooperBench: a benchmark to evaluate agent cooperation in realistic software teamwork tasks. The setup is intuitive: two agents, two tasks, two VMs, one chat channel (agents can send arbitrary text, even the entire patch they wrote). We evaluate whether the merged solution from both agents passes the requirements of both tasks.

The curse of coordination. The most striking result: agents perform 50% worse in teams (black line) than working alone (blue line). Why is this happening? Is it because they can't use the communication tool? No: they spent 20% of their time sending messages. The problem? Those messages were repetitive, vague, ignored questions, or straight-up hallucinated.

But bad communication is only part of the story. We found two deeper failures:
- Commitment: agents don't do what they promised.
- Expectations: agents don't expect others to keep promises either.
Without these, cooperation collapses.

However, there is a silver lining: we also find emergent coordination behaviors, e.g. role division, resource division, and negotiation, which gives us hope that we can use reinforcement learning to improve coordination.

What's next? It is true that highly engineered multi-agent orchestration could largely sidestep the coordination problem. However, we care more about the AI's capability: if we truly want AI to be our teammates, we need them to be natively capable of communicating and coordinating effectively. Two agents on software tasks is just the beginning.

The real goal: agents that can cooperate with us well enough to actually empower us. CooperBench is our first step. If you're working on this too, let's talk.
Changyu Chen reposted
Jason Weston @jaseweston
Our team in FAIR at Meta is hiring a (full-time) researcher! We work on the topics of Reasoning, Alignment and Memory/architectures (RAM) for self-improvement & co-improvement. Apply here: metacareers.com/profile/job_de… Location: NY, Seattle or Menlo Park.

Some of our recent work to give a flavor:
- Co-Improvement (position): arxiv.org/abs/2512.05356
- SPICE (Self-Play in Corpus Environments): arxiv.org/abs/2510.24684
- Self-Challenging Agents: arxiv.org/abs/2506.01716
- RL from Human Interaction: arxiv.org/abs/2509.25137
- AggLM (parallel aggregation): arxiv.org/abs/2509.06870
- StepWiser (CoT-PRM RL): arxiv.org/abs/2508.19229
- DARLING (diversity-trained RL): arxiv.org/abs/2509.02534
- J1 (RL-trained LLM-as-Judge): arxiv.org/abs/2505.10320
- CoT-Self-Instruct: arxiv.org/abs/2507.23751
- Multi-Token Attention: arxiv.org/abs/2504.00927
Changyu Chen reposted
Yijia Shao @EchoShao8899
To kick off the new year, I am super excited to launch The Augmented Mind Podcast (@augmind_fm) with @shannonzshen and @michaelryan207 to share technical human-centered AI work! 🎙️

Since I started my PhD working on human-agent collaboration, I've always noticed this missing channel:
- There are many channels sharing AI work, but you seldom see human-centered AI work there.
- There are papers out there, but many careful thoughts are buried and we just have too many papers these days.
- There are some interviews, but they often focus more on high-level visions. Those careful designs or technical innovations that align AI with human needs, values, etc. remain unseen.

In The AM Podcast, we plan to share compelling research, infrastructure, and systems through long-form monthly episodes. We want to show examples of how to develop AI that collaborates and augments rather than purely automates or replaces.

In EP0, we share who we are, why we started the podcast, and what we're looking forward to. Our first episode will drop this week!
Augmented Mind Podcast@augmind_fm

AI used to be a distant promise; now it permeates our lives. AI is getting better, but is it making us better? We are promised that AI will augment our minds, but how?

We (@EchoShao8899, @shannonzshen, and @michaelryan207) are excited to launch the Augmented Mind Podcast (The AM Podcast), a podcast about technical human-centered AI work. We'll share compelling research, infrastructure, and systems through monthly episodes, featuring interviews with the pioneering minds behind them. We release EP0 today to share who we are, why we started this podcast, and what we're looking forward to.

0:00 - Prelude: the problems we care about
1:48 - Host introduction
2:03 - Why we started the AM Podcast
2:31 - Hot takes on human-centered AI
10:45 - Format of our podcast
11:28 - Unique technical challenges in human-centered AI
16:45 - Let the journey begin!

Changyu Chen @Cameron_Chann
🏆 Excited to share that GEM received the Outstanding Paper Award @SEAWorkshop at #NeurIPS2025. What a great way to wrap up this amazing NeurIPS journey! Huge thanks to the workshop committee and organizers for the recognition. Grateful for our incredible collaborators and advisors who made this project possible. Thanks to everyone involved! 🎉
SEA Workshop@SEAWorkshop

Congrats to the authors of the following paper on attaining an Outstanding Paper Award at @SEAWorkshop!

GEM: A Gym for Agentic LLMs
Zichen Liu, Anya Sims, Keyu Duan, Changyu Chen, Haotian Xu, Simon Yu, Chenmien Tan, Shaopan Xiong, Weixun Wang, Bo Liu, Hao Zhu, Weiyan Shi, Diyi Yang, Wee Sun Lee, Min Lin
