Tong Chen

174 posts

@tomchen0

PhD student @uwcse @uwnlp

Joined February 2023
568 Following · 826 Followers
Pinned Tweet
Tong Chen @tomchen0
OpenAI's blog (openai.com/index/why-lang…) points out that today’s language models hallucinate because training and evaluation reward guessing instead of admitting uncertainty. This raises a natural question: can we reduce hallucination without hurting utility?🤔 On-policy RL with our Binary Retrieval-Augmented Reward (RAR) can improve factuality (40% reduction in hallucination) while preserving model utility (win rate and accuracy) of fully trained, capable LMs like Qwen3-8B. [1/n]
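For concreteness, here is a minimal sketch of what a binary retrieval-augmented reward could look like mechanically, under two assumptions the thread only states informally: documents are pre-retrieved and cached per prompt (see the replies further down), and an NLI-style contradiction checker flags factual conflicts. All names here are illustrative stand-ins, not the paper's actual API.

```python
# A minimal sketch, assuming a pre-built prompt -> documents cache and a
# hypothetical `contradicts(response, doc)` checker (e.g., an NLI model
# or LLM judge). Not the paper's actual implementation.

def binary_rar_reward(prompt, response, doc_cache, contradicts):
    """Reward 1.0 if no cached document contradicts the response, else 0.0."""
    for doc in doc_cache.get(prompt, []):
        if contradicts(response, doc):  # any detected factual conflict
            return 0.0
    return 1.0
```

A binary (rather than graded) reward keeps the signal simple: a response either survives the evidence check or it does not.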
Tong Chen reposted
Teng Xiao @TengX6
🚀 New work: Meta-Reinforcement Learning with Self-Reflection

LLM agents shouldn't just solve problems. They should learn from their own attempts. Most current RL methods optimize single, independent trajectories: each attempt starts from scratch, with no mechanism to improve across attempts. But intelligent systems should get better after trying once. This raises a fundamental question: how do we train models to learn from their own attempts?

We believe meta-reinforcement learning may be a key paradigm for training future LLM agents, enabling models to adapt and improve across attempts and environments. In this work we introduce MR-Search, a training paradigm built around:
🧠 In-Context Meta-Reinforcement Learning
🪞 Self-Reflection
🔁 Learning to learn at test time

📄 Paper: arxiv.org/abs/2603.11327
💻 Code: github.com/tengxiao1/MR-S…
Tong Chen reposted
Yike Wang @yikewang_
Small language models are not very helpful as judges, so how about 🔄 backward inference: inferring the instruction given only the response, and using the similarity between the inferred and the original instructions as the reward signal?

Introducing ⚙️FLIP, a reference-free and rubric-free reward modeling approach that boosts the RewardBench2 performance of 13 small language models by an average of 79.6%, and substantially outperforms LLM-as-a-Judge under test-time scaling via parallel sampling and GRPO training.

📄 paper: arxiv.org/abs/2602.13551
🔗 code: github.com/yikee/FLIP
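A rough sketch of the backward-inference idea as the tweet describes it: reconstruct the instruction from the response alone, then use the similarity between inferred and original instructions as the reward. `small_lm` and `embed` are hypothetical stand-ins, not FLIP's actual API.

```python
import numpy as np

# Sketch only: `small_lm` maps a prompt string to generated text, and
# `embed` maps text to a vector. Both are assumed interfaces.

def flip_style_reward(instruction, response, small_lm, embed):
    # Backward inference: a small LM guesses the instruction from the response.
    inferred = small_lm(
        "What instruction most likely produced this response?\n\n" + response
    )
    # Reference-free reward: cosine similarity of original vs. inferred instruction.
    u, v = embed(instruction), embed(inferred)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
```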
Tong Chen reposted
Taiwei Shi @taiwei_shi
For decades, we’ve trained AI to chase rewards. But humans don’t just optimize outcomes. We experience, reflect, then learn. Can AI do the same?

Introducing 𝐄𝐱𝐩𝐞𝐫𝐢𝐞𝐧𝐭𝐢𝐚𝐥 𝐑𝐞𝐢𝐧𝐟𝐨𝐫𝐜𝐞𝐦𝐞𝐧𝐭 𝐋𝐞𝐚𝐫𝐧𝐢𝐧𝐠, a step toward AI that truly learns from experience.
Tong Chen reposted
Akari Asai @AkariAsai
Thrilled to share: OpenScholar, our work on scientific deep research agents for reliable literature synthesis, has been accepted to Nature! 🎉 Huge thanks to collaborators across institutions who made this possible!
Tong Chen reposted
Jiacheng Liu @liujc1998
Calling on behalf of infini-gram: does anyone know where I can get / apply for AWS credits? 💸💸 Keeping infini-gram alive costs quite some money, mostly SSD rental. If you're a fan of keeping open LLM training data readily inspectable, please reply / DM me some pointers! 🧵1/4
Tong Chen reposted
CLS @ChengleiSi
Can LLMs automate frontier LLM research, like pre-training and post-training? In our new paper, LLMs found post-training methods that beat GRPO (69.4% vs 48.0%), and pre-training recipes faster than nanoGPT (19.7 minutes vs 35.9 minutes). 1/
Tong Chen reposted
Augmented Mind Podcast @augmind_fm
AI used to be a distant promise; now it permeates our lives. AI is getting better, but is it making us better? We are promised that AI will augment our minds, but how?

We (@EchoShao8899, @shannonzshen, and @michaelryan207) are excited to launch the Augmented Mind Podcast (The AM Podcast), a podcast about technical human-centered AI work. We'll share compelling research, infrastructure, and systems through monthly episodes, featuring interviews with the pioneering minds behind them. We release EP0 today to share who we are, why we started this podcast, and what we're looking forward to.

0:00 - Prelude: the problems we care about
1:48 - Host introduction
2:03 - Why we started the AM Podcast
2:31 - Hot takes on human-centered AI
10:45 - Format of our podcast
11:28 - Unique technical challenges in human-centered AI
16:45 - Let the journey begin!
Jiacheng Liu @liujc1998
Belated update: I defended my PhD last month! I am tremendously grateful to my advisors, @HannaHajishirzi and @YejinChoinka. Without their incredible support, I wouldn’t have had so much fun exploring bold ideas, like taking a journey into the ocean of LLM pretraining data. 🥰🥰
Tong Chen reposted
Liwei Jiang @liweijianglw
Super happy to receive the Best Paper Award at #NeurIPS2025 for our Artificial Hivemind paper!! (Really enjoyed giving the oral talk at NeurIPS as well!)
Liwei Jiang @liweijianglw

⚠️Different models. Same thoughts.⚠️

Today’s AI models converge into an 𝐀𝐫𝐭𝐢𝐟𝐢𝐜𝐢𝐚𝐥 𝐇𝐢𝐯𝐞𝐦𝐢𝐧𝐝 🐝, a striking case of mode collapse that persists even across heterogeneous ensembles. Our #neurips2025 𝐃&𝐁 𝐎𝐫𝐚𝐥 𝐩𝐚𝐩𝐞𝐫 (✨𝐭𝐨𝐩 𝟎.𝟑𝟓%✨) dives deep into this phenomenon, introducing 𝐈𝐧𝐟𝐢𝐧𝐢𝐭𝐲-𝐂𝐡𝐚𝐭, a real-world dataset of 26K open-ended user queries spanning 17 categories, plus 31K dense human annotations (𝟐𝟓 𝐢𝐧𝐝𝐞𝐩𝐞𝐧𝐝𝐞𝐧𝐭 𝐚𝐧𝐧𝐨𝐭𝐚𝐭𝐨𝐫𝐬 𝐩𝐞𝐫 𝐞𝐱𝐚𝐦𝐩𝐥𝐞), to push AI’s creative and discovery potential forward. Now you can build your favorite models to be truly original, diverse, and impactful in the open-ended real world.

📍Paper: arxiv.org/abs/2510.22954
📍Data: huggingface.co/collections/li…

We also systematically reveal the Artificial Hivemind across:
💥 𝐆𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐯𝐞 𝐚𝐛𝐢𝐥𝐢𝐭𝐢𝐞𝐬: not only do individual LLMs repeat themselves, but different models produce strikingly similar content, even when asked fully open-ended questions.
💥 𝐃𝐢𝐬𝐜𝐫𝐢𝐦𝐢𝐧𝐚𝐭𝐢𝐯𝐞 𝐚𝐛𝐢𝐥𝐢𝐭𝐢𝐞𝐬: LLMs, LM judges, and reward models are systematically miscalibrated when rating alternative responses to open-ended queries. (1/N)

Tong Chen reposted
Rui Xin @rui_xin31
I'll be at #NeurIPS2025 until 12/7! I work on post-training and reward signals (Spurious Rewards), and I'm currently curious about bridging the gap between how humans and LLMs learn. Looking forward to connecting with new and old friends; also exploring summer 2026 internships. DMs open!
Tong Chen @tomchen0
I will be at #NeurIPS2025 12.3–12.7. Looking forward to meeting old and new friends! ☕️🌮 Recently I've been working on hallucination (Binary RAR) and verbatim memorization (ParaPO), issues that scaling up pretraining cannot simply fix. Also interested in making models learn more like humans: strong generalization, non-scalar rewards, episodic memory, and long-horizon abilities.
Tong Chen reposted
Yiping Wang @ypwang61
8B model can outperform AlphaEvolve on open optimization problems by scaling compute for inference or test-time RL🚀!

⭕Circle packing:
AlphaEvolve (Gemini-2.0-Flash/Pro): 2.63586276
Ours (DeepSeek-R1-0528-Qwen3-8B): 2.63598308

🔗in🧵 [1/n]
Tong Chen @tomchen0
PhD applicants: join Akari's first cohort of students! Akari's research ranges from careful benchmarking to solid methodology. She always gives sharp feedback while being thoughtful and supportive. She stayed driven throughout her PhD and now brings that same energy to her new lab. I am grateful to learn from her and to work with her. Please apply!
Akari Asai @AkariAsai

1/ Hiring PhD students at CMU SCS (LTI/MLD) for Fall 2026 (Deadline 12/10) 🎓 I work on open, reliable LMs: augmented LMs & agents (RAG, tool use, deep research), safety (hallucinations, copyright), and AI for science, code & multilinguality & open to bold new ideas! FAQ in 🧵

Tong Chen reposted
Akari Asai @AkariAsai
Exciting DR Tulu updates! 📈 DR Tulu-8B (new RL ckpt) sits on the performance–cost frontier, beating Tongyi DR-30B and matching OpenAI DR/Gemini 3 Pro+Search at a fraction of the cost. Now on arXiv. 🖥️ You can run an interactive CLI demo with open code, almost for free. 1/🧵
Ai2 @allen_ai

Today we’re releasing Deep Research Tulu (DR Tulu)—the first fully open, end-to-end recipe for long-form deep research, plus an 8B agent you can use right away. Train agents that plan, search, synthesize, & cite across sources, making expert research more accessible. 🧭📚

Tong Chen reposted
Rulin Shao @RulinShao
🔥Thrilled to introduce DR Tulu-8B, an open long-form Deep Research model that matches OpenAI DR 💪Yes, just 8B!

🚀 The secret? We present Reinforcement Learning with Evolving Rubrics (RLER) for long-form, non-verifiable DR tasks! Our rubrics:
- co-evolve with the policy model
- are grounded in search knowledge
🧵
Tong Chen reposted
Xuhui Zhou @nlpxuhui
New blog post out! 📜 We share our latest research efforts to build more effective, human-centered AI collaboration.

Months ago, I was genuinely surprised by how quickly AI agents were improving, and with that came a deep fear of being replaced, of humans slowly losing agency as AI grows more capable. At the same time, I felt the intense frustration of working with coding agents that produce thousands of lines of seemingly working code that ultimately prove unusable.

These days, I've been coming to a clearer conclusion: the future of AI has to be true human–AI collaboration. And making that collaboration actually smooth, not frustrating, not disempowering, has never been more important.

xuhuiz.com/blog/on-the-qu…

#AI #AIAgents #HumanAICollaboration
Tong Chen @tomchen0
@fangweei We retrieve and save up to ten related documents for each training prompt before training. These provide evidence for detecting factual errors. Our method starts from a fully trained model and does not require ground-truth labels during training.
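The pre-training retrieval step described in this reply could look like the sketch below, assuming a generic `retriever.search(query, top_k)` interface (a hypothetical name): fetch and store up to ten documents per prompt once, so the reward computation during RL never calls the retriever online.

```python
# A sketch of offline per-prompt retrieval caching, under the assumed
# `retriever.search` interface; not the paper's actual code.

def build_doc_cache(prompts, retriever, k=10):
    return {p: retriever.search(p, top_k=k) for p in prompts}
```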
e fang @fangweei
@tomchen0 how do you predict or provide the ground truth during RAR?
Tong Chen @tomchen0
Thanks! This is a good point for future work. Our friend @RulinShao raised a similar case: if we ask “who is the current CEO of Apple Inc.” and the retriever brings in an old news article without a timestamp, it can introduce outdated facts and trigger a contradiction. This may push the model to over-abstain. Luckily, such time-sensitive questions are rare in our training data, but it is an interesting direction to study.
Peyman @peyman_razaghi
@tomchen0 Agree with that scenario, but another case is when retrieval brings false evidence that contradicts the correct response. Wouldn't it at best cause abstention, or at worst match wrong outputs? Using retrieval as a judge at the end seems like using a regex, versus another trained LLM?
Tong Chen @tomchen0
We agree that better retrieval can further reduce hallucination. In practice, our method is robust to retrieval noise. If the retriever fails to get the correct documents for a prompt, then all candidate responses receive a reward of one (no contradiction detected), which means the model receives no learning signal for that instance. This prevents the model from being pushed in the wrong direction when retrieval is inaccurate.
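This "no learning signal" behavior is easy to see with a group-relative baseline (GRPO-style; an assumption, since the thread does not name the exact RL algorithm): the advantage is each reward minus the group mean, so identical rewards yield all-zero advantages and hence no update for that prompt.

```python
# A sketch of why uniform rewards give no learning signal under an
# assumed group-relative baseline (GRPO-style).

def group_advantages(rewards):
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]

# All candidates pass the contradiction check -> zero advantage everywhere,
# so the policy gradient for this prompt vanishes.
assert group_advantages([1.0, 1.0, 1.0, 1.0]) == [0.0, 0.0, 0.0, 0.0]
```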
Peyman @peyman_razaghi
@tomchen0 Nice, but how does it control retrieval accuracy? Semantic retrieval could just be wrong.