Tong Chen

174 posts

@tomchen0

PhD student @uwcse @uwnlp

Joined February 2023
568 Following · 826 Followers
Pinned Tweet
Tong Chen @tomchen0
OpenAI's blog (openai.com/index/why-lang…) points out that today’s language models hallucinate because training and evaluation reward guessing instead of admitting uncertainty. This raises a natural question: can we reduce hallucination without hurting utility?🤔 On-policy RL with our Binary Retrieval-Augmented Reward (RAR) can improve factuality (40% reduction in hallucination) while preserving model utility (win rate and accuracy) of fully trained, capable LMs like Qwen3-8B. [1/n]
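For concreteness, here is a minimal sketch of what a binary retrieval-augmented reward could look like mechanically, under two assumptions the thread only states informally: documents are pre-retrieved and cached per prompt (see the replies further down), and an NLI-style contradiction checker flags factual conflicts. All names here are illustrative stand-ins, not the paper's actual API.

```python
# A minimal sketch, assuming a pre-built prompt -> documents cache and a
# hypothetical `contradicts(response, doc)` checker (e.g., an NLI model
# or LLM judge). Not the paper's actual implementation.

def binary_rar_reward(prompt, response, doc_cache, contradicts):
    """Reward 1.0 if no cached document contradicts the response, else 0.0."""
    for doc in doc_cache.get(prompt, []):
        if contradicts(response, doc):  # any detected factual conflict
            return 0.0
    return 1.0
```

A binary (rather than graded) reward keeps the signal simple: a response either survives the evidence check or it does not.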
Tong Chen reposted
Teng Xiao @TengX6
🚀 New work: Meta-Reinforcement Learning with Self-Reflection

LLM agents shouldn't just solve problems. They should learn from their own attempts. Most current RL methods optimize single, independent trajectories: each attempt starts from scratch, with no mechanism to improve across attempts. But intelligent systems should get better after trying once. This raises a fundamental question: how do we train models to learn from their own attempts?

We believe meta-reinforcement learning may be a key paradigm for training future LLM agents, enabling models to adapt and improve across attempts and environments. In this work we introduce MR-Search, a training paradigm built around:
🧠 In-Context Meta-Reinforcement Learning
🪞 Self-Reflection
🔁 Learning to learn at test time

📄 Paper: arxiv.org/abs/2603.11327
💻 Code: github.com/tengxiao1/MR-S…
Tong Chen reposted
Yike Wang @yikewang_
Small language models are not very helpful as judges, so how about 🔄 backward inference: inferring the instruction given only the response, and using the similarity between the inferred and the original instructions as the reward signal?

Introducing ⚙️FLIP, a reference-free and rubric-free reward modeling approach that boosts the RewardBench2 performance of 13 small language models by an average of 79.6%, and substantially outperforms LLM-as-a-Judge under test-time scaling via parallel sampling and GRPO training.

📄 paper: arxiv.org/abs/2602.13551
🔗 code: github.com/yikee/FLIP
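A rough sketch of the backward-inference idea as the tweet describes it: reconstruct the instruction from the response alone, then use the similarity between inferred and original instructions as the reward. `small_lm` and `embed` are hypothetical stand-ins, not FLIP's actual API.

```python
import numpy as np

# Sketch only: `small_lm` maps a prompt string to generated text, and
# `embed` maps text to a vector. Both are assumed interfaces.

def flip_style_reward(instruction, response, small_lm, embed):
    # Backward inference: a small LM guesses the instruction from the response.
    inferred = small_lm(
        "What instruction most likely produced this response?\n\n" + response
    )
    # Reference-free reward: cosine similarity of original vs. inferred instruction.
    u, v = embed(instruction), embed(inferred)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
```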
Tong Chen reposted
Taiwei Shi @taiwei_shi
For decades, we’ve trained AI to chase rewards. But humans don’t just optimize outcomes. We experience, reflect, then learn. Can AI do the same?

Introducing 𝐄𝐱𝐩𝐞𝐫𝐢𝐞𝐧𝐭𝐢𝐚𝐥 𝐑𝐞𝐢𝐧𝐟𝐨𝐫𝐜𝐞𝐦𝐞𝐧𝐭 𝐋𝐞𝐚𝐫𝐧𝐢𝐧𝐠, a step toward AI that truly learns from experience.
Tong Chen reposted
Akari Asai @AkariAsai
Thrilled to share: OpenScholar, our work on scientific deep research agents for reliable literature synthesis, has been accepted to Nature! 🎉 Huge thanks to collaborators across institutions who made this possible!
Tong Chen reposted
Jiacheng Liu @liujc1998
Calling on behalf of infini-gram: does anyone know where I can get / apply for AWS credits? 💸💸 Keeping infini-gram alive costs quite some money, mostly SSD rental. If you're a fan of keeping open LLM training data readily inspectable, please reply / DM me some pointers! 🧵1/4
Tong Chen reposted
CLS @ChengleiSi
Can LLMs automate frontier LLM research, like pre-training and post-training? In our new paper, LLMs found post-training methods that beat GRPO (69.4% vs 48.0%), and pre-training recipes faster than nanoGPT (19.7 minutes vs 35.9 minutes). 1/
Tong Chen reposted
Augmented Mind Podcast @augmind_fm
AI used to be a distant promise; now it permeates our lives. AI is getting better, but is it making us better? We are promised that AI will augment our minds, but how?

We (@EchoShao8899, @shannonzshen, and @michaelryan207) are excited to launch the Augmented Mind Podcast (The AM Podcast), a podcast about technical human-centered AI work. We'll share compelling research, infrastructure, and systems through monthly episodes, featuring interviews with the pioneering minds behind them. We release EP0 today to share who we are, why we started this podcast, and what we're looking forward to.

0:00 - Prelude: the problems we care about
1:48 - Host introduction
2:03 - Why we started the AM Podcast
2:31 - Hot takes on human-centered AI
10:45 - Format of our podcast
11:28 - Unique technical challenges in human-centered AI
16:45 - Let the journey begin!
Jiacheng Liu @liujc1998
Belated update: I defended my PhD last month! I am tremendously grateful to my advisors, @HannaHajishirzi and @YejinChoinka. Without their incredible support, I wouldn’t have had so much fun exploring bold ideas, like taking a journey into the ocean of LLM pretraining data. 🥰🥰
Tong Chen reposted
Liwei Jiang @liweijianglw
Super happy to receive the Best Paper Award at #NeurIPS2025 for our Artificial Hivemind paper!! (Really enjoyed giving the oral talk at NeurIPS as well!)
Liwei Jiang @liweijianglw

⚠️Different models. Same thoughts.⚠️

Today’s AI models converge into an 𝐀𝐫𝐭𝐢𝐟𝐢𝐜𝐢𝐚𝐥 𝐇𝐢𝐯𝐞𝐦𝐢𝐧𝐝 🐝, a striking case of mode collapse that persists even across heterogeneous ensembles. Our #neurips2025 𝐃&𝐁 𝐎𝐫𝐚𝐥 𝐩𝐚𝐩𝐞𝐫 (✨𝐭𝐨𝐩 𝟎.𝟑𝟓%✨) dives deep into this phenomenon, introducing 𝐈𝐧𝐟𝐢𝐧𝐢𝐭𝐲-𝐂𝐡𝐚𝐭, a real-world dataset of 26K open-ended user queries spanning 17 categories, plus 31K dense human annotations (𝟐𝟓 𝐢𝐧𝐝𝐞𝐩𝐞𝐧𝐝𝐞𝐧𝐭 𝐚𝐧𝐧𝐨𝐭𝐚𝐭𝐨𝐫𝐬 𝐩𝐞𝐫 𝐞𝐱𝐚𝐦𝐩𝐥𝐞), to push AI’s creative and discovery potential forward. Now you can build your favorite models to be truly original, diverse, and impactful in the open-ended real world.

📍Paper: arxiv.org/abs/2510.22954
📍Data: huggingface.co/collections/li…

We also systematically reveal the Artificial Hivemind across:
💥 𝐆𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐯𝐞 𝐚𝐛𝐢𝐥𝐢𝐭𝐢𝐞𝐬: not only do individual LLMs repeat themselves, but different models produce strikingly similar content, even when asked fully open-ended questions.
💥 𝐃𝐢𝐬𝐜𝐫𝐢𝐦𝐢𝐧𝐚𝐭𝐢𝐯𝐞 𝐚𝐛𝐢𝐥𝐢𝐭𝐢𝐞𝐬: LLMs, LM judges, and reward models are systematically miscalibrated when rating alternative responses to open-ended queries. (1/N)

Tong Chen reposted
Rui Xin @rui_xin31
I'll be at #NeurIPS2025 until 12/7! I work on post-training and reward signals (Spurious Rewards), and I'm currently curious about bridging the gap between how humans and LLMs learn. Looking forward to connecting with new and old friends; also exploring summer 2026 internships. DMs open!
Tong Chen @tomchen0
I will be at #NeurIPS2025 12.3–12.7. Looking forward to meeting old and new friends! ☕️🌮 Recently I've been working on hallucination (Binary RAR) and verbatim memorization (ParaPO), issues that scaling up pretraining cannot simply fix. Also interested in making models learn more like humans: strong generalization, non-scalar rewards, episodic memory, and long-horizon abilities.
Tong Chen reposted
Yiping Wang @ypwang61
8B model can outperform AlphaEvolve on open optimization problems by scaling compute for inference or test-time RL🚀!

⭕Circle packing:
AlphaEvolve (Gemini-2.0-Flash/Pro): 2.63586276
Ours (DeepSeek-R1-0528-Qwen3-8B): 2.63598308

🔗in🧵 [1/n]
Tong Chen @tomchen0
PhD applicants: join Akari's first cohort of students! Akari's research ranges from careful benchmarking to solid methodology. She always gives sharp feedback while being thoughtful and supportive. She stayed driven throughout her PhD and now brings that same energy to her new lab. I am grateful to learn from her and to work with her. Please apply!
Akari Asai @AkariAsai

1/ Hiring PhD students at CMU SCS (LTI/MLD) for Fall 2026 (Deadline 12/10) 🎓 I work on open, reliable LMs: augmented LMs & agents (RAG, tool use, deep research), safety (hallucinations, copyright), and AI for science, code & multilinguality & open to bold new ideas! FAQ in 🧵

Tong Chen reposted
Akari Asai @AkariAsai
Exciting DR Tulu updates! 📈 DR Tulu-8B (new RL ckpt) sits on the performance–cost frontier, beating Tongyi DR-30B and matching OpenAI DR/Gemini 3 Pro+Search at a fraction of the cost. Now on arXiv. 🖥️ You can run an interactive CLI demo with open code, almost for free. 1/🧵
Ai2 @allen_ai

Today we’re releasing Deep Research Tulu (DR Tulu)—the first fully open, end-to-end recipe for long-form deep research, plus an 8B agent you can use right away. Train agents that plan, search, synthesize, & cite across sources, making expert research more accessible. 🧭📚

Tong Chen reposted
Rulin Shao @RulinShao
🔥Thrilled to introduce DR Tulu-8B, an open long-form Deep Research model that matches OpenAI DR 💪Yes, just 8B!

🚀 The secret? We present Reinforcement Learning with Evolving Rubrics (RLER) for long-form, non-verifiable DR tasks! Our rubrics:
- co-evolve with the policy model
- are grounded in search knowledge
🧵
Tong Chen reposted
Xuhui Zhou @nlpxuhui
New blog post out! 📜 We share our latest research efforts to build more effective, human-centered AI collaboration.

Months ago, I was genuinely surprised by how quickly AI agents were improving, and with that came a deep fear of being replaced, of humans slowly losing agency as AI grows more capable. At the same time, I felt the intense frustration of working with coding agents that produce thousands of lines of seemingly working code that ultimately prove unusable.

These days, I've been coming to a clearer conclusion: the future of AI has to be true human–AI collaboration. And making that collaboration actually smooth, not frustrating, not disempowering, has never been more important.

xuhuiz.com/blog/on-the-qu…

#AI #AIAgents #HumanAICollaboration
Tong Chen @tomchen0
@fangweei We retrieve and save up to ten related documents for each training prompt before training. These provide evidence for detecting factual errors. Our method starts from a fully trained model and does not require ground-truth labels during training.
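The pre-training retrieval step described in this reply could look like the sketch below, assuming a generic `retriever.search(query, top_k)` interface (a hypothetical name): fetch and store up to ten documents per prompt once, so the reward computation during RL never calls the retriever online.

```python
# A sketch of offline per-prompt retrieval caching, under the assumed
# `retriever.search` interface; not the paper's actual code.

def build_doc_cache(prompts, retriever, k=10):
    return {p: retriever.search(p, top_k=k) for p in prompts}
```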
e fang @fangweei
@tomchen0 how do you predict or provide the ground truth during RAR?
Tong Chen @tomchen0
Thanks! This is a good point for future work. Our friend @RulinShao raised a similar case: if we ask “who is the current CEO of Apple Inc.” and the retriever brings in an old news article without a timestamp, it can introduce outdated facts and trigger a contradiction. This may push the model to over-abstain. Luckily, such time-sensitive questions are rare in our training data, but it is an interesting direction to study.
Peyman @peyman_razaghi
@tomchen0 Agree with that scenario, but another case is when retrieval brings false evidence that contradicts the correct response. Wouldn't it at best cause abstention, or at worst match wrong outputs? Using retrieval as a judge at the end seems like using a regex, versus another trained LLM?
Tong Chen @tomchen0
We agree that better retrieval can further reduce hallucination. In practice, our method is robust to retrieval noise. If the retriever fails to get the correct documents for a prompt, then all candidate responses receive a reward of one (no contradiction detected), which means the model receives no learning signal for that instance. This prevents the model from being pushed in the wrong direction when retrieval is inaccurate.
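This "no learning signal" behavior is easy to see with a group-relative baseline (GRPO-style; an assumption, since the thread does not name the exact RL algorithm): the advantage is each reward minus the group mean, so identical rewards yield all-zero advantages and hence no update for that prompt.

```python
# A sketch of why uniform rewards give no learning signal under an
# assumed group-relative baseline (GRPO-style).

def group_advantages(rewards):
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]

# All candidates pass the contradiction check -> zero advantage everywhere,
# so the policy gradient for this prompt vanishes.
assert group_advantages([1.0, 1.0, 1.0, 1.0]) == [0.0, 0.0, 0.0, 0.0]
```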
Peyman @peyman_razaghi
@tomchen0 Nice, but how does it control retrieval accuracy? Semantic retrieval could just be wrong.