Yao Dou

75 posts

@Yaooo01

PhD student @GeorgiaTech, previously @MSFTResearch, @uwnlp, @allen_ai.

Atlanta, GA · Joined September 2017

298 Following · 269 Followers
Yao Dou @Yaooo01
RT @alan_ritter: New paper by my Ph.D. student @hyungjoochae: How can we build a safe, scalable learning environment for web agents? https…
0 replies · 2 reposts · 0 likes · 1 view
Yao Dou reposted
Yang Chen @ychenNLP
We released Nemotron Cascade 2 30B A3B. What makes this release especially meaningful to me is that it reflects a 1.5-year journey at NVIDIA around one core idea: improving AI math reasoning through self-improvement at test time. Each project tackled a different part of that
Wei Ping@_weiping

🚀 Introducing Nemotron-Cascade 2 🚀
Just 3 months after Nemotron-Cascade 1, we’re releasing Nemotron-Cascade 2: an open 30B MoE with 3B active parameters, delivering best-in-class reasoning and strong agentic capabilities.
🥇 Gold Medal-level performance on IMO 2025, IOI 2025, and ICPC World Finals 2025:
• Capabilities once thought achievable only by frontier proprietary models (e.g. Gemini Deep Think) or frontier-scale open models (i.e. DeepSeek-V3.2-Speciale-671B-A37B).
• Remarkably high intelligence density with 20× fewer parameters.
🏆 Best-in-class across math, code reasoning, alignment, and instruction following:
• Outperforms the latest Qwen3.5-35B-A3B (2026-02-24) and even the larger Qwen3.5-122B-A10B (2026-03-11).
🧠 Powered by Cascade RL + multi-domain on-policy distillation:
• Significantly expands Cascade RL across a much broader range of reasoning and agentic domains than Nemotron-Cascade 1, while distilling from the strongest intermediate teacher models throughout training to recover regressions and sustain gains.
🤗 Model + SFT + RL data: 👉 huggingface.co/collections/nv…
📄 Technical report: 👉 research.nvidia.com/labs/nemotron/…

4 replies · 18 reposts · 103 likes · 9.3K views
Yao Dou reposted
Yang Chen @ychenNLP
🥈 Silver Medal at IOI 2025 & outperforms DeepSeek-R1-0528 on LiveCodeBench. Instead of mixing different tasks together, we scale *Cascade RL* to develop general LLMs in a curriculum (RLHF -> Instruct -> Math -> Code -> SWE). So many learnings, check out our report! 👇
5 replies · 43 reposts · 226 likes · 23.9K views
Yao Dou reposted
Introducing early experience: using the future states that result from an agent's own actions as scalable supervision to train itself - without reward 🧠!
1️⃣ Reward-free: can train directly in real-world environments.
2️⃣ Better RL warm-start: when continued with RL, leads to higher final
Jason Weston@jaseweston

🌀Agent Learning via Early Experience🌀
📝: arxiv.org/abs/2510.08558
- SFT for agents is sparse; RL on long horizons is hard
We provide new mid-training signals that work:
1) Implicit next-state world modeling task
2) Self-reflection on alternate states
- Strong improvements over 8 environments and multiple model families
- Works well for subsequent RL! 🧵1/5
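The two mid-training signals above can be pictured as data construction over the agent's own rollouts. A minimal sketch follows; the `Transition` schema, function names, and prompt wording are illustrative inventions, not the paper's actual format:

```python
from dataclasses import dataclass

@dataclass
class Transition:
    """One step from the agent's own rollout (illustrative schema)."""
    state: str
    action: str
    next_state: str

def world_modeling_examples(transitions):
    # Signal 1: implicit world modeling -- given (state, action), predict
    # the environment's next state. No reward labels are required.
    return [
        {
            "prompt": f"State: {t.state}\nAction: {t.action}\nNext state?",
            "target": t.next_state,
        }
        for t in transitions
    ]

def self_reflection_examples(state, chosen, alternates):
    # Signal 2: self-reflection -- contrast the taken action's outcome with
    # the outcomes of alternate actions from the same state.
    return [
        {
            "prompt": (
                f"State: {state}\n"
                f"Taken: {chosen.action} -> {chosen.next_state}\n"
                f"Alternative: {alt.action} -> {alt.next_state}\n"
                "Reflect on the difference between these outcomes."
            ),
        }
        for alt in alternates
    ]

rollout = [Transition("cart page", "click checkout", "payment page")]
print(world_modeling_examples(rollout)[0]["target"])  # -> payment page
```

Both example builders need only states reached by the agent itself, which is what makes the supervision reward-free and scalable.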

2 replies · 27 reposts · 112 likes · 15.8K views
Yao Dou @Yaooo01
@techietaro We compare assistant performance when interacting with simulators versus with real users; the alignment metric is mainly Spearman's correlation between the simulator's judgments and human judgments of the assistants.
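That Spearman comparison can be sketched in a few lines of plain Python. The scores below are made up for illustration, and this is only the correlation step, not SimulatorArena's full evaluation pipeline:

```python
def _ranks(xs):
    # 1-based ranks, averaging ties.
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(xs):
        j = i
        while j + 1 < len(xs) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(a, b):
    # Spearman's rho = Pearson correlation of the rank vectors.
    ra, rb = _ranks(a), _ranks(b)
    n = len(a)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    sa = sum((x - ma) ** 2 for x in ra) ** 0.5
    sb = sum((y - mb) ** 2 for y in rb) ** 0.5
    return cov / (sa * sb)

# Hypothetical per-assistant ratings: one list from the LLM-simulated user,
# one from real human users, over the same five assistants.
simulator_scores = [4.1, 3.2, 4.8, 2.5, 3.9]
human_scores     = [4.0, 3.0, 4.6, 2.8, 3.7]
print(round(spearman(simulator_scores, human_scores), 3))  # -> 1.0
```

A rho near 1 means the simulator ranks the assistants the same way humans do, which is exactly the "alignment" being measured.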
1 reply · 0 reposts · 0 likes · 66 views
Taro Bushidō @techietaro
@Yaooo01 26% boost is impressive! How do you quantify 'alignment': accuracy, fluency, or user satisfaction?
1 reply · 0 reposts · 0 likes · 115 views
Yao Dou @Yaooo01
Can LLM-simulated users replace expensive human evaluation for multi-turn conversations? Short answer: yes, if you model the user right. With our SimulatorArena, we find that detailed user profiles (knowledge + message style) improve alignment with real human evaluation by 26% at <3% the cost. #EMNLP2025 [1/6] 🧵
4 replies · 25 reposts · 134 likes · 9.9K views
Yao Dou reposted
Simulating user–AI conversations helps us understand how LMs work in multi-turn settings. Prompting LMs like GPT-4o to simulate users is common, but their assistant nature makes it hard to replicate user behavior. We introduce User LMs - trained to be users, not assistants.
2 replies · 26 reposts · 147 likes · 29.3K views
Yao Dou reposted
Wei Xu @cocoweixu
🎉 Two papers at #COLM2025! (1) Evaluating LLMs on Idiom Translation — a predoctoral internship project in my lab, presented by @heinemandavidj (2) LLM Knowledge Cutoff in the Finance Domain — led by PhD student collaborator @shahagam4 Come say hi 👋 at the conference!
1 reply · 8 reposts · 73 likes · 9.4K views
Yao Dou reposted
Shirley Wu @ShirleyYXWu
With help from the ByteDance @verl_project team, we have integrated collabllm as a recipe in veRL, a soon-to-be most popular open RL library for LLMs. Now you are only one step away from making your LLM a great collaborator in multi-turn conversations. verl.readthedocs.io/en/latest/algo…
0 replies · 11 reposts · 127 likes · 17.7K views
Yao Dou reposted
Wei Xu @cocoweixu
Paper accepted to #NeurIPS2025! 🎉 "Probabilistic Reasoning with LLMs" by my PhD student @JonathanQZheng arxiv.org/abs/2503.09674 Super exciting! It moves beyond rigid math/logic reasoning into probabilistic reasoning, reflecting how people tackle many real-world problems.
11 replies · 94 reposts · 724 likes · 58.4K views
Yao Dou reposted
Shirley Wu @ShirleyYXWu
Introducing 🔥Optimas🔥: The first unified framework to optimize compound AI systems composed of multiple components like trainable/API-based LLMs, tools, model routers, and traditional ML models! 🌐
👉🏻 optimas.stanford.edu

🌟 Why Optimas? AI systems today combine diverse elements—prompts, model parameters, hyperparameters, and model routers. Optimizing the entire system effectively is tough! Optimas tackles this with an intuitive strategy: Globally Aligned Local Rewards (LRFs), ensuring each component's optimization directly boosts overall system performance!

📈 Impressive Results: Tested rigorously on 5 real-world compound AI tasks: Product Recommendation, Medical QA, Complex Retrieval, Multi-hop QA, and Code Generation. 🤩 Delivers an impressive average boost of 11.92% over top baselines (e.g., LLMSelector, TextGrad, DSPy).

🔧 Here's the magic behind Optimas:
① Assigns each component a Local Reward Function (LRF).
② Aligns these LRFs with global objectives, enabling independent yet coordinated optimizations.
③ Adaptively updates LRFs for efficient, coherent improvements across diverse configurations.

💡 Compatible with popular agentic frameworks: easily optimize your own systems! Integrates with @DSPyOSS, @crewAIInc, @pyautogen, TextGrad, and the OpenAI Agent SDK @OpenAIDevs!

Proudly developed by an outstanding collaboration between @StanfordAILab, @AmazonScience, and more! Grateful to work with team @parthsarthi03, Shiyu, Aaron, @krypticmouse, @Diyi_Yang, @james_y_zou, @jure, etc.!

Check out more!
📄 Paper: arxiv.org/abs/2507.03041
💻 Code: github.com/snap-stanford/… (to be open-sourced soon!)
#CompoundAISystem #LLM #Optimization #MachineLearning
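One simple way to picture "globally aligned" is as pairwise-preference agreement: a local reward is aligned when the candidate it prefers is also the one that yields the better end-to-end score. The sketch below is a conceptual illustration only, not the Optimas API; the function names and toy candidates are invented:

```python
def alignment_rate(candidates, local_reward, global_score):
    # Fraction of candidate pairs where the local reward's preference
    # agrees with the end-to-end (global) score's preference.
    agree = total = 0
    for i in range(len(candidates)):
        for j in range(i + 1, len(candidates)):
            l = local_reward(candidates[i]) - local_reward(candidates[j])
            g = global_score(candidates[i]) - global_score(candidates[j])
            if l == 0 or g == 0:
                continue  # skip ties: neither side expresses a preference
            total += 1
            agree += (l > 0) == (g > 0)
    return agree / total if total else 1.0

# Toy check: a local reward proportional to the global score is fully aligned.
outputs = [0.2, 0.5, 0.9]
print(alignment_rate(outputs, lambda x: x, lambda x: 2 * x))  # -> 1.0
```

Under this view, aligning an LRF with the global objective means driving its agreement rate toward 1, after which each component can be optimized independently against its own LRF.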
5 replies · 34 reposts · 202 likes · 46.5K views
Yao Dou reposted
Geyang Guo @CherylolGuo
❤️🌎 Introducing CARE: Multilingual Multicultural Human Preference Learning
3,490 culturally relevant prompts + 31.7k human/AI-written responses rated by multilingual speakers
💡 Key insights:
- Even a small amount of cultural data improves popular LLMs consistently.
- DeepSeek-V3 outperforms GPT-4o on Chinese/Japanese/Arabic questions.
- Surprisingly, commonsense questions can be answered better in English.
📄 Paper: arxiv.org/abs/2504.05154
📊 Data: huggingface.co/datasets/geyan…
🔗 Code: github.com/Guochry/CARE
2 replies · 13 reposts · 65 likes · 8.3K views
Yao Dou reposted
Feng Yao @fengyao1909
😵‍💫 Struggling with fine-tuning MoE? Meet DenseMixer — an MoE post-training method that offers more precise router gradients, making MoE easier to train and better performing!
Blog: fengyao.notion.site/moe-posttraini…
Code: github.com/yaof20/DenseMi…
5 replies · 56 reposts · 271 likes · 58.8K views