Yao Dou

75 posts

@Yaooo01

PhD student @GeorgiaTech, previously @MSFTResearch, @uwnlp, @allen_ai.

Atlanta, GA · Joined September 2017

298 Following · 269 Followers
Yao Dou @Yaooo01
RT @alan_ritter: New paper by my Ph.D. student @hyungjoochae: How can we build a safe, scalable learning environment for web agents? https…
0 replies · 2 reposts · 0 likes · 1 view
Yao Dou reposted
Yang Chen @ychenNLP
We released Nemotron Cascade 2 30B A3B. What makes this release especially meaningful to me is that it reflects a 1.5-year journey at NVIDIA around one core idea: improving AI math reasoning through self-improvement at test time. Each project tackled a different part of that
Wei Ping@_weiping

🚀 Introducing Nemotron-Cascade 2 🚀
Just 3 months after Nemotron-Cascade 1, we’re releasing Nemotron-Cascade 2: an open 30B MoE with 3B active parameters, delivering best-in-class reasoning and strong agentic capabilities.
🥇 Gold Medal-level performance on IMO 2025, IOI 2025, and ICPC World Finals 2025:
• Capabilities once thought achievable only by frontier proprietary models (e.g. Gemini Deep Think) or frontier-scale open models (i.e. DeepSeek-V3.2-Speciale-671B-A37B).
• Remarkably high intelligence density with 20× fewer parameters.
🏆 Best-in-class across math, code reasoning, alignment, and instruction following:
• Outperforms the latest Qwen3.5-35B-A3B (2026-02-24) and even the larger Qwen3.5-122B-A10B (2026-03-11).
🧠 Powered by Cascade RL + multi-domain on-policy distillation:
• Significantly expands Cascade RL across a much broader range of reasoning and agentic domains than Nemotron-Cascade 1, while distilling from the strongest intermediate teacher models throughout training to recover regressions and sustain gains.
🤗 Model + SFT + RL data: 👉 huggingface.co/collections/nv…
📄 Technical report: 👉 research.nvidia.com/labs/nemotron/…

4 replies · 18 reposts · 103 likes · 9.3K views
Yao Dou reposted
Yang Chen @ychenNLP
🥈 Silver Medal at IOI 2025 & outperforms DeepSeek-R1-0528 on LiveCodeBench. Instead of mixing different tasks together, we scale *Cascade RL* to develop general LLMs in a curriculum (RLHF -> Instruct -> Math -> Code -> SWE). So many learnings, check out our report! 👇
5 replies · 43 reposts · 226 likes · 23.9K views
Yao Dou reposted
Introducing early experience: using the future states that result from an agent's own actions as scalable supervision to train itself - without reward 🧠!
1️⃣ Reward-free: can train directly in real-world environments.
2️⃣ Better RL warm-start: when continued with RL, leads to higher final
Jason Weston@jaseweston

🌀Agent Learning via Early Experience🌀
📝: arxiv.org/abs/2510.08558
- SFT for agents is sparse; RL on long horizons is hard
We provide new mid-training signals that work:
1) Implicit next-state world modeling task
2) Self-reflection on alternate states
- Strong improvements over 8 environments and multiple model families
- Works well for subsequent RL! 🧵1/5
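The two mid-training signals above can be pictured as data construction over the agent's own rollouts. A minimal sketch follows; the `Transition` schema, function names, and prompt wording are illustrative inventions, not the paper's actual format:

```python
from dataclasses import dataclass

@dataclass
class Transition:
    """One step from the agent's own rollout (illustrative schema)."""
    state: str
    action: str
    next_state: str

def world_modeling_examples(transitions):
    # Signal 1: implicit world modeling -- given (state, action), predict
    # the environment's next state. No reward labels are required.
    return [
        {
            "prompt": f"State: {t.state}\nAction: {t.action}\nNext state?",
            "target": t.next_state,
        }
        for t in transitions
    ]

def self_reflection_examples(state, chosen, alternates):
    # Signal 2: self-reflection -- contrast the taken action's outcome with
    # the outcomes of alternate actions from the same state.
    return [
        {
            "prompt": (
                f"State: {state}\n"
                f"Taken: {chosen.action} -> {chosen.next_state}\n"
                f"Alternative: {alt.action} -> {alt.next_state}\n"
                "Reflect on the difference between these outcomes."
            ),
        }
        for alt in alternates
    ]

rollout = [Transition("cart page", "click checkout", "payment page")]
print(world_modeling_examples(rollout)[0]["target"])  # -> payment page
```

Both example builders need only states reached by the agent itself, which is what makes the supervision reward-free and scalable.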

2 replies · 27 reposts · 112 likes · 15.8K views
Yao Dou @Yaooo01
@techietaro We compare assistant performance when interacting with simulators versus with real users; the alignment metric is mainly Spearman's correlation between the simulator's judgments and human judgments of the assistants.
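That Spearman comparison can be sketched in a few lines of plain Python. The scores below are made up for illustration, and this is only the correlation step, not SimulatorArena's full evaluation pipeline:

```python
def _ranks(xs):
    # 1-based ranks, averaging ties.
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(xs):
        j = i
        while j + 1 < len(xs) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(a, b):
    # Spearman's rho = Pearson correlation of the rank vectors.
    ra, rb = _ranks(a), _ranks(b)
    n = len(a)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    sa = sum((x - ma) ** 2 for x in ra) ** 0.5
    sb = sum((y - mb) ** 2 for y in rb) ** 0.5
    return cov / (sa * sb)

# Hypothetical per-assistant ratings: one list from the LLM-simulated user,
# one from real human users, over the same five assistants.
simulator_scores = [4.1, 3.2, 4.8, 2.5, 3.9]
human_scores     = [4.0, 3.0, 4.6, 2.8, 3.7]
print(round(spearman(simulator_scores, human_scores), 3))  # -> 1.0
```

A rho near 1 means the simulator ranks the assistants the same way humans do, which is exactly the "alignment" being measured.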
1 reply · 0 reposts · 0 likes · 66 views
Taro Bushidō @techietaro
@Yaooo01 26% boost is impressive! How do you quantify 'alignment': accuracy, fluency, or user satisfaction?
1 reply · 0 reposts · 0 likes · 115 views
Yao Dou @Yaooo01
Can LLM-simulated users replace expensive human evaluation for multi-turn conversations? Short answer: yes, if you model the user right. With our SimulatorArena, we find that detailed user profiles (knowledge + message style) improve alignment with real human evaluation by 26% at <3% the cost. #EMNLP2025 [1/6] 🧵
4 replies · 25 reposts · 134 likes · 9.9K views
Yao Dou reposted
Simulating user–AI conversations helps us understand how LMs work in multi-turn settings. Prompting LMs like GPT-4o to simulate users is common, but their assistant nature makes it hard to replicate user behavior. We introduce User LMs - trained to be users, not assistants.
2 replies · 26 reposts · 147 likes · 29.3K views
Yao Dou reposted
Wei Xu @cocoweixu
🎉 Two papers at #COLM2025! (1) Evaluating LLMs on Idiom Translation — a predoctoral internship project in my lab, presented by @heinemandavidj (2) LLM Knowledge Cutoff in the Finance Domain — led by PhD student collaborator @shahagam4 Come say hi 👋 at the conference!
1 reply · 8 reposts · 73 likes · 9.4K views
Yao Dou reposted
Shirley Wu @ShirleyYXWu
With help from the ByteDance @verl_project team, we have integrated collabllm as a recipe in veRL, a soon-to-be most popular open RL library for LLMs. Now you are only one step away from making your LLM a great collaborator in multi-turn conversations. verl.readthedocs.io/en/latest/algo…
0 replies · 11 reposts · 127 likes · 17.7K views
Yao Dou reposted
Wei Xu @cocoweixu
Paper accepted to #NeurIPS2025! 🎉 "Probabilistic Reasoning with LLMs" by my PhD student @JonathanQZheng arxiv.org/abs/2503.09674 Super exciting! It moves beyond rigid math/logic reasoning into probabilistic reasoning, reflecting how people tackle many real-world problems.
11 replies · 94 reposts · 724 likes · 58.4K views
Yao Dou reposted
Shirley Wu @ShirleyYXWu
Introducing 🔥Optimas🔥: The first unified framework to optimize compound AI systems composed of multiple components like trainable/API-based LLMs, tools, model routers, and traditional ML models! 🌐
👉🏻 optimas.stanford.edu

🌟 Why Optimas? AI systems today combine diverse elements—prompts, model parameters, hyperparameters, and model routers. Optimizing the entire system effectively is tough! Optimas tackles this with an intuitive strategy: Globally Aligned Local Rewards (LRFs), ensuring each component's optimization directly boosts overall system performance!

📈 Impressive Results: Tested rigorously on 5 real-world compound AI tasks: Product Recommendation, Medical QA, Complex Retrieval, Multi-hop QA, and Code Generation. 🤩 Delivers an impressive average boost of 11.92% over top baselines (e.g., LLMSelector, TextGrad, DSPy).

🔧 Here's the magic behind Optimas:
① Assigns each component a Local Reward Function (LRF).
② Aligns these LRFs with global objectives, enabling independent yet coordinated optimizations.
③ Adaptively updates LRFs for efficient, coherent improvements across diverse configurations.

💡 Compatible with popular agentic frameworks: easily optimize your own systems! Integrates with @DSPyOSS, @crewAIInc, @pyautogen, TextGrad, and the OpenAI Agent SDK @OpenAIDevs!

Proudly developed by an outstanding collaboration between @StanfordAILab, @AmazonScience, and more! Grateful to work with team @parthsarthi03, Shiyu, Aaron, @krypticmouse, @Diyi_Yang, @james_y_zou, @jure, etc.!

Check out more!
📄 Paper: arxiv.org/abs/2507.03041
💻 Code: github.com/snap-stanford/… (to be open-sourced soon!)
#CompoundAISystem #LLM #Optimization #MachineLearning
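One simple way to picture "globally aligned" is as pairwise-preference agreement: a local reward is aligned when the candidate it prefers is also the one that yields the better end-to-end score. The sketch below is a conceptual illustration only, not the Optimas API; the function names and toy candidates are invented:

```python
def alignment_rate(candidates, local_reward, global_score):
    # Fraction of candidate pairs where the local reward's preference
    # agrees with the end-to-end (global) score's preference.
    agree = total = 0
    for i in range(len(candidates)):
        for j in range(i + 1, len(candidates)):
            l = local_reward(candidates[i]) - local_reward(candidates[j])
            g = global_score(candidates[i]) - global_score(candidates[j])
            if l == 0 or g == 0:
                continue  # skip ties: neither side expresses a preference
            total += 1
            agree += (l > 0) == (g > 0)
    return agree / total if total else 1.0

# Toy check: a local reward proportional to the global score is fully aligned.
outputs = [0.2, 0.5, 0.9]
print(alignment_rate(outputs, lambda x: x, lambda x: 2 * x))  # -> 1.0
```

Under this view, aligning an LRF with the global objective means driving its agreement rate toward 1, after which each component can be optimized independently against its own LRF.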
5 replies · 34 reposts · 202 likes · 46.5K views
Yao Dou reposted
Geyang Guo @CherylolGuo
❤️🌎 Introducing CARE: Multilingual Multicultural Human Preference Learning
3,490 culturally relevant prompts + 31.7k human/AI-written responses rated by multilingual speakers
💡 Key insights:
- Even a small amount of cultural data improves popular LLMs consistently.
- DeepSeek-V3 outperforms GPT-4o on Chinese/Japanese/Arabic questions.
- Surprisingly, commonsense questions can be answered better in English.
📄 Paper: arxiv.org/abs/2504.05154
📊 Data: huggingface.co/datasets/geyan…
🔗 Code: github.com/Guochry/CARE
2 replies · 13 reposts · 65 likes · 8.3K views
Yao Dou reposted
Feng Yao @fengyao1909
😵‍💫 Struggling with fine-tuning MoE? Meet DenseMixer — an MoE post-training method that offers more precise router gradients, making MoE easier to train and better performing!
Blog: fengyao.notion.site/moe-posttraini…
Code: github.com/yaof20/DenseMi…
5 replies · 56 reposts · 271 likes · 58.8K views