Weiwei Sun

132 posts

@sunweiwei12

PhD student @LTIatCMU | Ex Seed @VectorInst @Baidu_Inc @ShandongU @ic4ai | Working on LLM Agents

Pittsburgh, PA · Joined June 2021
217 Following · 697 Followers
Weiwei Sun @sunweiwei12 ·
451 real users vs. LLM simulators. We find a clear sim2real gap across 21 behavioral dimensions. Models deviate from humans in many ways (e.g., too polite, too verbose). Stronger LLMs can even be worse at simulating humans. Check this out 👇 arxiv.org/pdf/2603.11245
Xuhui Zhou@nlpxuhui

Creating user simulators is a key to evaluating and training models for user-facing agentic applications. But are stronger LLMs better user simulators? TL;DR: not really. We ran the largest sim2real study for AI agents to date: 31 LLM simulators vs. 451 real humans across 165 tasks. Here's what we found (co-lead with @sunweiwei12).

0 replies · 1 retweet · 6 likes · 373 views
Weiwei Sun retweeted
Xuhui Zhou @nlpxuhui ·
Creating user simulators is a key to evaluating and training models for user-facing agentic applications. But are stronger LLMs better user simulators? TL;DR: not really. We ran the largest sim2real study for AI agents to date: 31 LLM simulators vs. 451 real humans across 165 tasks. Here's what we found (co-lead with @sunweiwei12).
5 replies · 40 retweets · 164 likes · 12K views
Weiwei Sun retweeted
Zhuokai Zhao @zhuokaiz ·
Been really enjoying this paper by @sunweiwei12 et al. lately: arxiv.org/pdf/2510.11967 I really like how it treats context management as something the agent actually learns, instead of an external system hack like summarization or fixed multi-agent setups. The test-time idea is also pretty clean, the agent just spins up sub-trajectories when needed, no pre-defined roles. Imo a really smart way to scale long-horizon agents beyond "just use a bigger context window."
5 replies · 50 retweets · 338 likes · 18.7K views
Weiwei Sun @sunweiwei12 ·
Check out our #NeurIPS2025 poster, “Improving Retrieval-Augmented Generation through Multi-Agent Reinforcement Learning”! 📅 Dec 3, 11:00 AM–2:00 PM PST 📍 Exhibit Hall C/D/E, #3403 📄 Paper: arxiv.org/pdf/2501.15228
0 replies · 2 retweets · 8 likes · 626 views
Weiwei Sun retweeted
Zhaopeng Tu @tuzhaopeng ·
Can AI agents autonomously explore, synthesize, and discover knowledge like researchers? 🤖🔬 Introducing a comprehensive survey on Deep Research (DR) systems, where LLMs evolve from passive text generators into autonomous agents capable of long-horizon reasoning and verifiable knowledge creation.
🗺️ Three-phase roadmap:
1⃣ Agentic Search → precise evidence acquisition
2⃣ Integrated Research → multi-source synthesis & reporting
3⃣ Full-stack AI Scientist → hypothesis generation & discovery
🔧 Four foundational components:
1⃣ Query Planning: decompose complex questions (parallel, sequential, tree-based).
2⃣ Information Acquisition: dynamically retrieve from web search, APIs, & multimodal sources.
3⃣ Memory Management: store, update, and prune context over long horizons.
4⃣ Answer Generation: synthesize verifiable, cited reports.
🚀 Three optimization paradigms:
1⃣ Workflow Prompting
2⃣ Supervised Fine-Tuning (SFT)
3⃣ End-to-End Agentic Reinforcement Learning (RL)
📊 Key insight: DR is not just advanced RAG. Unlike standard RAG, DR enables:
✅ Flexible interaction & tool use beyond static retrieval
✅ Long-horizon planning with autonomous workflows
✅ Reliable, verifiable, and structured outputs
📈 As the field evolves, we are committed to continuously updating this survey to reflect the latest progress!
🧑‍💻 Project: github.com/mangopy/Deep-R…
📃 Paper: preprints.org/manuscript/202…
12 replies · 60 retweets · 230 likes · 17.9K views
Weiwei Sun @sunweiwei12 ·
AirRep does have an upfront cost: we generate supervision and train the encoder. But once trained, it amortizes beautifully. After a moderate crossover point, AirRep can attribute 100M examples under the same GPU budget where gradient-based methods manage only a few million.
1 reply · 0 retweets · 1 like · 161 views
Weiwei Sun @sunweiwei12 ·
🚨 Modern LLMs are trained on trillions of tokens, but for any given output, only a tiny subset of examples really matter. Training Data Attribution (TDA) is about finding those examples and measuring their influence. Gradient-based approaches, while well-founded, are extremely costly for LLMs because they require computing and storing gradients. 💡 We introduce AirRep, a small representation model trained to predict how training data influences model behavior. The result: as accurate as gradient-based methods (and often more accurate), 80× faster, and with 50× storage reduction. On a single GPU, AirRep can process 2500 examples per second, while a well-optimized gradient-based model can only handle 30. #neurips2025
1 reply · 1 retweet · 3 likes · 1.3K views
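The per-second figures quoted in the AirRep thread above are enough to check the headline claims by arithmetic; the sketch below uses only the two rates from the tweet (2500 vs. 30 examples/second), and the variable names are mine, not the paper's:

```python
# Back-of-the-envelope check of the AirRep throughput claims,
# using only the per-second rates quoted in the tweet above.

airrep_rate = 2500   # examples/second on a single GPU (from the tweet)
gradient_rate = 30   # examples/second for a well-optimized gradient-based method

speedup = airrep_rate / gradient_rate
print(f"Speedup: {speedup:.0f}x")  # ~83x, same order as the quoted "80x faster"

# At these rates, attributing 100M examples would take roughly:
examples = 100_000_000
print(f"AirRep:   {examples / airrep_rate / 3600:.1f} GPU-hours")    # ~11.1
print(f"Gradient: {examples / gradient_rate / 3600:.1f} GPU-hours")  # ~925.9
```

The ~83× ratio matches the tweet's "80× faster" figure, and the GPU-hour gap illustrates why gradient-based attribution tops out at a few million examples under the same budget.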
Weiwei Sun retweeted
Xuhui Zhou @nlpxuhui ·
New blog post out! 📜 We share our latest research efforts to build more effective, human-centered AI collaboration. Months ago, I was genuinely surprised by how quickly AI agents were improving, and with that came a deep fear of being replaced, of humans slowly losing agency as AI grows more capable. At the same time, I felt the intense frustration of working with coding agents that produce thousands of lines of seemingly working code that ultimately prove unusable. These days, I've been coming to a clearer conclusion: the future of AI has to be true human–AI collaboration. And making that collaboration actually smooth, not frustrating, not disempowering, has never been more important. xuhuiz.com/blog/on-the-qu… #AI #AIAgents #HumanAICollaboration
3 replies · 25 retweets · 124 likes · 24.1K views
Weiwei Sun retweeted
Alex Prompter @alex_prompter ·
🚨 Carnegie Mellon just dropped one of the most important AI agent papers of the year. It's called "Training Proactive and Personalized LLM Agents." Here's the wild part... they didn't train agents to just complete tasks. They trained them to talk better. Most AI agents are task junkies: they execute, they don't interact. These new ones do three things simultaneously:
→ Productivity – actually finish the job
→ Proactivity – ask smart clarifying questions
→ Personalization – adapt tone, style, and behavior to you
They built a full interactive world called UserVille, filled with simulated users, each with unique personalities and quirks (like users who only reply in JSON, or only answer A/B/C questions 🤯). Then they trained agents using a new RL framework called PPP (Productive, Proactive, Personalized). Results? +21.6% higher performance than GPT-5 across complex engineering & research tasks. Agents started asking fewer, sharper questions and mirroring user preferences automatically. This is the future: not just agents that do things, but agents that understand who they're doing them for.
Paper: arxiv.org/abs/2511.02208v1
24 replies · 118 retweets · 559 likes · 47.5K views
Weiwei Sun @sunweiwei12 ·
@kmingl20 @nlpxuhui @StigLidu @gneubig @MaartenSap Thanks! We designed this mostly based on the time required for the user to respond. Refusing usually costs little user time, so it's a medium effort (and it accumulates, so more poor questions = more penalty). High effort means the user has to spend time doing actual work.
0 replies · 0 retweets · 0 likes · 135 views
Weiwei Sun @sunweiwei12 ·
AI agents are supposed to collaborate with us to solve real-world problems, but can they really? Even the most advanced models can still give us frustrating moments when working with them deeply. We argue that real-world deployment requires more than productivity (e.g., task accuracy); agents must also be proactive in communication and personalized to individual user preferences. Our new work introduces PPP, a Productive, Proactive, and Personalized optimization framework that explicitly trains LLM agents for effective human interaction. 🚀PPP achieves significant gains in complex, real-world agent–user scenarios (software engineering and deep research), outperforming even GPT-5 on both tasks with initially vague user instructions.
13 replies · 59 retweets · 297 likes · 188.7K views
Weiwei Sun retweeted
Marktechpost AI Dev News ⚡ @Marktechpost ·
CMU Researchers Introduce PPP and UserVille To Train Proactive And Personalized LLM Agents
Most LLM agents are tuned to maximize task success. They resolve GitHub issues or answer deep research queries, but they do not reason carefully about when to ask the user questions or how to respect different interaction preferences. How can we design LLM agents that know when to ask better questions and adapt their behavior to each individual user? A team of researchers from Carnegie Mellon University (CMU) and OpenHands formalizes these missing behaviors as 3 joint objectives, Productivity, Proactivity, and Personalization, and optimizes them with a multi-objective reinforcement learning framework called PPP inside a new environment named UserVille.
Key Takeaways
➡️ PPP frames agent training as a multi-objective RL problem that jointly optimizes Productivity, Proactivity, and Personalization, instead of focusing only on task success.
➡️ UserVille builds vague-prompt versions of existing benchmarks and pairs them with preference-aware user simulators, which enforce 20 distinct interaction preferences and label user effort levels.
➡️ The total reward combines task metric, user effort, and preference adherence, using bonuses for low-effort questions and penalties for medium- and high-effort questions or preference violations, implemented with a GRPO-based RL algorithm.
➡️ On SWE-Bench Func-Loc and BrowseComp-Plus with vague prompts, PPP-trained Seed-OSS-36B significantly improves all 3 metrics over the base model and over GPT-5 baselines, with an average gain of about 16.72 points across dimensions and datasets.
➡️ PPP agents generalize to unseen preferences, alternate simulators, and harder tasks such as SWE-Bench Full, and they learn to ask fewer but more targeted low-effort questions, especially when prompts are vague.
Full analysis: marktechpost.com/2025/11/06/cmu… Paper: arxiv.org/abs/2511.02208 Repo: github.com/sunnweiwei/PPP… @nlpxuhui @sunweiwei12 @nlpxuhui @StigLidu @xingyaow_ @wellecks @gneubig @MaartenSap
0 replies · 9 retweets · 18 likes · 1.5K views
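The reward design described in the tweet above (task metric, plus a bonus for low-effort questions, minus penalties for medium/high-effort questions and preference violations) can be sketched in a few lines. This is a minimal illustration of that combination, not the paper's implementation: all names and constant values (`EFFORT_TERM`, `PREF_VIOLATION_PENALTY`, etc.) are hypothetical.

```python
# Hedged sketch of a PPP-style combined reward, assuming the structure
# described in the tweet: task success + effort-based question terms
# - preference-violation penalties. Constants are illustrative only.

EFFORT_TERM = {"low": +0.1, "medium": -0.2, "high": -0.5}  # per question asked
PREF_VIOLATION_PENALTY = 0.3                               # per violation

def ppp_reward(task_metric, question_efforts, num_pref_violations):
    """Combine productivity, proactivity, and personalization signals.

    task_metric: task success score in [0, 1] (e.g., issue resolved or not)
    question_efforts: effort label ("low"/"medium"/"high") for each question
    num_pref_violations: count of user-preference violations in the episode
    """
    effort_term = sum(EFFORT_TERM[e] for e in question_efforts)
    pref_term = -PREF_VIOLATION_PENALTY * num_pref_violations
    return task_metric + effort_term + pref_term

# An agent that solves the task with two sharp, low-effort questions beats
# one that solves it while burdening the user and ignoring preferences.
print(round(ppp_reward(1.0, ["low", "low"], 0), 3))      # 1.2
print(round(ppp_reward(1.0, ["high", "medium"], 1), 3))  # 0.0
```

Note how the per-question terms accumulate, matching the reply in the feed above ("more poor questions = more penalty"): each additional medium- or high-effort question lowers the episode reward further.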