Qianhui Wu

33 posts

Qianhui Wu

@5000hui

Senior Researcher @MSFTResearch. Previously @Tsinghua_Uni.

Redmond, WA, USA · Joined December 2021
134 Following · 737 Followers
Qianhui Wu@5000hui·
We've released the full package for GUI-Libra! 🌟 📂 Data/Model: huggingface.co/GUI-Libra 📄 Paper: arxiv.org/abs/2602.22190 🌐 Project: gui-libra.github.io Happy to hear feedback from the community!
Rui Yang@RuiYang70669025

Collecting high-quality GUI trajectories for agent training is expensive. But are we fully leveraging the open-source data we already have? 🤔 ✨Introducing GUI-Libra (gui-libra.github.io): an 81K high-quality, action-aligned reasoning dataset curated from open-source corpora, plus a tailored training recipe that combines action-aware SFT with step-wise RLVR-style training (⚠️partially verifiable rather than fully verifiable!). Result: stronger native GUI agents on both offline step-wise evaluation and online environments across mobile and web domains. Takeaway: with careful data curation + a tailored post-training recipe, a small subset of open-source trajectories can still go a long way for training native GUI agents. Check out our paper (arxiv.org/abs/2602.22190) and code/dataset/model (github.com/GUI-Libra/GUI-…) for more details. #GUI #agent #VLM

0 replies · 7 reposts · 21 likes · 3.5K views
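The step-wise, partially verifiable reward mentioned in the thread above can be sketched roughly like this. Everything here is a hypothetical illustration: the step format, the click-in-bounding-box check, and the judge fallback are my assumptions, not GUI-Libra's actual code.

```python
def step_reward(pred_action, gold_action, judge_score=None):
    """Reward one trajectory step.

    Clicks with a ground-truth bounding box are verifiable and scored
    exactly; other steps fall back to a soft judge score in [0, 1].
    """
    if gold_action.get("type") == "click" and "bbox" in gold_action:
        x, y = pred_action.get("coords", (None, None))
        if x is None or y is None:
            return 0.0
        x0, y0, x1, y1 = gold_action["bbox"]
        return 1.0 if (x0 <= x <= x1 and y0 <= y <= y1) else 0.0
    # Unverifiable step: rely on an external judge model's score.
    return float(judge_score) if judge_score is not None else 0.0


def trajectory_return(rewards, gamma=0.9):
    """Discounted sum of step rewards over a trajectory."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))
```

Verifiable steps get an exact 0/1 reward, while unverifiable steps fall back to a soft judge score, which is the sense in which the signal is only partially verifiable.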
Qianhui Wu@5000hui·
Congrats to the LightMem team! 👏Great to see the continued exploration of topic-based segmentation and lightweight compression for building efficient memory systems for LLMs. Glad that our findings in SeCom and LLMLingua-2 have been useful building blocks for the community. 😀
Ningyu Zhang@ZJU@zxlzr

We’re thrilled to share that our team’s work LightMem has been accepted to ICLR 2026 🎉 Paper: arxiv.org/abs/2510.18866 Code: github.com/zjunlp/LightMem LightMem is a lightweight, modular memory system for LLM agents that enables scalable long-context reasoning and structured memory management across tasks and environments. Recent updates: 1️⃣ Introduced a comprehensive baseline evaluation framework for benchmarking memory layers (Mem0, A-MEM, LangMem) across datasets like LoCoMo and LongMemEval 2️⃣ Released a demo video showcasing long-context handling, along with tutorial notebooks covering multiple usage scenarios 3️⃣ Enabled multi-tool invocation via MCP Server integration 4️⃣ Added full LoCoMo dataset support and integrated GLM-4.6, achieving strong performance and efficiency with reproducible scripts 5️⃣ Supported local deployment through Ollama, vLLM, and Transformers with automatic model loading #ICLR2026 #LLM #Agents #MemorySystems #LightMem

0 replies · 2 reposts · 8 likes · 1.1K views
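As a rough illustration of the "lightweight compression" idea for memory entries (this heuristic is my own toy example, not LightMem's actual algorithm), an entry can be shrunk by keeping only its rarest, most informative words relative to the stored corpus:

```python
from collections import Counter


def compress(entry, corpus, keep_ratio=0.5):
    """Keep only the rarest (most informative) words of a memory entry."""
    freq = Counter(w.lower() for text in corpus for w in text.split())
    words = entry.split()
    k = max(1, int(len(words) * keep_ratio))
    # Rarer in the corpus = more informative; preserve the original order.
    ranked = sorted(range(len(words)), key=lambda i: freq[words[i].lower()])
    keep = set(ranked[:k])
    return " ".join(w for i, w in enumerate(words) if i in keep)
```

Frequent filler words are dropped first, so the compressed entry retains the content-bearing terms at a fraction of the token cost.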
Qianhui Wu@5000hui·
🔊2026 Summer Internship @MSFTResearch Deep Learning Group🔊 We’re looking for a self-motivated intern with a strong background in ⛑️building GUI agent environments and/or 🏗️reinforcement learning. 📩Interested? Send your CV + a short intro to qianhuiwu@microsoft.com!
6 replies · 20 reposts · 344 likes · 23.7K views
Qianhui Wu@5000hui·
🧠 Key ideas: @zhaoywang_CS • categorized task synthesis from real websites • online task refinement when the task conflicts with the observation • offline trajectory refinement to remove noisy steps 🪄
Zhaoyang Wang@zhaoywang_CS

🚀 New work: SynthAgent – a fully synthetic supervision pipeline for web agents 🤖 We generate high-quality, environment-specific tasks + trajectories to adapt agents to new websites without human effort 🧠🧼 arxiv: arxiv.org/pdf/2511.06101 code: github.com/aiming-lab/Syn…

0 replies · 0 reposts · 3 likes · 1.3K views
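The offline trajectory refinement step can be sketched with a toy noise criterion: drop steps whose action left the observation unchanged. The step format and the criterion are illustrative assumptions on my part, not SynthAgent's actual rule.

```python
def refine_trajectory(steps):
    """Drop steps whose action did not change the observation (no-ops).

    Each step is assumed to be a dict with 'obs_before' and 'obs_after'
    snapshots of the environment state around the action.
    """
    return [s for s in steps if s["obs_before"] != s["obs_after"]]
```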
Qianhui Wu retweeted
Xiao Yu@xy2437·
Why can (V)LMs agents ace coding and math, yet struggle so badly in more complex environments like computer or phone use? 🤔 We find that one key factor lies in models' ability to understand and *simulate* the environment’s dynamics — and propose **Dyna-Mind** to address this! 🧵[1/n]
1 reply · 4 reposts · 10 likes · 2.6K views
Qianhui Wu retweeted
Da Yu@DaYu85201802·
✨ Internship Opportunity @ Google Research ✨ We are seeking a self-motivated student researcher to join our team at Google Research starting around January 2026. 🚀 In this role, you will contribute to research projects advancing agentic LLMs through tool use and RL, with the goal of enabling breakthrough applications. We are particularly interested in PhD students with a strong background in these areas. If interested, please send a brief self-introduction and your CV to yuda3.edu@gmail.com. Looking forward to connecting with talented researchers in this exciting space!
15 replies · 93 reposts · 842 likes · 76K views
Qianhui Wu@5000hui·
@LiJunnan0409 Awesome work! 🥂 I feel like the design of our GUI-Actor — which can propose multiple candidate regions in one forward pass— combined with a Grounding Verifier could work really well within the 'test-time scaling' framework of GTA1! 😀
0 replies · 1 repost · 5 likes · 220 views
Li Junnan@LiJunnan0409·
🚀Introducing GTA1 – our new GUI Agent that leads the OSWorld leaderboard with a 45.2% success rate, outperforming OpenAI's CUA! GTA1 improves two core components of GUI agents: Planning and Grounding. 🧠 Planning: A generic test-time scaling strategy that concurrently samples multiple action proposals each step and uses a judge to select the best—yielding more optimal execution plans. 🎯Grounding: A reinforcement learning recipe that trains a SoTA grounding model using click-based rewards, significantly improving success rate on coordinate-based actions (e.g. click). Paper: huggingface.co/papers/2507.05…
2 replies · 15 reposts · 68 likes · 5.9K views
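The planning-side test-time scaling described above (concurrently sample several action proposals, let a judge pick the best) can be sketched as follows; `propose` and `judge` are hypothetical stand-ins for the policy and judge models, not GTA1's actual interfaces.

```python
import random


def best_action(propose, judge, n=5, rng=None):
    """Sample n action proposals and return the one the judge scores highest."""
    rng = rng or random.Random(0)
    candidates = [propose(rng) for _ in range(n)]
    return max(candidates, key=judge)
```

The judge converts extra inference compute into better per-step decisions without retraining the planner, which is what makes the strategy "generic".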
Qianhui Wu@5000hui·
Huge thanks to the @SimularAI team for hosting, and to my amazing collaborators for making this project possible! 🙏 Excited to see where this direction takes us next! 🔗 aka.ms/GUI-Actor
Simular@SimularAI

Big thanks to Qianhui Wu @5000hui and the team behind “Act Where You See” for sharing their amazing work this week at @SimularAI Seminar! 🧠⚡️ Coordinate-free visual grounding for GUI agents is a huge leap toward human-like interaction. 📎 aka.ms/GUI-Actor #AI #SimularSeminar #GUIAgents #SimularToHuman

0 replies · 1 repost · 21 likes · 2.1K views
Qianhui Wu@5000hui·
@touken_titan The key idea of GUI-Actor should also apply in embodied scenarios. We are also thinking about how to adapt it to such scenarios.
1 reply · 0 reposts · 0 likes · 44 views
Titus Rowe@Titus_R0·
@5000hui @_akhaliq GUI grounding stays nascent until evaluated on embodied, persistent agents: edge-native, not just server-side.
1 reply · 0 reposts · 0 likes · 48 views
Qianhui Wu@5000hui·
🚀 Excited to share GUI-Actor—a new approach for GUI grounding! Big thanks to @_akhaliq for featuring our work! 🌐 Project page: microsoft.github.io/GUI-Actor/ 📜 Paper: arxiv.org/pdf/2506.03143 🤔 What's limiting coordinate generation-based GUI grounding? 1️⃣ Weak spatial-semantic alignment 2️⃣ Ambiguous supervision signals 3️⃣ Vision–action granularity mismatch 👀 But think about it: humans don’t calculate precise screen coordinates—we perceive elements and then act directly. 💡 Meet GUI-Actor: a VLM with an attention-based action head that: ✅ Addresses above limitations ✅ Proposes multiple candidate regions in one pass, enabling flexible downstream strategies. ✅ Performs coordinate-free grounding that better mirrors human behavior ➕ We also introduce a grounding verifier to select the most plausible action region — and it can boost other grounding methods too. 🎯 Results? GUI-Actor achieves SOTA on several benchmarks, even GUI-Actor-7B outperforms UI-TARS-72B on ScreenSpot-Pro, all using the same Qwen2-VL backbone.
AK@_akhaliq

Microsoft just dropped GUI-Actor on Hugging Face: Coordinate-Free Visual Grounding for GUI Agents

4 replies · 26 reposts · 103 likes · 28.5K views
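The coordinate-free grounding loop, proposing several candidate regions in one forward pass and letting a verifier choose, can be sketched like this. The flattened attention map, patch indexing, and verifier interface are illustrative assumptions, not GUI-Actor's actual implementation.

```python
def candidate_regions(attn, k=3):
    """Top-k patch indices from a flattened attention map, in one pass."""
    return sorted(range(len(attn)), key=lambda i: attn[i], reverse=True)[:k]


def ground(attn, verify, k=3):
    """Let a verifier pick the most plausible of the candidate regions."""
    return max(candidate_regions(attn, k), key=verify)
```

Because all candidates come from a single pass over the attention map, downstream strategies (verification, re-ranking, test-time scaling) get multiple options essentially for free.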
Qianhui Wu retweeted
Jianwei Yang@jw2yang4ai·
🚀 Excited to announce our 4th Workshop on Computer Vision in the Wild (CVinW) at @CVPR 2025! 🔗 computer-vision-in-the-wild.github.io/cvpr-2025/ ⭐We have invited a great lineup of speakers: Prof. Kaiming He, Prof. @BoqingGo, Prof. @CordeliaSchmid, Prof. @RanjayKrishna, Prof. @sainingxie, Prof. @YunzhuLiYZ, Prof. @furongh to talk about the exciting research bringing vision to the wild! 🌎Join top researchers tackling real-world vision challenges, from dynamic environments to embodied agents! See you all at #CVPR2025! #CVPR2025 #ComputerVision #AI
1 reply · 22 reposts · 102 likes · 27.8K views
Qianhui Wu@5000hui·
🚀 Excited to introduce our latest work: Magma - A Foundation Model for Multimodal AI Agents! 🔥 🌐 Project: microsoft.github.io/Magma 📄 Paper: arxiv.org/pdf/2502.13130 Check it out and let us know what you think! #AIAgents #Multimodal
Jianwei Yang@jw2yang4ai

Thanks for featuring our work, @arankomatsuzaki! 🔥Today we are thrilled to announce our MSR flagship project Magma! This is a fully open-sourced project. We will roll out all the components (code, model, and training data) over the following days. Check out our full work here: microsoft.github.io/Magma ! To the best of our knowledge, Magma is the first-ever foundation model for multimodal AI agents designed to handle complex interactions for agentic tasks. With a single suite of parameters, Magma achieves state-of-the-art UI navigation and robotics manipulation across both digital and physical environments, as well as excelling at generic image and video understanding!

1 reply · 1 repost · 14 likes · 1.3K views
Qianhui Wu retweeted
Qian Liu@sivil_taram·
Thrilled to share that RegMix has been accepted by #ICLR2025! 🎉 Massive shoutout to the incredible co-authors @xszheng2020 @Muennighoff @GuangtaoZ @LongxuDou @TianyuPang1 Jing Jiang @mavenlin! 🙏 Huge thanks to the ICLR reviewers for helping us improve RegMix! 🌟 Some key improvements from v1: 1️⃣ Expanded experiments to 100 domains, where regression still works extremely well 📈 2️⃣ Evaluated 7B model performance over 100B tokens, where RegMix beats human selection consistently 🚀 3️⃣ More results confirming the rank invariance hypothesis across model scales 📉 4️⃣ New insights on using 1B-level proxy models: they do not actually show a significant advantage over 1M proxy models 🧠 Code: github.com/sail-sg/regmix Paper: arxiv.org/abs/2407.01492
Qian Liu@sivil_taram

Still following your human intuition to mix corpora from different sources for language model pre-training 🧠? Everyone says that data mixture has a big impact on model performance, but how - and why🕵️? Did you know that web corpora are actually highly impactful for downstream tasks 🏆? Let's check out our preprint "RegMix: Data Mixture as Regression for Language Model Pre-training" 📄 🔬In this paper, we've proposed an automatic data mixture method RegMix that achieves a 6.3% improvement over human selection on the widely used HellaSwag benchmark - and it only needs a 2% extra training FLOPs! 📈 Details in the thread 🧵

6 replies · 13 reposts · 85 likes · 12.5K views
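The core "data mixture as regression" recipe can be illustrated with a toy two-domain version: train tiny proxies on a few mixtures, fit a regression from mixture weight to downstream loss, then pick the weight with the lowest predicted loss. Real RegMix handles many domains with a stronger regressor; this sketch, with a single weight w for domain A (domain B gets 1 - w), is purely illustrative.

```python
def fit_linear(ws, losses):
    """Ordinary least squares fit of loss ~ a*w + b."""
    n = len(ws)
    mw, ml = sum(ws) / n, sum(losses) / n
    a = sum((w - mw) * (y - ml) for w, y in zip(ws, losses)) / sum(
        (w - mw) ** 2 for w in ws
    )
    return a, ml - a * mw


def best_mixture(ws, losses, grid=None):
    """Return the candidate mixture weight with the lowest predicted loss."""
    a, b = fit_linear(ws, losses)
    grid = grid or [i / 10 for i in range(11)]
    return min(grid, key=lambda w: a * w + b)
```

The regression is fit only on cheap proxy runs, so the final (expensive) training run uses a mixture chosen almost for free.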
Qianhui Wu retweeted
Qianhui Wu@5000hui·
🚀 Introducing SECOM🚀 How can conversational agents **better retain and retrieve past interactions** for more **coherent and personalized** experiences? Our latest work on **Memory Construction & Retrieval** tackles this challenge head-on! 🔍 Key Takeaways: ✅ **Granularity matters** – Turn-level, session-level & summarization-based memory struggle with retrieval accuracy and the semantic integrity/relevance of the context. ✅ **Prompt compression (e.g., LLMLingua-2) can denoise memory retrieval**, boosting both retrieval accuracy and response quality. 💡 Meet **SeCom** – an approach that **segments conversations topically** for memory construction and performs memory retrieval based on compressed memory units. 📊 Result? Superior performance on long-term conversation benchmarks such as LOCOMO and Long-MT-Bench+! 📖 Dive into the details: arxiv.org/abs/2502.05589
1 reply · 3 reposts · 17 likes · 1.4K views
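The topical segmentation idea above can be sketched with a toy word-overlap heuristic: start a new memory segment whenever a turn shares too few words with the current segment. SeCom's actual segmentation is LLM-based; this is only an illustration of the granularity argument.

```python
def segment(turns, min_overlap=1):
    """Group consecutive turns into topical segments by word overlap."""
    segments, current, vocab = [], [], set()
    for turn in turns:
        words = set(turn.lower().split())
        # Too little overlap with the running segment: topic shift.
        if current and len(words & vocab) < min_overlap:
            segments.append(current)
            current, vocab = [], set()
        current.append(turn)
        vocab |= words
    if current:
        segments.append(current)
    return segments
```

Segment-level memory units sit between turn-level (too fragmented) and session-level (too coarse), which is the granularity sweet spot the paper argues for.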
Qianhui Wu retweeted
Aran Komatsuzaki@arankomatsuzaki·
Google presents: Scaling Embedding Layers in Language Models. Outperforms a 1.9B-parameter baseline across diverse corpora, while using only half the inference-time FLOPs.
3 replies · 36 reposts · 253 likes · 29.3K views
Huiqiang Jiang@iofu728·
🏆Appreciate the committee’s recognition of RetrievalAttention. We hope this study inspires more people to join MLSys and contribute to making vector databases great again. 🎊Congratulations @BaotongL22 @fanyang etc. #ENSLP @NeurIPS’24
3 replies · 3 reposts · 29 likes · 2.6K views
Varun Babbar@VarunBa91495185·
@5000hui @MSFTResearch Are you specifically looking for PhD students who are close to graduating, or is this opportunity open to all PhD students, regardless of year?
2 replies · 0 reposts · 0 likes · 222 views
Qianhui Wu@5000hui·
🚀 2025 Summer Internship @MSFTResearch Deep Learning Group 🚀 We’re looking for a self-motivated intern to work with us on building intelligent agents that reason, plan, and interact with complex UIs! 📩 Interested? Send your CV and a brief self-intro to qianhuiwu@microsoft.com! #Internship #AIAgents #UINavigation
12 replies · 26 reposts · 336 likes · 47.8K views