Qianhui Wu

33 posts

Qianhui Wu

@5000hui

Senior Researcher @MSFTResearch. Previously @Tsinghua_Uni.

Redmond, WA, USA Katılım Aralık 2021

134 Takip Edilen737 Takipçiler

Qianhui Wu@5000hui·26 Şub

We've released the full package for GUI-Libra! 🌟 📂 Data/Model: huggingface.co/GUI-Libra 📄 Paper: arxiv.org/abs/2602.22190 🌐 Project: gui-libra.github.io Happy to hear feedback from the community!

Rui Yang@RuiYang70669025

Collecting high-quality GUI trajectories for agent training is expensive. But are we fully leveraging the open-source data we already have? 🤔 ✨Introducing GUI-Libra (gui-libra.github.io): 81K high-quality, action-aligned reasoning dataset curated from open-source corpora, plus a tailored training recipe that combines action-aware SFT with step-wise RLVR-style training (⚠️partially verifiable rather than fully verifiable!). Result: stronger native GUI agents on both offline step-wise evaluation and online environments across mobile and web domains. Take away: With careful data curation + tailored post-training recipe, a small subset of open-source trajectories can still go a long way for training native GUI agents. Check out our paper (arxiv.org/abs/2602.22190) and code/dataset/model (github.com/GUI-Libra/GUI-…) for more details. #GUI #agent #VLM

English

3.5K

Qianhui Wu@5000hui·26 Şub

Congrats to the LightMem team! 👏Great to see the continued exploration of topic-based segmentation and lightweight compression for building efficient memory systems for LLMs. Glad that our findings in SeCom and LLMLingua-2 have been useful building blocks for the community. 😀

Ningyu Zhang@ZJU@zxlzr

We’re thrilled to share that our team’s work LightMem has been accepted to ICLR 2026 🎉 Paper: arxiv.org/abs/2510.18866 Code: github.com/zjunlp/LightMem LightMem is a lightweight, modular memory system for LLM agents that enables scalable long-context reasoning and structured memory management across tasks and environments. Recent updates: 1️⃣ Introduced a comprehensive baseline evaluation framework for benchmarking memory layers (Mem0, A-MEM, LangMem) across datasets like LoCoMo and LongMemEval 2️⃣ Released a demo video showcasing long-context handling, along with tutorial notebooks covering multiple usage scenarios 3️⃣ Enabled multi-tool invocation via MCP Server integration 4️⃣ Added full LoCoMo dataset support and integrated GLM-4.6, achieving strong performance and efficiency with reproducible scripts 5️⃣ Supported local deployment through Ollama, vLLM, and Transformers with automatic model loading #ICLR2026 #LLM #Agents #MemorySystems #LightMem

English

1.1K

Qianhui Wu@5000hui·12 Oca

🔊2026 Summer Internship @MSFTResearch Deep Learning Group🔊 We’re looking for a self-motivated intern with strong background on ⛑️building GUI agent environments and/or 🏗️reinforcement learning. 📩Interested? Send your CV + a short intro to qianhuiwu@microsoft.com!

English

344

23.7K

Qianhui Wu@5000hui·13 Kas

🧠 Key ideas: @zhaoywang_CS • categorized task synthesis from real websites📷 • online task refinement when the task is against observation 📷 • offline trajectory refinement to remove noisy steps 🪄

Zhaoyang Wang@zhaoywang_CS

🚀 New work: SynthAgent – a fully synthetic supervision pipeline for web agents 🤖 We generate high quality and environment-specific tasks + trajectories to adapt agents to new websites without human efforts 🧠🧼 arxiv：arxiv.org/pdf/2511.06101 code：github.com/aiming-lab/Syn…

English

1.3K

Qianhui Wu retweetledi

Xiao Yu@xy2437·15 Eki

Why can (V)LMs agents ace coding and math, yet struggle so badly in more complex environments like computer or phone use? 🤔 We find that one key factor lies in models' ability to understand and *simulate* the environment’s dynamics — and propose **Dyna-Mind** to address this! 🧵[1/n]

English

2.6K

Qianhui Wu retweetledi

Da Yu@DaYu85201802·23 Eyl

✨ Internship Opportunity @ Google Research ✨ We are seeking a self-motivated student researcher to join our team at Google Research starting around January 2026. 🚀 In this role, you will contribute to research projects advancing agentic LLMs through tool use and RL, with the goal of enabling breakthrough applications. We are particularly interested in PhD students with a strong background in these areas. If interested, please send a brief self-introduction and your CV to yuda3.edu@gmail.com. Looking forward to connecting with talented researchers in this exciting space!

English

842

76K

Qianhui Wu@5000hui·11 Tem

@LiJunnan0409 Awesome work! 🥂 I feel like the design of our GUI-Actor — which can propose multiple candidate regions in one forward pass— combined with a Grounding Verifier could work really well within the 'test-time scaling' framework of GTA1! 😀

English

220

Li Junnan@LiJunnan0409·9 Tem

🚀Introducing GTA1 – our new GUI Agent that leads the OSWorld leaderboard with a 45.2% success rate, outperforming OpenAI's CUA! GTA1 improves two core components of GUI agents: Planning and Grounding. 🧠 Planning: A generic test-time scaling strategy that concurrently samples multiple action proposals each step and uses a judge to select the best—yielding more optimal execution plans. 🎯Grounding: A reinforcement learning recipe that trains a SoTA grounding model using click-based rewards, significantly improving success rate on coordinate-based actions (e.g. click). Paper: huggingface.co/papers/2507.05…

English

5.9K

Qianhui Wu@5000hui·8 Tem

@i_Am_Snow_Flake Thanks for sharing! Please check out the demo here: github.com/microsoft/GUI-…

English

Sahil Sharma @i_Am_Snow_Flake·15 Haz

🚀 New from Microsoft: GUI-Actor – a coordinate-free visual grounding method for GUI agents! Paper: arxiv.org/abs/2506.03143 Demo + Models: microsoft.github.io/GUI-Actor/ HF: huggingface.co/microsoft/GUI-… #AI #VisionLanguage #VLM #GUIAgent #HuggingFace

English

108

Qianhui Wu@5000hui·29 Haz

Huge thanks to the @SimularAI team for hosting, and to my amazing collaborators for making this project possible! 🙏 Excited to see where this direction takes us next! 🔗 aka.ms/GUI-Actor

Simular@SimularAI

Big thanks to Qianhui Wu @5000hui and the team behind “Act Where You See” for sharing their amazing work this week at @SimularAI Seminar! 🧠⚡️ Coordinate-free visual grounding for GUI agents is a huge leap toward human-like interaction. 📎 aka.ms/GUI-Actor #AI #SimularSeminar #GUIAgents #SimularToHuman

English

2.1K

Qianhui Wu@5000hui·11 Haz

@touken_titan The key idea of GUI-Actor should also apply in embodied scenarios. We are also thinking about how to adapt it such scenarios.

English

Titus Rowe@Titus_R0·8 Haz

@5000hui @_akhaliq GUI grounding stays nascent until evaluated on embodied, persistent agentsedge-native, not just server.

English

Qianhui Wu@5000hui·4 Haz

🚀 Excited to share GUI-Actor—a new approach for GUI grounding! Big thanks to @_akhaliq for featuring our work! 🌐 Project page: microsoft.github.io/GUI-Actor/ 📜 Paper: arxiv.org/pdf/2506.03143 🤔 What's limiting coordinate generation-based GUI grounding? 1️⃣ Weak spatial-semantic alignment 2️⃣ Ambiguous supervision signals 3️⃣ Vision–action granularity mismatch 👀 But think about it: humans don’t calculate precise screen coordinates—we perceive elements and then act directly. 💡 Meet GUI-Actor: a VLM with an attention-based action head that: ✅ Addresses above limitations ✅ Proposes multiple candidate regions in one pass, enabling flexible downstream strategies. ✅ Performs coordinate-free grounding that better mirrors human behavior ➕ We also introduce a grounding verifier to select the most plausible action region — and it can boost other grounding methods too. 🎯 Results? GUI-Actor achieves SOTA on several benchmarks, even GUI-Actor-7B outperforms UI-TARS-72B on ScreenSpot-Pro, all using the same Qwen2-VL backbone.

AK@_akhaliq

Microsoft just dropped GUI-Actor on Hugging Face Coordinate-Free Visual Grounding for GUI Agents

English

103

28.5K

Qianhui Wu retweetledi

Jianwei Yang@jw2yang4ai·28 Nis

🚀 Excited to announce our 4th Workshop on Computer Vision in the Wild (CVinW) at @CVPR 2025! 🔗 computer-vision-in-the-wild.github.io/cvpr-2025/ ⭐We have invinted a great lineup of speakers: Prof. Kaiming He, Prof. @BoqingGo, Prof. @CordeliaSchmid, Prof. @RanjayKrishna, Prof. @sainingxie, Prof. @YunzhuLiYZ, Prof. @furongh to talk about the exciting researches to bring vision to the wild! 🌎Join top researchers tackling real-world vision challenges — from dynamic environments to embodied agents! See you all at #CVPR2025! #CVPR2025 #ComputerVision #AI

English

102

27.8K

Qianhui Wu@5000hui·27 Mar

Check out our SeCom and other amazing works!

Microsoft Research@MSFTResearch

In this issue of Research Focus, we examine a new conversation segmentation method that delivers more coherent and personalized agent conversation, and we review efforts to improve MLLMs’ understanding of geologic maps. Check out the latest research: msft.it/6019q9k33

English

Qianhui Wu@5000hui·20 Şub

🚀 Excited to introduce our latest work: Magma - A Foundation Model for Multimodal AI Agents! 🔥 🌐 Project: microsoft.github.io/Magma 📄 Paper: arxiv.org/pdf/2502.13130 Check it out and let us know what you think! #AIAgents #Multimodal

Jianwei Yang@jw2yang4ai

Thanks for featuring our work! @arankomatsuzaki. 🔥Today we are thrilled to announce our MSR flagship project Magma! This is a fully open-sourced project. We will roll out all the stuff: code, model and training data through the following days. Check out our full work here: microsoft.github.io/Magma ! To the best of our knowledge, Magma is the first-ever foundation model for multimodal AI agents designed to handle complex interactions for agentic tasks. With a single suite of parameters, Magma achieves state-of-the-art UI navigation and robotics manipulation across both digital and physical environments, as well as excelling on generic image and video understandings!

English

1.3K

Qianhui Wu retweetledi

Qian Liu@sivil_taram·24 Oca

Thrilled to share that RegMix has been accepted by #ICLR2025! 🎉 Massive shoutout to the incredible co-authors @xszheng2020 @Muennighoff @GuangtaoZ @LongxuDou @TianyuPang1 Jing Jiang @mavenlin! 🙏 Huge thanks to ICLR reviewers for helping us improve RegMix! 🌟 some key Improvements from v1 : 1️⃣ Expanded experiments to 100 domains, and Regression still works extremely well 📈 2️⃣ Conducted 7B model performance over 100B tokens and RegMix beats Human consistently🚀 3️⃣ More results on the confirmation of the rank invariance hypothesis across model scales 📉 4️⃣ New insights at using 1B proxy model level, and it does not show significant advantage than 1M proxy models actually🧠 Code: github.com/sail-sg/regmix Paper: arxiv.org/abs/2407.01492

Qian Liu@sivil_taram

Still following your human intuition to mix corpora from different sources for language model pre-training 🧠? Everyone says that data mixture has a big impact on model performance, but how - and why🕵️? Did you know that web corpora are actually highly impactful for downstream tasks 🏆? Let's check out our preprint "RegMix: Data Mixture as Regression for Language Model Pre-training" 📄 🔬In this paper, we've proposed an automatic data mixture method RegMix that achieves a 6.3% improvement over human selection on the widely used HellaSwag benchmark - and it only needs a 2% extra training FLOPs! 📈 Details in the thread 🧵

English

12.5K

Qianhui Wu retweetledi

Huiqiang Jiang@iofu728·23 Oca

SCBench has been accepted by #ICLR2025! Now you can evaluate your long-context methods across the full KV cache lifecycle. Congratulations to @liyucheng_2 and all my co-authors! Find more details at aka.ms/SCBench

Huiqiang Jiang@iofu728

🕸️ KV cache has its own lifecycle, but previous benchs focus solely on single-req, ignore full lifecycle. 🪻SCBench fills the gap with a KV-cache-centric angle, covering 4 kv cache stage across 12 tasks, 4 capability, and 2 shared-context mode aka.ms/SCBench

English

821

Qianhui Wu@5000hui·11 Şub

🚀 Introducing SECOM🚀 How can conversational agents **better retain and retrieve past interactions** for more **coherent and personalized** experiences? Our latest work on **Memory Construction & Retrieval** tackles this challenge head-on! 🔍 Key Takeaways: ✅ **Granularity matters** – Turn-level, session-level & summarization-based memory struggle with retrieval accuracy and the semantic integrity/relevance of the context. ✅ **Prompt compression (e.g., LLMLingua-2) can denoise memory retrieval**, boosting both retrieval accuracy and response quality. 💡 Meet **SeCom** – an approach that **segments conversations topically** for memory construction and performs memory retrieval based on compressed memory units. 📊 Result? Superior performance on long-term conversation benchmarks such as LOCOMO and Long-MT-Bench+! 📖 Dive into the details: arxiv.org/abs/2502.05589

English

1.4K

Qianhui Wu retweetledi

Aran Komatsuzaki@arankomatsuzaki·4 Şub

Google presents: Scaling Embedding Layers in Language Models Outperforms a 1.9B parameter baseline across diverse corpora, while using only half the inferencetime FLOPS

English

253

29.3K

Qianhui Wu@5000hui·16 Ara

@iofu728 @BaotongL22 @fanyang Congratulations! 🎉

English

157

Huiqiang Jiang@iofu728·16 Ara

🏆Appreciate committee’s recognition of RetrievalAttention. We hope this study inspires more people to join MLSys and contribute to making vector databases great again. 🎊Congratulations @BaotongL22 @fanyang etc. #ENSLP @NeurIPS’24

English

2.6K

Qianhui Wu@5000hui·13 Ara

@VarunBa91495185 @MSFTResearch PhD students at any stage of their program!

English

200

Varun Babbar@VarunBa91495185·13 Ara

@5000hui @MSFTResearch Are you specifically looking for PhD students who are close to graduating, or is this opportunity open to all PhD students, regardless of year?

English

222

Qianhui Wu@5000hui·12 Ara

🚀 2025 Summer Internship @MSFTResearch Deep Learning Group 🚀 We’re looking for a self-motivated intern to work with us on building intelligent agents that reason, plan, and interact with complex UIs! 📩 Interested? Send your CV and a brief self-intro to qianhuiwu@microsoft.com! #Internship #AIAgents #UINavigation

English

336

47.8K

Keşfet

@MSFTResearch @zhaoywang_CS @LiJunnan0409 @i_Am_Snow_Flake @SimularAI @_akhaliq @CVPR @BoqingGo