Why do AI agents stumble on long, complex interface tasks?
A team from Tencent Turing Lab, Georgia Tech, Tsinghua, and the Chinese Academy of Sciences presents LongHorizonUI.
This unified framework combines a "Perceiver" for robust screen understanding, a "Decider" for iterative, precise action planning, and an "Executor" that corrects errors or rolls back to stay robust throughout a task.
It significantly improves AI agent reliability on multi-step GUI automation, especially in complex apps and gaming scenarios, while staying competitive on other benchmarks.
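To make the Perceiver / Decider / Executor split concrete, here is a minimal Python sketch of a perceive-decide-execute loop with rollback. The post only names the three components, so everything below is an assumption for illustration: `run_task`, the callback signatures, the retry limit, and the toy counter "app" are hypothetical and not the paper's actual interfaces.

```python
from typing import Callable, List, Optional, Tuple

State = dict  # hypothetical stand-in for the GUI/app state


def run_task(
    state: State,
    perceive: Callable[[State], dict],                   # "Perceiver": state -> observation
    decide: Callable[[dict, List[str]], Optional[str]],  # "Decider": next action, None when done
    execute: Callable[[State, str], State],              # "Executor": apply the action
    verify: Callable[[State, str], bool],                # check the result of the action
    max_steps: int = 20,
    max_retries: int = 2,
) -> Tuple[State, List[str]]:
    """Iterate perceive -> decide -> execute, rolling back any step that fails verification."""
    history: List[str] = []
    retries = 0
    for _ in range(max_steps):
        observation = perceive(state)
        action = decide(observation, history)
        if action is None:           # Decider reports the goal is reached
            break
        checkpoint = dict(state)     # snapshot to roll back to
        candidate = execute(state, action)
        if verify(candidate, action):
            state = candidate        # commit the step
            history.append(action)
            retries = 0
        else:
            state = checkpoint       # roll back and let the Decider re-plan
            retries += 1
            if retries > max_retries:
                raise RuntimeError(f"giving up after repeated failures on {action!r}")
    return state, history


def demo() -> Tuple[State, List[str]]:
    """Toy task: tap '+' until a counter reads 3; the second tap misfires once."""
    calls = {"n": 0}

    def perceive(s: State) -> dict:
        return {"count": s["count"]}

    def decide(obs: dict, history: List[str]) -> Optional[str]:
        return None if obs["count"] >= 3 else "tap_plus"

    def execute(s: State, action: str) -> State:
        calls["n"] += 1
        step = 5 if calls["n"] == 2 else 1  # simulate a single bad execution
        return {**s, "count": s["count"] + step}

    def verify(new_state: State, action: str) -> bool:
        return new_state["count"] <= 3      # overshooting counts as an error

    return run_task({"count": 0}, perceive, decide, execute, verify)
```

In this toy run the misfired tap is rejected by `verify`, the state is restored from the checkpoint, and the loop continues, which is the general idea behind correcting or rolling back rather than letting one bad step derail a long task.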
LongHorizonUI: A Unified Framework for Robust Long-Horizon Task Automation of GUI Agent
Paper: openreview.net/pdf?id=BK7Mk5d…
Project: kane2kang.github.io/LongHorizonUI/
Our report: mp.weixin.qq.com/s/SeSoAUz1mdAY…
📬 #PapersAccepted by Jiqizhixin
