Tao Yu ✈️ NeurIPS 2025

483 posts


@taoyds

@XLangNLP lab; assistant professor @HKUniversity. Author of OpenCUA, OSWorld, Aguvis, Spider, OpenAgents, Text2Reward, and Instructor.

Seattle · Joined March 2016
888 Following · 5.7K Followers
Tao Yu ✈️ NeurIPS 2025 retweeted
Claude
Claude@claudeai·
You can now enable Claude to use your computer to complete tasks. It opens your apps, navigates your browser, fills in spreadsheets—anything you'd do sitting at your desk. Research preview in Claude Cowork and Claude Code, macOS only.
Tao Yu ✈️ NeurIPS 2025 retweeted
Junyang Lin
Junyang Lin@JustinLin610·
me stepping down. bye my beloved qwen.
Tao Yu ✈️ NeurIPS 2025 retweeted
Yiheng Xu
Yiheng Xu@yihengxu_·
GPT-5.3-Codex is released! Coding, APIs, and computer use are the three most important interfaces for agents. Codex is clearly great at the first two. Now there is a big leap for CUA! It can unlock many things that only humans could do before, and it is still iterating fast. Excited for the next one!
Sam Altman@sama

GPT-5.3-Codex is here!
• Best coding performance (57% SWE-Bench Pro, 76% TerminalBench 2.0, 64% OSWorld).
• Mid-task steerability and live updates during tasks.
• Faster! Less than half the tokens of 5.2-Codex for the same tasks, and >25% faster per token!
• Good computer use.

Tao Yu ✈️ NeurIPS 2025 retweeted
Xin Eric Wang
Xin Eric Wang@xwang_lk·
Reliability is the fundamental bottleneck for GUI agents. ⚠️ One wrong click can trigger irreversible, costly actions 💥

Introducing SafeGround 🛡️: an uncertainty-calibrated framework that knows when not to act, enabling risk-aware GUI grounding with statistical guarantees 📊

Key idea: the real danger is silent failure 🤫 Most GUI grounding models always output a coordinate, even when they're unsure ❌📍 Instead, SafeGround:
📐 Estimates spatial uncertainty from prediction variability;
🎯 Calibrates a decision threshold with statistical guarantees;
🛑 Abstains or defers high-risk actions, enabling risk-controlled GUI interaction, even for black-box models. 🔒🤖
Qingni Wang@Ceeqnn

🚨 New paper alert 🚨
📌 How can we make GUI grounding models reliable in real-world interactions?

We introduce 🚀 SafeGround: Know When to Trust GUI Grounding Models via Uncertainty Calibration

In GUI agents, a single wrong click isn't just an error: it can trigger costly or irreversible actions (e.g., unintended payments 💸 or deleting important files 🗑️). The real danger is silent failure: most GUI grounding models always output a coordinate, even when they're unsure.

Instead of trusting a single predicted point, SafeGround:
• estimates spatial uncertainty from prediction variability
• calibrates a decision threshold with statistical guarantees
• enables risk-controlled GUI actions, even with black-box models

💻 Code: github.com/Cece1031/SAFEG…
📄 Paper: arxiv.org/pdf/2602.02419
🧵1/6 #Agents #GUI

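The abstain-or-act recipe the thread describes can be sketched in a few lines. This is a rough illustration under my own assumptions (repeated sampling as the source of prediction variability, and a simple empirical error-rate rule as a stand-in for the paper's statistical guarantee), not SafeGround's actual algorithm; every function name here is made up:

```python
import math

def spatial_uncertainty(points):
    """Spread of repeated click predictions: mean distance to their centroid."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return sum(math.hypot(x - cx, y - cy) for x, y in points) / len(points)

def calibrate_threshold(scores, correct, risk=0.1):
    """Largest uncertainty threshold whose accepted calibration predictions
    keep an empirical error rate <= risk."""
    pairs = sorted(zip(scores, correct))  # low uncertainty first
    best, errors = -math.inf, 0
    for i, (score, ok) in enumerate(pairs, start=1):
        errors += 1 - ok
        if errors / i <= risk:
            best = score
    return best

def decide(points, threshold):
    """Return a click coordinate, or None to abstain / defer to a human."""
    if spatial_uncertainty(points) > threshold:
        return None
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return (cx, cy)
```

The key design point matches the tweet: the model is never forced to click, so a high-variance prediction becomes an explicit abstention instead of a silent failure.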
Tao Yu ✈️ NeurIPS 2025 retweeted
Kimi.ai
Kimi.ai@Kimi_Moonshot·
Kimi K2.5: Now #1 on the OSWorld leaderboard. 🏆 With its Computer Use capabilities, you can now build powerful agents that navigate and operate computer interfaces just like a human. os-world.github.io
Tao Yu ✈️ NeurIPS 2025 retweeted
Kimi.ai
Kimi.ai@Kimi_Moonshot·
🥝 Meet Kimi K2.5, Open-Source Visual Agentic Intelligence.
🔹 Global SOTA on agentic benchmarks: HLE full set (50.2%), BrowseComp (74.9%)
🔹 Open-source SOTA on vision and coding: MMMU Pro (78.5%), VideoMMMU (86.6%), SWE-bench Verified (76.8%)
🔹 Code with Taste: turn chats, images & videos into aesthetic websites with expressive motion.
🔹 Agent Swarm (Beta): self-directed agents working in parallel, at scale. Up to 100 sub-agents and 1,500 tool calls, 4.5× faster than a single-agent setup.

🥝 K2.5 is now live on kimi.com in chat mode and agent mode.
🥝 K2.5 Agent Swarm is in beta for high-tier users.
🥝 For production-grade coding, pair K2.5 with Kimi Code: kimi.com/code

🔗 API: platform.moonshot.ai
🔗 Tech blog: kimi.com/blogs/kimi-k2-…
🔗 Weights & code: huggingface.co/moonshotai/Kim…
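Kimi's swarm internals aren't described in this post, but the fan-out pattern it names (many parallel sub-agents with a hard cap on how many run at once) is a standard concurrency shape. A hypothetical asyncio sketch, where `sub_agent`, `swarm`, and the cap are all illustrative placeholders rather than anything from Kimi's API:

```python
import asyncio

async def sub_agent(task: str) -> str:
    # Stand-in for a real sub-agent: a model call plus its tool calls.
    await asyncio.sleep(0.01)
    return f"done:{task}"

async def swarm(tasks, max_parallel=100):
    # Cap the number of live sub-agents, the way a swarm bounds its fan-out.
    sem = asyncio.Semaphore(max_parallel)

    async def run(task):
        async with sem:
            return await sub_agent(task)

    # gather() preserves task order even though execution is concurrent.
    return await asyncio.gather(*(run(t) for t in tasks))
```

The speedup such a setup buys comes from overlapping the sub-agents' waiting time (model and tool latency), which is why the announcement compares it against a single-agent baseline.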
Tao Yu ✈️ NeurIPS 2025 retweeted
Zhoujun (Jorge) Cheng
Zhoujun (Jorge) Cheng@ChengZhoujun·
Pretraining has scaling laws to guide compute allocation. But for RL on LLMs, we lack a practical guide on how to spend compute wisely. We show the optimal compute allocation in LLM RL scales predictably. ↓ Key takeaways below
Tao Yu ✈️ NeurIPS 2025 retweeted
Shizhe Diao
Shizhe Diao@shizhediao·
🚀 Excited to share ToolOrchestra, an end-to-end RL training framework for orchestrating tools and agentic workflows. Everyone’s building agent workflows these days — connecting tools, APIs, and LLMs like LEGO. 🧩 But here are our findings: 👉 Just prompting the agent workflow won’t cut it. It’s not how you build the best agent. 👉 Without learning, workflows plateau fast. It’s time to bring RL fine-tuning 🔥back into agent development. (1/n)
Tao Yu ✈️ NeurIPS 2025 retweeted
Qwen
Qwen@Alibaba_Qwen·
🏆 We are incredibly honored to announce that our paper, "Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free" has received the NeurIPS 2025 Best Paper Award! A huge congratulations to our dedicated research team for pushing the boundaries of AI. Read more: blog.neurips.cc/2025/11/26/ann…
Tao Yu ✈️ NeurIPS 2025 retweeted
Yu Su
Yu Su@ysu_nlp·
Life update: I moved to Silicon Valley to tackle agents' biggest challenges: plasticity and reliability.

Today's agents are smart but brittle. They lack plasticity (continual learning and adaptation) and reliability (stable, predictable behavior with bounded failures). These two traits define whether agents become critical infrastructure or remain clever demos.

Plastic systems like to change. Reliable systems resist change. Is it even possible to have both of these seemingly conflicting traits? Fortunately, humans are a living example: we are constantly learning and adapting while staying remarkably dependable (for the most part, at least). The real question is: how can we achieve the same harmony within a different cognitive substrate?

We've brought together some of the world's best agent experts, whose work (Mind2Web, MMMU, LLM-Planner, SeeAct, UGround) helped shape the modern agent field. Now we are taking on a new mission: unlocking plasticity and reliability for every agent.

We are looking for cracked researchers and engineers to join us in person in the Bay Area! If you strongly resonate with the mission, send your CV and thoughts to: hiring@neocognition.io

I will be at #neurips2025. Happy to chat over coffee!
Tao Yu ✈️ NeurIPS 2025 retweeted
AI at Meta
AI at Meta@AIatMeta·
Today we're excited to unveil a new generation of Segment Anything Models:
1️⃣ SAM 3 enables detecting, segmenting and tracking objects across images and videos, now with short text phrases and exemplar prompts.
🔗 Learn more about SAM 3: go.meta.me/591040
2️⃣ SAM 3D brings the model collection into the 3rd dimension, enabling precise reconstruction of 3D objects and people from a single 2D image.
🔗 Learn more about SAM 3D: go.meta.me/305985
These models offer innovative capabilities and unique tools for developers and researchers to create, experiment and uplevel media workflows.
Tao Yu ✈️ NeurIPS 2025 retweeted
Claude
Claude@claudeai·
Introducing Claude Opus 4.5: the best model in the world for coding, agents, and computer use. Opus 4.5 is a step forward in what AI systems can do, and a preview of larger changes to how work gets done.
Siva Reddy
Siva Reddy@sivareddyg·
Honored to receive the Computer Science Canada Outstanding Early Career Researcher award 🏅. It is a recognition of the work carried out by my students and of their courage to push fundamental ideas in natural language processing even in the era of LLMs. Thanks to my mentors and nominators for making time in their incredibly busy schedules. And thanks to my colleagues at Mila, McGill and ServiceNow for fostering an intellectually stimulating environment and providing the resources to succeed!
Mila - Institut québécois d'IA@Mila_Quebec

Congratulations to Siva Reddy (@sivareddyg), Core Academic Member at Mila, who has received the prestigious Outstanding Early Career Computer Science Researcher Award from @CSCan_InfoCan , the leading organization for the computer science community in Canada. mila.quebec/en/news/siva-r…

Tao Yu ✈️ NeurIPS 2025 retweeted
Zhiyuan Zeng
Zhiyuan Zeng@ZhiyuanZeng_·
RL is bounded by finite data 😣? Introducing RLVE: RL with Adaptive Verifiable Environments. We scale RL with data procedurally generated from 400 environments that dynamically adapt to the trained model. 💡 Find supervision signals right at the LM capability frontier, then scale them. 🔗 in 🧵 [1/n]
Tao Yu ✈️ NeurIPS 2025 retweeted
Victor Zhong
Victor Zhong@hllo_wrld·
I am hiring for fully funded (up to 3 years) postdoc positions in AI for science at Waterloo/Vector: multimodal deep research, agents, tool-use. You'll work closely w/ industry partners & lead projects. Please share! Apply at r2llab.com/opening or email me directly!
Tao Yu ✈️ NeurIPS 2025 retweeted
Sida Wang
Sida Wang@sidawxyz·
I have one PhD intern opening to do research as part of a model-training effort at the FAIR CodeGen team (latest: Code World Model). If interested, email me directly and apply at metacareers.com/jobs/214557081…
Tao Yu ✈️ NeurIPS 2025 retweeted
Xinyuan Wang
Xinyuan Wang@xywang626·
Big update for OpenCUA! OpenCUA-72B-preview now ranks #1 on the OSWorld-Verified leaderboard (os-world.github.io). It is a pure-GUI-action, end-to-end computer-use foundation model (website: opencua.xlang.ai). Huge thanks to the OpenCUA team and the great support of the Kimi team @Kimi_Moonshot!

Claude 4.5 is extremely strong on OSWorld, but we're committed to pushing open-source, end-to-end CUA foundation models forward. Over the last month we trained a larger, stronger model: 45.0% average on OSWorld-Verified. It also shows strong GUI grounding ability: 37.3% on UI-Vision @EdwardJian2 @PShravannayak and 60.8% on ScreenSpot-Pro.

We'll keep driving open-source CUA: models will be on HuggingFace very soon, and a paper update is on the way. #OpenSource #Agents #OSWorld #CUA #ComputerUseAgent
Xinyuan Wang@xywang626

We are super excited to release OpenCUA — the first 0-to-1 computer-use agent foundation model framework and the open-source SOTA model OpenCUA-32B, matching top proprietary models on OSWorld-Verified, with full infrastructure and data.
🔗 [Paper] arxiv.org/abs/2508.09123
📌 [Website] opencua.xlang.ai
🤖 [Models] huggingface.co/xlangai/OpenCU…
📊 [Data] huggingface.co/datasets/xlang…
💻 [Code] github.com/xlang-ai/OpenC…

🌟 OpenCUA — a comprehensive open-source framework for computer-use agents, including:
📊 AgentNet — the first large-scale CUA dataset (3 systems, 200+ apps & sites, 22.6K trajectories)
🏆 OpenCUA model — open-source SOTA on OSWorld-Verified (34.8% avg success, outperforms OpenAI CUA)
🖥 AgentNetTool — a cross-system computer-use task annotation tool
🏁 AgentNetBench — an offline CUA benchmark for fast, reproducible evaluation

💡 Why OpenCUA? Proprietary CUAs like Claude or OpenAI CUA are impressive 🤯, but there's no large-scale open desktop agent dataset or transparent pipeline. OpenCUA changes that by offering the full open-source stack 🛠: scalable cross-system data collection, effective data formulation, model training strategy, and reproducible evaluation — powering top open-source models including OpenCUA-7B and OpenCUA-32B that excel in GUI planning & grounding. Details of the OpenCUA framework 👇

Tao Yu ✈️ NeurIPS 2025 retweeted
Yanzhe Zhang
Yanzhe Zhang@StevenyzZhang·
Introducing Generative Interfaces - a new paradigm beyond chatbots. We generate interfaces on the fly to better facilitate LLM interaction, so no more passive reading of long text blocks. Adaptive and Interactive: creates the form that best adapts to your goals and needs!
Tao Yu ✈️ NeurIPS 2025 retweeted
Yi Ma
Yi Ma@YiMaTweets·
The HKU School of Computing is recruiting PhD students. The deadline for the early round is August 31 and approaching fast. Please inform your friends!