Tao Yu ✈️ NeurIPS 2025

483 posts


@taoyds

@XLangNLP lab; assistant professor @HKUniversity. Author of OpenCUA, OSWorld, Aguvis, Spider, OpenAgents, Text2Reward, and Instructor.

Seattle · Joined March 2016
888 Following · 5.7K Followers
Tao Yu ✈️ NeurIPS 2025 retweeted
Claude
Claude@claudeai·
You can now enable Claude to use your computer to complete tasks. It opens your apps, navigates your browser, fills in spreadsheets—anything you'd do sitting at your desk. Research preview in Claude Cowork and Claude Code, macOS only.
Tao Yu ✈️ NeurIPS 2025 retweeted
Junyang Lin
Junyang Lin@JustinLin610·
me stepping down. bye my beloved qwen.
Tao Yu ✈️ NeurIPS 2025 retweeted
Yiheng Xu
Yiheng Xu@yihengxu_·
GPT-5.3-Codex is released! Coding, APIs, and computer use are the three most important interfaces for agents. Codex is clearly great at the first two. Now there is a big leap for CUA! It can unlock many things that only humans could do before, and it is still iterating fast. Excited for the next one!
Sam Altman@sama

GPT-5.3-Codex is here!
• Best coding performance (57% SWE-Bench Pro, 76% TerminalBench 2.0, 64% OSWorld).
• Mid-task steerability and live updates during tasks.
• Faster! Less than half the tokens of 5.2-Codex for the same tasks, and >25% faster per token!
• Good computer use.

Tao Yu ✈️ NeurIPS 2025 retweeted
Xin Eric Wang
Xin Eric Wang@xwang_lk·
Reliability is the fundamental bottleneck for GUI agents. ⚠️ One wrong click can trigger irreversible, costly actions 💥

Introducing SafeGround 🛡️: an uncertainty-calibrated framework that knows when not to act, enabling risk-aware GUI grounding with statistical guarantees 📊

Key idea: the real danger is silent failure 🤫 Most GUI grounding models always output a coordinate, even when they're unsure ❌📍 Instead, SafeGround:
📐 Estimates spatial uncertainty from prediction variability;
🎯 Calibrates a decision threshold with statistical guarantees;
🛑 Abstains or defers high-risk actions, enabling risk-controlled GUI interaction, even for black-box models. 🔒🤖
Qingni Wang@Ceeqnn

🚨 New paper alert 🚨
📌 How can we make GUI grounding models reliable in real-world interactions?

We introduce 🚀 SafeGround: Know When to Trust GUI Grounding Models via Uncertainty Calibration

In GUI agents, a single wrong click isn't just an error: it can trigger costly or irreversible actions (e.g., unintended payments 💸 or deleting important files 🗑️). The real danger is silent failure: most GUI grounding models always output a coordinate, even when they're unsure.

Instead of trusting a single predicted point, SafeGround:
• estimates spatial uncertainty from prediction variability
• calibrates a decision threshold with statistical guarantees
• enables risk-controlled GUI actions, even with black-box models

💻 Code: github.com/Cece1031/SAFEG…
📄 Paper: arxiv.org/pdf/2602.02419
🧵1/6 #Agents #GUI

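The abstain-or-act recipe the thread describes can be sketched in a few lines. This is a rough illustration under my own assumptions (repeated sampling as the source of prediction variability, and a simple empirical error-rate rule as a stand-in for the paper's statistical guarantee), not SafeGround's actual algorithm; every function name here is made up:

```python
import math

def spatial_uncertainty(points):
    """Spread of repeated click predictions: mean distance to their centroid."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return sum(math.hypot(x - cx, y - cy) for x, y in points) / len(points)

def calibrate_threshold(scores, correct, risk=0.1):
    """Largest uncertainty threshold whose accepted calibration predictions
    keep an empirical error rate <= risk."""
    pairs = sorted(zip(scores, correct))  # low uncertainty first
    best, errors = -math.inf, 0
    for i, (score, ok) in enumerate(pairs, start=1):
        errors += 1 - ok
        if errors / i <= risk:
            best = score
    return best

def decide(points, threshold):
    """Return a click coordinate, or None to abstain / defer to a human."""
    if spatial_uncertainty(points) > threshold:
        return None
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return (cx, cy)
```

The key design point matches the tweet: the model is never forced to click, so a high-variance prediction becomes an explicit abstention instead of a silent failure.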
Tao Yu ✈️ NeurIPS 2025 retweeted
Kimi.ai
Kimi.ai@Kimi_Moonshot·
Kimi K2.5: Now #1 on the OSWorld leaderboard. 🏆 With its Computer Use capabilities, you can now build powerful agents that navigate and operate computer interfaces just like a human. os-world.github.io
Tao Yu ✈️ NeurIPS 2025 retweeted
Kimi.ai
Kimi.ai@Kimi_Moonshot·
🥝 Meet Kimi K2.5, Open-Source Visual Agentic Intelligence.
🔹 Global SOTA on agentic benchmarks: HLE full set (50.2%), BrowseComp (74.9%)
🔹 Open-source SOTA on vision and coding: MMMU Pro (78.5%), VideoMMMU (86.6%), SWE-bench Verified (76.8%)
🔹 Code with Taste: turn chats, images & videos into aesthetic websites with expressive motion.
🔹 Agent Swarm (Beta): self-directed agents working in parallel, at scale. Up to 100 sub-agents and 1,500 tool calls, 4.5× faster than a single-agent setup.

🥝 K2.5 is now live on kimi.com in chat mode and agent mode.
🥝 K2.5 Agent Swarm is in beta for high-tier users.
🥝 For production-grade coding, pair K2.5 with Kimi Code: kimi.com/code

🔗 API: platform.moonshot.ai
🔗 Tech blog: kimi.com/blogs/kimi-k2-…
🔗 Weights & code: huggingface.co/moonshotai/Kim…
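Kimi's swarm internals aren't described in this post, but the fan-out pattern it names (many parallel sub-agents with a hard cap on how many run at once) is a standard concurrency shape. A hypothetical asyncio sketch, where `sub_agent`, `swarm`, and the cap are all illustrative placeholders rather than anything from Kimi's API:

```python
import asyncio

async def sub_agent(task: str) -> str:
    # Stand-in for a real sub-agent: a model call plus its tool calls.
    await asyncio.sleep(0.01)
    return f"done:{task}"

async def swarm(tasks, max_parallel=100):
    # Cap the number of live sub-agents, the way a swarm bounds its fan-out.
    sem = asyncio.Semaphore(max_parallel)

    async def run(task):
        async with sem:
            return await sub_agent(task)

    # gather() preserves task order even though execution is concurrent.
    return await asyncio.gather(*(run(t) for t in tasks))
```

The speedup such a setup buys comes from overlapping the sub-agents' waiting time (model and tool latency), which is why the announcement compares it against a single-agent baseline.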
Tao Yu ✈️ NeurIPS 2025 retweeted
Zhoujun (Jorge) Cheng
Zhoujun (Jorge) Cheng@ChengZhoujun·
Pretraining has scaling laws to guide compute allocation. But for RL on LLMs, we lack a practical guide on how to spend compute wisely. We show the optimal compute allocation in LLM RL scales predictably. ↓ Key takeaways below
Tao Yu ✈️ NeurIPS 2025 retweeted
Shizhe Diao
Shizhe Diao@shizhediao·
🚀 Excited to share ToolOrchestra, an end-to-end RL training framework for orchestrating tools and agentic workflows. Everyone’s building agent workflows these days — connecting tools, APIs, and LLMs like LEGO. 🧩 But here are our findings: 👉 Just prompting the agent workflow won’t cut it. It’s not how you build the best agent. 👉 Without learning, workflows plateau fast. It’s time to bring RL fine-tuning 🔥back into agent development. (1/n)
Tao Yu ✈️ NeurIPS 2025 retweeted
Qwen
Qwen@Alibaba_Qwen·
🏆 We are incredibly honored to announce that our paper, "Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free" has received the NeurIPS 2025 Best Paper Award! A huge congratulations to our dedicated research team for pushing the boundaries of AI. Read more: blog.neurips.cc/2025/11/26/ann…
Tao Yu ✈️ NeurIPS 2025 retweeted
Yu Su
Yu Su@ysu_nlp·
Life update: I moved to Silicon Valley to tackle agents' biggest challenges: plasticity and reliability.

Today's agents are smart but brittle. They lack plasticity (continual learning and adaptation) and reliability (stable, predictable behavior with bounded failures). These two traits define whether agents become critical infrastructure or remain clever demos.

Plastic systems like to change. Reliable systems resist change. Is it even possible to have both of these seemingly conflicting traits? Fortunately, humans are a living example: we are constantly learning and adapting while staying remarkably dependable (for the most part, at least). The real question is: how can we achieve the same harmony within a different cognitive substrate?

We've brought together some of the world's best agent experts, whose work (Mind2Web, MMMU, LLM-Planner, SeeAct, UGround) helped shape the modern agent field. Now we are taking on a new mission: unlocking plasticity and reliability for every agent.

We are looking for cracked researchers and engineers to join us in person in the Bay Area! If you strongly resonate with the mission, send your CV and thoughts to: hiring@neocognition.io

I will be at #neurips2025. Happy to chat over coffee!
Tao Yu ✈️ NeurIPS 2025 retweeted
AI at Meta
AI at Meta@AIatMeta·
Today we're excited to unveil a new generation of Segment Anything Models:
1️⃣ SAM 3 enables detecting, segmenting and tracking objects across images and videos, now with short text phrases and exemplar prompts.
🔗 Learn more about SAM 3: go.meta.me/591040
2️⃣ SAM 3D brings the model collection into the 3rd dimension, enabling precise reconstruction of 3D objects and people from a single 2D image.
🔗 Learn more about SAM 3D: go.meta.me/305985
These models offer innovative capabilities and unique tools for developers and researchers to create, experiment and uplevel media workflows.
Tao Yu ✈️ NeurIPS 2025 retweeted
Claude
Claude@claudeai·
Introducing Claude Opus 4.5: the best model in the world for coding, agents, and computer use. Opus 4.5 is a step forward in what AI systems can do, and a preview of larger changes to how work gets done.
Siva Reddy
Siva Reddy@sivareddyg·
Honored to receive the Computer Science Canada Outstanding Early Career Researcher award 🏅. It is a recognition of the work carried out by my students and of their courage to push fundamental ideas in natural language processing even in the era of LLMs. Thanks to my mentors and nominators for making time in their incredibly busy schedules. And thanks to my colleagues at Mila, McGill and ServiceNow for fostering an intellectually stimulating environment and providing the resources to succeed!
Mila - Institut québécois d'IA@Mila_Quebec

Congratulations to Siva Reddy (@sivareddyg), Core Academic Member at Mila, who has received the prestigious Outstanding Early Career Computer Science Researcher Award from @CSCan_InfoCan , the leading organization for the computer science community in Canada. mila.quebec/en/news/siva-r…

Tao Yu ✈️ NeurIPS 2025 retweeted
Zhiyuan Zeng
Zhiyuan Zeng@ZhiyuanZeng_·
RL is bounded by finite data 😣? Introducing RLVE: RL with Adaptive Verifiable Environments. We scale RL with data procedurally generated from 400 environments that dynamically adapt to the trained model. 💡 Find supervision signals right at the LM capability frontier, then scale them. 🔗 in 🧵 [1/n]
Tao Yu ✈️ NeurIPS 2025 retweeted
Victor Zhong
Victor Zhong@hllo_wrld·
I am hiring for fully funded (up to 3 years) postdoc positions in AI for science at Waterloo/Vector: multimodal deep research, agents, tool-use. You'll work closely w/ industry partners & lead projects. Please share! Apply at r2llab.com/opening or email me directly!
Tao Yu ✈️ NeurIPS 2025 retweeted
Sida Wang
Sida Wang@sidawxyz·
I have one PhD intern opening to do research as part of a model-training effort at the FAIR CodeGen team (latest: Code World Model). If interested, email me directly and apply at metacareers.com/jobs/214557081…
Tao Yu ✈️ NeurIPS 2025 retweeted
Xinyuan Wang
Xinyuan Wang@xywang626·
Big update for OpenCUA! OpenCUA-72B-preview now ranks #1 on the OSWorld-Verified leaderboard (os-world.github.io). It is a pure-GUI-action, end-to-end computer-use foundation model (website: opencua.xlang.ai). Huge thanks to the OpenCUA team and the great support of the Kimi team @Kimi_Moonshot!

Claude 4.5 is extremely strong on OSWorld, but we're committed to pushing open-source, end-to-end CUA foundation models forward. Over the last month we trained a larger, stronger model: 45.0% average on OSWorld-Verified. It also shows strong GUI grounding ability: 37.3% on UI-Vision @EdwardJian2 @PShravannayak and 60.8% on ScreenSpot-Pro.

We'll keep driving open-source CUA: models will be on HuggingFace very soon, and a paper update is on the way. #OpenSource #Agents #OSWorld #CUA #ComputerUseAgent
Xinyuan Wang@xywang626

We are super excited to release OpenCUA — the first 0-to-1 computer-use agent foundation model framework and the open-source SOTA model OpenCUA-32B, matching top proprietary models on OSWorld-Verified, with full infrastructure and data.
🔗 [Paper] arxiv.org/abs/2508.09123
📌 [Website] opencua.xlang.ai
🤖 [Models] huggingface.co/xlangai/OpenCU…
📊 [Data] huggingface.co/datasets/xlang…
💻 [Code] github.com/xlang-ai/OpenC…

🌟 OpenCUA — a comprehensive open-source framework for computer-use agents, including:
📊 AgentNet — the first large-scale CUA dataset (3 systems, 200+ apps & sites, 22.6K trajectories)
🏆 OpenCUA model — open-source SOTA on OSWorld-Verified (34.8% avg success, outperforms OpenAI CUA)
🖥 AgentNetTool — a cross-system computer-use task annotation tool
🏁 AgentNetBench — an offline CUA benchmark for fast, reproducible evaluation

💡 Why OpenCUA? Proprietary CUAs like Claude or OpenAI CUA are impressive 🤯, but there's no large-scale open desktop agent dataset or transparent pipeline. OpenCUA changes that by offering the full open-source stack 🛠: scalable cross-system data collection, effective data formulation, model training strategy, and reproducible evaluation — powering top open-source models including OpenCUA-7B and OpenCUA-32B that excel in GUI planning & grounding. Details of the OpenCUA framework 👇

Tao Yu ✈️ NeurIPS 2025 retweeted
Yanzhe Zhang
Yanzhe Zhang@StevenyzZhang·
Introducing Generative Interfaces - a new paradigm beyond chatbots. We generate interfaces on the fly to better facilitate LLM interaction, so no more passive reading of long text blocks. Adaptive and Interactive: creates the form that best adapts to your goals and needs!
Tao Yu ✈️ NeurIPS 2025 retweeted
Yi Ma
Yi Ma@YiMaTweets·
The HKU School of Computing is recruiting PhD students. The deadline for the early round is August 31 and approaching fast. Please inform your friends!