

Börje Karlsson

@tellarin
AI Researcher @BAAIBeijing, ex-@MSFTResearch Asia, @nokia INdT, @inovacao_cesar. Occasional politics, opinions are my own, RTs ≠ endorsements… 🇸🇪🇧🇷🇵🇹🇺🇦



Today, we enable AutoResearch in the physical world for the first time! Introducing ENPIRE: we give 8 Codex agents a fleet of robots, an allocation of GPUs, and a generous token budget. We set them free with a simple goal: solve the task as quickly as possible, keep the robots busy but stay safe, don't waste precious compute. Make no mistake. Then humans step aside and our watch begins. The robot fleet starts to come alive: they learn to look for visual clues, reset the scene, practice novel skills, tinker with the control stack, read papers online, debate, reflect, get stuck, and try again directly on the hardware. All we did was give Codex an API to the world of atoms, and the rest is emergence. ENPIRE is able to solve high-precision tasks like tying zip-ties, organizing fine pins, and installing GPUs all by itself. We also discovered a new type of "physical scaling": 8 robots exploring in parallel improve significantly faster than fewer ones. A part of our NVIDIA GEAR lab now self-improves tirelessly overnight. We just read the reports in the morning. /goal: we all take a holiday and Jensen wouldn't even notice ;) We will be open-sourcing everything, so you can host your self-running robot lab at home too! Deep dive in the thread:

our preprint is out! we attempt to model human teaching behaviors in agents, yielding a unified framework that enables adaptive, personalized learning experiences end to end: 🧵


Introducing WebHarbor ⚓ — an open community effort to dock real websites into local, deterministic, and evolving environments for web agent research. 🌐 Come help us build it. 🤝 Contribute new web environments or fix existing ones — will be included in the author list! ✍️ 🎉 First release: 15 multimodal, high-fidelity environments covering all 643 WebVoyager tasks — full frontend, backend, database, and auth, all in one lightweight Docker image. Why? Web agent eval today is broken😦: reCAPTCHA, geo-blocks, content drift, flaky networks, and login-gated deep features (e.g., account and checkout) that benchmarks can't touch. Live sites can't be reset either — making online agent RL impractical. Again, the bottleneck isn't the agent. It's the environment. WebHarbor: dock real websites into stable, reproducible local mirrors with sub-second reset. But here's the key 🌱 — you can't clone the entire web upfront, and you don't need to. WebHarbor evolves with the agent: as harder tasks arrive, environments grow to support them. Coding agents (e.g., Claude Code/CodeX) build mirrors fast; human reviewers catch the hacks coding agents slip in (shortcuts, leaks, fake completions). We need you. 🙌 Help us scale to 100+ and beyond: 🔨 Contribute a new web environment 🐛 Fix or improve existing mirrors 🔍 Audit task fidelity & interaction realism See more details and join the effort: - 🏠 Project Page: aiming-lab.github.io/webharbor.gith… - 💻 GitHub Repo: github.com/aiming-lab/Web… - 📝 Contribution Form: forms.gle/ngcD1rzAfUEphN… Let's build the open-source environment infrastructure for GUI web agents! ⚓ Initiating institutions: UNC-Chapel Hill ✖️Microsoft #AIAgents #WebAgents #LLM #OpenSource #AgenticAI




I feel sorry for Claude Code. I know they're not the one. I'm not overcommitting, not investing too hard. I wonder if they know I'm pulling away.






Generalization in embodied AI will only happen when it can make effective use of existing real-world infrastructure, which is built for humans, not as clean APIs. Success requires more than “commonsense”, but current models can't yet handle this critical real-world scenario.



Collecting high-quality GUI trajectories for agent training is expensive. But are we fully leveraging the open-source data we already have? 🤔 ✨Introducing GUI-Libra (gui-libra.github.io): an 81K-sample, high-quality, action-aligned reasoning dataset curated from open-source corpora, plus a tailored training recipe that combines action-aware SFT with step-wise RLVR-style training (⚠️partially verifiable rather than fully verifiable!). Result: stronger native GUI agents on both offline step-wise evaluation and online environments across mobile and web domains. Takeaway: with careful data curation and a tailored post-training recipe, a small subset of open-source trajectories can still go a long way for training native GUI agents. Check out our paper (arxiv.org/abs/2602.22190) and code/dataset/model (github.com/GUI-Libra/GUI-…) for more details. #GUI #agent #VLM



