
Jonathan Scholz



We present a "hybrid system" that supplements conventional automation with "learning" for task- and safety-level adaptiveness. Deployed in a factory for motor cable soldering (< 0.6 mm tolerance), it produced 108 motors at 99.4% SR with < 20 min of data per task. Paper: arxiv.org/abs/2604.22235


Today marks the end of my first full week @GeneralistAI. Last Monday, I was given a challenge: use our GEN-1 model to teach a robot a task of my choosing, using the same no-code platform our customers use. I picked the ball-and-vase magic trick. It was one of my favorites as a kid, and it felt like the right mix of fun and surprisingly hard. A few days later, GEN-1 pulled it off. I left Friday having watched the robot nail it 14 times in a row. What's wild is that even 4 months ago, if you told me you could go from idea to on-robot skill in a couple of days, I probably wouldn't have believed you. Really excited to be building with an incredible team. Can't wait to see what week two brings 🤖

Anthropic has announced that it is massively expanding its London presence. It has just secured a new office for 800 people - a huge jump from its current 200 employees. OpenAI announced its first permanent office in London this week, and now @AnthropicAI is doubling down. Meta, OpenAI, DeepMind, Wayve, and so many others have huge offices in London. It's becoming the leading AI hub outside of the US. LET'S GO

1/ We just released π0.7 — a steerable generalist robot model with emergent capabilities. I want to share a bit of the backstory, because π0.7 taught me something surprising about where robot learning is heading. A thread on bittersweet lessons 🧵


The power of the Claw, in the palm of a robot hand. Agentic robotics is here! Today, we open-source CaP-X: vibe agents, alive in the physical world. They incarnate as robot arms and humanoids with a rich set of perception APIs and actuation APIs, and they auto-synthesize skill libraries as they go. CaP-X is a strict superset of our old stack, because policies like VLAs are "just" API calls as well. It solves many tasks zero-shot that a learned policy would struggle with.

And we are doing much more than vibing. CaP-X is our most systematic, scientific study of agentic robotics so far:
- We build a comprehensive agentic toolkit: perception (SAM3 segmentation, Molmo pointing, depth, point clouds), control (IK solvers, grasp planner, navigation), and visualization (EEF, mask overlays) that works across different robots.
- CaP-Gym: LLMs' first Physical Exam! 187 manipulation tasks across RoboSuite, LIBERO-PRO, and BEHAVIOR. Tabletop, bimanual, mobile manipulation. Sim and real. Can't wait to see the gradients flow from CaP-Gym to the next wave of frontier LLM releases.
- CaP-Bench: we benchmark 12 frontier LLMs/VLMs (Gemini, GPT, Opus, Qwen, DeepSeek, Kimi, and more) across 8 evaluation tiers, systematically varying API abstraction level, agentic harness, and visual grounding method. Lots of insights in our paper.
- CaP-Agent0: a training-free agentic harness that matches or exceeds human expert code on 4 out of 7 tasks without task-specific tuning.
- CaP-RL: if you get a gym, you get RL ;) A 7B OSS model jumps from 20% to 72% success after only 50 training iterations, and the synthesized programs transfer to real robots with minimal sim-to-real gap.

3 years ago, our team created Voyager, one of the earliest agentic AIs to play and learn in Minecraft continuously. Its key ideas (skill libraries, self-reflection loops, and in-context planning) have since influenced many modern agentic designs. Today, the agent graduates from Minecraft and gets a real job. It's April Fool's, but this Claw is getting its hands dirty for real! Link in thread:
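To make the "policies are just API calls" framing concrete, here is a minimal sketch of the kind of program a code-as-policies agent might synthesize against such a toolkit. All names (PerceptionAPI, ControlAPI, vla_policy, SkillLibrary, pick_and_place) are hypothetical stand-ins, not the actual CaP-X API.

```python
# Minimal sketch of the "policies are just API calls" idea in a
# code-as-policies style agent. Every name here (PerceptionAPI, ControlAPI,
# vla_policy, SkillLibrary) is a hypothetical stand-in, not the CaP-X API.

class PerceptionAPI:
    def segment(self, label):
        """Return a detection for the named object; a real stack would call
        a segmentation model plus depth / point-cloud lookup here."""
        return {"label": label, "xyz": (0.4, 0.0, 0.1)}  # dummy 3D centroid

class ControlAPI:
    def grasp(self, target):
        """Plan and execute a grasp on the target; return success."""
        print(f"grasping {target['label']} at {target['xyz']}")
        return True

    def vla_policy(self, instruction):
        """A learned VLA exposed behind the same call interface, so the agent
        can mix scripted primitives and learned skills freely."""
        print(f"running learned policy for: {instruction}")
        return True

class SkillLibrary(dict):
    def add(self, name, fn):
        self[name] = fn  # store synthesized programs for later reuse

# A program the agent might write once, then keep in its skill library.
def pick_and_place(perc, ctrl, obj, container):
    target = perc.segment(obj)
    if not ctrl.grasp(target):
        return ctrl.vla_policy(f"pick up the {obj}")  # fall back to a VLA
    return ctrl.vla_policy(f"place the {obj} in the {container}")

skills = SkillLibrary()
skills.add("pick_and_place", pick_and_place)
skills["pick_and_place"](PerceptionAPI(), ControlAPI(), "apple", "bowl")
```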

Agency is usually formalized as utility maximization. But must it be? LLMs suggest a different foundation: intelligence as acquiring behavioral schemas from interaction structure. My new paper, "Universal AI as Imitation", investigates the limit case of LLM-style models.



Maybe the bitter lesson for robotics, one that goes against all the current narratives, is that a small number of in-domain demos is really hard to beat, especially in e.g. industrial use cases, where variability is limited.


i was visiting a hackathon where 80+ participants were training pi0/0.5, gr00t, smolvla, ACT, DP, etc. on lerobot arms. the best and most sample-efficient policies were trained *from scratch*. we still do not have an open-source x-embodied GPT-2, but i'm hopeful for this year


Cleaned up the code and added a simulation flag to the CLI entrypoint 🤖🤖🤖
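For anyone curious what that looks like, a simulation toggle on a CLI entrypoint is usually just a boolean flag; here is a minimal argparse sketch, where the flag name and the run() helper are assumptions rather than the actual project code.

```python
# Minimal sketch of a CLI entrypoint with a simulation toggle.
# The --simulation flag name and run() helper are assumptions, not the real code.
import argparse

def run(simulation: bool) -> None:
    backend = "simulated robot" if simulation else "real hardware"
    print(f"launching control loop on {backend}")

def main() -> None:
    parser = argparse.ArgumentParser(description="Robot task runner")
    parser.add_argument(
        "--simulation",
        action="store_true",
        help="run against a simulator instead of real hardware",
    )
    args = parser.parse_args()
    run(simulation=args.simulation)

if __name__ == "__main__":
    main()
```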


Robot policies fail on the hard parts of manipulation. The moment contact, friction, or force uncertainty shows up, the success rate drops fast.

CR-DAgger shows a very different path. You take a pre-trained policy. You let a human correct it in the real world for a short time. And the system uses those corrections to learn a force-aware residual policy that fixes the failure cases.

The surprising part is the sample count❗️ They report improvements to near-perfect performance on contact-heavy tasks with only 50 to 100 episodes of correction.

Key ideas in simple terms:
✓ A compliant interface so humans can give tiny corrections without stopping the robot
✓ A residual policy that adds the missing force reasoning to a position-only base policy
✓ A fast update loop that makes the policy better with every bit of feedback

They tested tasks like book flipping and belt assembly. The gains are large. It beats finetuning and it beats retraining from scratch. Thanks for sharing, @YifanHou2!!

📍Paper: arxiv.org/abs/2506.16685
Extended version with more experiments: compliant-residual-dagger.github.io/files/CR_DAgge…
Code: github.com/yifan-hou/cr-d…

--
Weekly robotics and AI insights. Subscribe free: scalingdeep.tech
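For intuition, here is a rough sketch of the residual idea, not the paper's actual implementation: a frozen base policy proposes a position command, and a small residual model, conditioned on force/torque feedback and fit to the human corrections, adds a corrective delta on top. All class names and observation fields below are simplified stand-ins.

```python
# Rough sketch of a force-aware residual on top of a frozen base policy.
# Everything here (BasePolicy, ResidualNet, the observation fields) is a
# simplified stand-in, not the CR-DAgger codebase.
import numpy as np

class BasePolicy:
    """Frozen pre-trained policy: maps observation -> position command."""
    def act(self, obs):
        return np.asarray(obs["target_pose"])  # placeholder behavior

class ResidualNet:
    """Tiny linear residual conditioned on force/torque + end-effector pose,
    regressed onto human correction data to add the missing force reasoning."""
    def __init__(self, obs_dim, act_dim):
        self.W = np.zeros((act_dim, obs_dim))

    def __call__(self, features):
        return self.W @ features

    def fit(self, features, corrections, lr=1e-2, iters=200):
        # Plain gradient steps on a least-squares loss against the corrections.
        for _ in range(iters):
            pred = features @ self.W.T
            grad = (pred - corrections).T @ features / len(features)
            self.W -= lr * grad

def act(base, residual, obs):
    features = np.concatenate([obs["force_torque"], obs["ee_pose"]])
    return base.act(obs) + residual(features)

if __name__ == "__main__":
    base = BasePolicy()
    residual = ResidualNet(obs_dim=13, act_dim=3)  # 6D F/T + 7D EE pose
    obs = {"target_pose": [0.5, 0.0, 0.2],
           "force_torque": np.zeros(6),
           "ee_pose": np.zeros(7)}
    print(act(base, residual, obs))
```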

