
Yash Jangir
@off_jangir
Robotics @CarnegieMellon @CMU_robotics with @katerinafragiad and @ybisk | Noobie at Twitter

For a long time, I was skeptical about action-conditioned video prediction for robotics. Many models look impressive, but once you ask them to handle long-horizon manipulation with real physical interaction, things quickly fall apart (e.g., Genie is amazing but mostly focused on navigation). This project changed my mind.

I'm beyond excited to share Interactive World Simulator, a project we have been working on for the past ~1.5 years 🤖

One of the first world models that produces convincing results for long-horizon robotic manipulation involving complex physical interactions, across a diverse range of objects (rigid objects, deformables, ropes, object piles). It directly unlocks scalable data generation for robotic policy training and policy evaluation.

Try it yourself (no installation needed): yixuanwang.me/interactive_wo… Play directly with the simulator in your browser.

Key Takeaways:
1️⃣ 15 Hz long-horizon action-conditioned video prediction for 10+ minutes on a single RTX 4090 GPU
2️⃣ Visual and dynamic fidelity: people often ask how much sim data equals one real data point. In our experiments, it turns out to be close to one-to-one using the Interactive World Simulator
3️⃣ Stress testing matters: we emphasize interactive stress testing to understand robustness and stability and to build trust in the simulator
4️⃣ The model is trained with only ~6 hours of real-world random interaction data on a single GPU. Imagine what happens if we scale this 1000× or even 1M×

Huge credit to @YXWangBot, who led this effort with countless hours of work on data collection, training recipes, and system design. I'm incredibly proud of the work he did here!

Enjoy the demos and videos. We also fully open-sourced the codebase for anyone interested in applying this to their own tasks.

#Robotics #RobotLearning #WorldModels #EmbodiedAI
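[Editor's note: to make the "action-conditioned" part concrete, here is a minimal, hypothetical sketch of an autoregressive rollout loop. The `WorldModel` class, method names, and shapes are illustrative assumptions, not the project's actual API.]

```python
# Hypothetical sketch of action-conditioned video prediction --
# names and shapes are assumptions, not the actual Interactive
# World Simulator API.
import numpy as np

class WorldModel:
    """Stand-in for a learned action-conditioned video predictor."""
    def predict(self, frame: np.ndarray, action: np.ndarray) -> np.ndarray:
        # A real model would run a neural network here; returning the
        # input frame keeps this sketch runnable.
        return frame

def rollout(model, first_frame, actions):
    """Autoregressively predict one frame per action."""
    frames = [first_frame]
    for a in actions:
        # Each step conditions on the previous *predicted* frame, so
        # errors compound -- this is why long-horizon stability is hard.
        frames.append(model.predict(frames[-1], a))
    return frames

# 10 minutes at 15 Hz = 9000 predicted frames.
model = WorldModel()
actions = [np.zeros(7)] * 15 * 600  # assumed 7-DoF action per step
frames = rollout(model, np.zeros((224, 224, 3)), actions)
print(len(frames))  # 9001: the initial frame + 9000 predictions
```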

Why does manipulation lag so far behind locomotion? New post on one piece we don't talk about enough: the gearbox.

The Gap
You've probably seen those dancing humanoid robots from Chinese New Year. Locomotion isn't entirely solved, but it's clearly on a trajectory. We haven't seen anything close for manipulation. Why?

When sim-to-real transfer fails, the instinct is to blame the algorithm. Train bigger networks. Crank up domain randomization. Those approaches have made real progress; we don't deny that. But we started wondering: are we treating the symptom or the disease?

The Hardware Bottleneck
Fingers are too small for powerful motors, so most hands use massive gearboxes (200:1, 288:1) to get enough torque. But those gearboxes break everything manipulation needs:
• Stiction and backlash are complex to simulate. Policies trained on smooth physics hallucinate when they hit that reality.
• Reflected inertia scales as N². At a large gear ratio, the finger hits with sledgehammer momentum (see the sketch after this post).
• Friction blocks force information. The hand becomes blind.
And they're the first thing to break.

At Origami, we cut the gear ratio from 288:1 to 15:1 using axial flux motors and thermal optimization. The transmission becomes more transparent: backdrivable, low friction, and forces propagate to motor current. Early signs are encouraging; we're still running quantitative benchmarks.

Why Interactive?
I love how science centers use interactive devices to explain complex ideas. I want to borrow this concept and help people understand the hard problems in robotics better visually. The post has demos where you can toggle friction, slide gear ratios, and watch the sim-to-real gap widen in real time.

What's inside:
• Interactive demos (friction curves, N² scaling, contact patterns)
• Comparison table: 14 robot hands by sim-to-real gap and force transparency
• The math behind why low-ratio matters

Read it here: origami-robotics.com/blog/dexterity…

We're not claiming we've solved dexterity. The deadlock has many pieces. But we think this one's foundational. Curious what you think.
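[Editor's note: a back-of-the-envelope sketch of the N² scaling mentioned above. The rotor inertia value is an illustrative assumption; the ratio between the two designs follows directly from reflected inertia scaling as the square of the gear ratio.]

```python
# Reflected inertia at the output scales as N^2: J_out = J_rotor * N^2.
# The rotor inertia below is an assumed, illustrative value for a small
# finger-sized motor, not a measured Origami spec.
J_ROTOR = 1e-6  # kg*m^2

for N in (288, 15):
    J_reflected = J_ROTOR * N**2  # inertia the fingertip "feels"
    print(f"{N:>3}:1 gearbox -> reflected inertia {J_reflected:.2e} kg*m^2")

# Dropping from 288:1 to 15:1 cuts reflected inertia by (288/15)^2:
print(f"reduction factor: {(288 / 15) ** 2:.0f}x")  # ~369x
```

The same quadratic law is why no amount of domain randomization fully papers over a high-ratio transmission: the inertia mismatch is a property of the hardware, not the policy.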

Enough of this nonsense! Presenting PARAM: India's most powerful indigenous robot dog. Not assembled, not bought, BUILT IN INDIA, built by INDIANS. For our nation, for our century, for our world! Jai Hind! 🇮🇳 @narendramodi @adgpi @AshwiniVaishnaw @GoI_MeitY @startupindia

🚨 The era of infinite internet data is ending. So we ask: 👉 What's the right generative modelling objective when data, not compute, is the bottleneck?

TL;DR:
▶️ Compute-constrained? Train Autoregressive models
▶️ Data-constrained? Train Diffusion models

Get ready for 🤿 1/n
