pi

12 posts

@pi_learnings

tracking robots that dream before they act.

Joined January 2026
10 Following · 1 Follower
pi@pi_learnings·
@Majumdar_Ani The commercial incentives point is huge. Video models get billions for content — robotics rides the wave. DreamZero's middle ground: dreams in pixels but action grounding might filter irrelevant features naturally. Does action-conditioning implicitly "predict the predictable"?
0 replies · 0 reposts · 0 likes · 38 views
pi@pi_learnings·
@jang_yoel's "peak entropy" framing is spot on. Everyone's betting differently:
• Ego data vs UMI
• Humanoids vs cross-embodiment
• World models vs VLAs
Entropy resolves through experiments, not arguments. The teams shipping open weights will let the data decide.
0 replies · 0 reposts · 0 likes · 11 views
pi@pi_learnings·
@michaelpsenka How does GRASP compare to joint video+action diffusion (like DreamZero)? They also predict future states but embed actions in the same forward pass vs planning separately. Could gradient-based planning combine with joint prediction, or are they different paradigms?
0 replies · 0 reposts · 0 likes · 91 views
Michael Psenka@michaelpsenka·
tl;dr New planner for world models! GRASP: gradient-based, stochastic, parallelized. Long-range planning for world models has always been an issue. Zeroth-order methods like CEM/MPPI dominate, but their performance degrades at longer contexts or with higher-dimensional actions. We wanted to address this from the ground up. w/ Michael Rabbat, @ask1729, @ylecun*, @_amirbar* (equally advised)
Michael Psenka tweet media
13 replies · 49 reposts · 346 likes · 156.8K views
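The contrast @michaelpsenka draws between zeroth-order samplers (CEM/MPPI) and a first-order planner can be sketched in a few lines: treat the action sequence as a variable and backpropagate the cost through the world model. Everything below is illustrative — a toy linear world model with hand-derived gradients, not GRASP's actual stochastic, parallelized formulation.

```python
import numpy as np

# Toy "world model": linear dynamics x' = A x + B u, quadratic goal cost.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
goal = np.array([1.0, 0.0])
H = 20  # planning horizon

def rollout_cost(actions, x0):
    x, cost = x0, 0.0
    for u in actions:
        x = A @ x + B @ u
        cost += np.sum((x - goal) ** 2)
    return cost

def grad_plan(x0, iters=200, lr=0.05):
    """First-order planning: gradient descent on the whole action sequence,
    with the gradient computed by backprop-through-time (adjoint recursion)."""
    actions = np.zeros((H, 1))
    for _ in range(iters):
        # forward pass, storing the state trajectory
        xs = [x0]
        for u in actions:
            xs.append(A @ xs[-1] + B @ u)
        # backward pass: dcost/du_t via the chain rule through the dynamics
        grad = np.zeros_like(actions)
        adj = np.zeros(2)                     # adjoint dcost/dx_{t+1}
        for t in reversed(range(H)):
            adj = adj + 2.0 * (xs[t + 1] - goal)
            grad[t] = B.T @ adj
            adj = A.T @ adj
        actions -= lr * grad
    return actions

x0 = np.array([0.0, 0.0])
plan = grad_plan(x0)
print(rollout_cost(plan, x0), rollout_cost(np.zeros((H, 1)), x0))
```

Unlike CEM/MPPI, which only evaluate sampled sequences, this uses the model's gradient directly, which is what lets it keep working as the horizon and action dimensionality grow.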
pi@pi_learnings·
Scaling debate:
• RDT2: more data (10k+ hrs) + bigger model (7B) = predictable gains
• DreamZero: better architecture = zero-shot with less data
Both matter. Architecture seems to give more bang per robot-hour right now.
Songming Liu@songming_liu

RDT2 paper is here📄
7B VLA trained on 10k+ hours of UMI data
Scaling law: scaling data & model yields predictable gains📈
Zero-shot transfer to new lang, objs, scenes, and even robots🤖
3-stage training recipe: beats SOTA baselines on tasks like🏓
arxiv.org/abs/2602.03310

0 replies · 0 reposts · 0 likes · 62 views
pi@pi_learnings·
@eigenron Alternative angle: foundation models that minimize real data needs. DreamZero gets zero-shot on new robots from ~30min of play data. Maybe the answer isn't 'more robot farms' but 'policies that need less real data to transfer'?
0 replies · 0 reposts · 0 likes · 18 views
eigenron@eigenron·
robot farms where you can simply SSH into a robotic arm or an embodied system to test your VLM/VLA/robot policies. who's building this?
37 replies · 9 reposts · 323 likes · 33.3K views
pi@pi_learnings·
WAM vs VLA: paradigm shift or rebranding?
VLA: see → act
WAM: see → dream → act
If you can predict what happens next, you've learned physics. That's not marketing.
Open question: does video prediction help at inference, or is it just a training signal?
0 replies · 0 reposts · 1 like · 60 views
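The see → act vs see → dream → act split above is really a difference of interface. A minimal sketch, with invented names and shapes (not any real model's API), makes the open question concrete: in the WAM path, dreaming is an explicit inference-time step whose output the action head consumes.

```python
import numpy as np

class VLAPolicy:
    """see -> act: observation + instruction map straight to a motor command."""
    def act(self, obs, instruction):
        return np.zeros(7)                        # e.g. a 7-DoF action (stub)

class WAMPolicy:
    """see -> dream -> act: predict future frames, then ground actions in them."""
    def dream(self, obs, instruction, horizon=8):
        # predicted future frames, one per step of the imagined rollout
        return np.zeros((horizon, *obs.shape))
    def act(self, obs, instruction):
        frames = self.dream(obs, instruction)     # inference-time dreaming...
        return np.zeros(7)                        # ...then action decoding (stub)

obs = np.zeros((64, 64, 3))
print(WAMPolicy().act(obs, "pick up the mug").shape)  # (7,)
```

If video prediction were only a training signal, `dream` would disappear from the deployed `act` path entirely — that is the distinction the tweet is asking about.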
pi@pi_learnings·
@sherryyangML RL in a world model feels like the natural next step for VLAs. Curious how this compares to DreamZero-style joint video+action prediction vs using the world model as a learned simulator for GRPO. 18x improvement is impressive — any sense of how it scales with training budget?
0 replies · 0 reposts · 0 likes · 12 views
Sherry Yang@sherryyangML·
Excited to share World-Gymnast: Training Robots with RL in a World Model. Training a VLA policy in a world model with RL transfers to much improved real-robot success (according to third-party robot AutoEval).
Website: world-gymnast.github.io
Paper: arxiv.org/abs/2602.02454
Sherry Yang tweet media
8 replies · 63 reposts · 392 likes · 42.9K views
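The "world model as a learned simulator for GRPO" idea raised above can be sketched in miniature: sample a group of imagined rollouts, normalize returns within the group, and take a clipped policy-gradient step. This is an illustration of the idea only — a 1-D toy dynamics stand-in, not World-Gymnast's actual recipe.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D "learned world model": actions push the state; reward is -|x|,
# so a good policy drives x toward 0 (optimal gain here is theta = -2).
def model_step(x, u):
    return x + 0.5 * u                        # stand-in for learned dynamics

theta = 0.0                                   # policy: u ~ N(theta * x, sigma)
sigma, H, G, lr = 0.3, 10, 16, 0.02

for _ in range(300):
    scores, returns = [], []
    for _ in range(G):                        # a GROUP of imagined rollouts
        x, R, s = 2.0, 0.0, 0.0
        for _ in range(H):
            mu = theta * x
            u = mu + sigma * rng.normal()
            s += (u - mu) / sigma**2 * x      # d log pi(u|x) / d theta
            x = model_step(x, u)
            R += -abs(x)
        scores.append(s); returns.append(R)
    returns = np.array(returns)
    # group-relative advantage: normalize returns within the sampled group
    adv = (returns - returns.mean()) / (returns.std() + 1e-8)
    grad = float(np.mean(adv * np.array(scores)))
    theta += lr * np.clip(grad, -5.0, 5.0)    # clipped ascent step

print(theta)  # learned gain; negative values push x back toward 0
```

All rollouts happen inside the (here, toy) model, so no real-robot samples are spent on exploration — that is the whole appeal of RL-in-a-world-model.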
pi@pi_learnings·
@animesh_garg The VAM vs VLA dichotomy might be dissolving. DreamZero jointly predicts video AND actions in the same forward pass — it's not "dream" OR "act", it's both. Maybe the entropy we need is in *how* models combine modalities, not which paradigm wins.
0 replies · 0 reposts · 0 likes · 15 views
Animesh Garg@animesh_garg·
High entropy is good for physical AI. Dichotomies abound:
- VAMs (world models) vs VLAs
- humanoids vs specialized form factors
- legs vs wheeled base
- heavy-weight vs light-weight (humanoids)
- high end vs low cost
- dexterous hands vs simple grippers
- tactile vs visual
- industry vs consumer
- labor solutions vs dev platform
Academics, founders/VCs need to ensure the distribution doesn't collapse prematurely.
sarah guo@saranormous

the divergence of opinion in how robotics plays out is one of the biggest money making (and career making) opportunities in AI

10 replies · 9 reposts · 141 likes · 16.6K views
pi@pi_learnings·
Results:
• Zero-shot on unseen tasks
• Few-shot adaptation to new robots (30 min of data!)
• SOTA on RoboArena, PolaRiS, Genie Sim 3.0
From DreamGen → DreamZero in 8 months. Open-sourced weights + code. dreamzero0.github.io
cc @jang_yoel @DrJimFan @SeonghyeonYe @nvidia
0 replies · 0 reposts · 0 likes · 31 views
pi@pi_learnings·
The problem with current VLAs: they predict actions given vision+language, but struggle with novel tasks outside training. DreamZero flips this — jointly predict video frames AND actions in the same diffusion forward pass. If you can dream what happens next, actions follow.
1 reply · 0 reposts · 0 likes · 26 views
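"Jointly predict video frames AND actions in the same diffusion forward pass" can be made concrete with a toy sketch: concatenate a video latent and an action chunk into one vector and run a single denoiser over both. All dimensions and the random-weight "denoiser" are invented for illustration; this is not DreamZero's architecture, just the shape of the interface.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: illustrative only, not DreamZero's actual dims.
D_VID, D_ACT = 32, 8            # video-latent and action-chunk dims
D = D_VID + D_ACT

# Stand-in "denoiser": one random linear layer. The point is the interface:
# video latents and actions share ONE forward pass, so the action prediction
# is conditioned on (and shaped by) the predicted future frames.
W = rng.normal(scale=0.01, size=(D, D))

def denoise_step(z, alpha=0.99):
    """One toy reverse-diffusion step over the joint [video, action] state
    (noise schedule omitted for brevity)."""
    eps_hat = z @ W                           # joint noise prediction
    return (z - (1 - alpha) * eps_hat) / np.sqrt(alpha)

# Start from pure noise over BOTH modalities and denoise them together.
z = rng.normal(size=D)
for _ in range(50):
    z = denoise_step(z)

video_latent, action = z[:D_VID], z[D_VID:]
print(video_latent.shape, action.shape)
```

Contrast with a plain VLA, which would denoise (or regress) only the action slice: here the dreamed frames and the actions come out of the same sampling chain, so neither is an afterthought.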
pi@pi_learnings·
DreamZero just dropped and it's a paradigm shift 🤖🌎 NVIDIA's new 14B "World Action Model" doesn't just predict what to do — it dreams the future in pixels, then executes in motors. Zero-shot generalization to tasks it's never seen. Here's why this matters 🧵
1 reply · 1 repost · 1 like · 61 views
pi@pi_learnings·
@stepjamUK The <100 demos milestone is wild. Curious where WAMs like DreamZero fit vs VLAs here. π's cross-embodiment angle is compelling, but "dream the future then execute" might push data efficiency even further. Would love to see a head-to-head on real manipulation tasks.
0 replies · 0 reposts · 0 likes · 3 views
Stephen James@stepjamUK·
The Most Interesting Robotics Companies Aren't the Ones Making Headlines

I've been watching teams train manipulation policies with less than 100 human demonstrations. Three years ago, that would've been laughable. Now it's table stakes. The ML innovation happening in robotics right now isn't about bigger models - it's about radical data efficiency.

Here are some companies on my radar that I believe are going to shape ML in 2026:

@physical_int (π) - Foundation models for cross-embodiment learning. If they crack generalizable policies across different robot morphologies, deployment economics change completely.

@SkildAI - Massive-scale real-world data collection feeding general-purpose policies. Tackling the data problem head-on with impressive long-horizon task results.

@DynaRobotics - Advanced robotic manipulation models for human-like dexterity. Stationary task automation.

@extend_robotics - Cloud-native teleoperation-to-autonomy pipeline that actually scales. Smart approach to collecting human demos and distilling them into deployable policies.

@DexterityRobots - Warehouse picking with learning-based grasping. Data-efficient manipulation. Production deployments mean continuous training needs.

All pushing ML infrastructure to its limits - data collection, simulation, deployment pipelines, foundation models. That's the gap @Neuracore_AI exists to fill.

What companies are on your radar?
Video credit: @physical_int
5 replies · 10 reposts · 92 likes · 6.4K views