Kyle Stachowicz
@KyleStachowicz

81 posts

Robot learning @berkeley_ai @physical_int

Berkeley, CA · Joined August 2018
279 Following · 872 Followers
saurabh
saurabh@saurabhtwq·
would love to follow more people who are working in VLMs, world models, and robotics. any recs?
cts🌸
cts🌸@gf_256·
Claude and Codex max walk into a bar. That’s the end of the joke. Claude hit the usage limit
Kyle Stachowicz
Kyle Stachowicz@KyleStachowicz·
Partial observability means that a robot policy - even with infinitely many demonstrations - will still be worse than the demonstrator. With MEM, we built a recipe to close this gap. Fantastic work led by @KarlPertsch @marceltornev @DannyDriess.
Physical Intelligence@physical_int

We’ve developed a memory system for our models that provides both short-term visual memory and long-term semantic memory. Our approach allows us to train robots to perform long and complex tasks, like cleaning up a kitchen or preparing a grilled cheese sandwich from scratch 👇

Kyle Stachowicz retweeted
Tian Gao
Tian Gao@TianGao_19·
Long-tail scenarios remain a major challenge for autonomous driving. Unusual events—like accidents or construction zones—are underrepresented in driving data, yet require semantic and commonsense reasoning grounded in control. We propose SteerVLA, a framework that uses VLM reasoning to steer a driving policy via grounded, fine-grained language instructions.
Paper: arxiv.org/abs/2602.08440
Website: steervla.github.io
Kyle Stachowicz
Kyle Stachowicz@KyleStachowicz·
@liu730chaoqi Have you found reconstruction fidelity to be a bottleneck at all? During FAST I had a tough time getting flat FSQ tokenizers precise enough for our more dexterous tasks...(see the scaling curves wrt tokens)
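The fidelity ceiling Kyle alludes to can be sketched with a toy finite-scalar-quantization round trip (a hypothetical illustration, not the actual FAST or OAT tokenizer): with L uniform levels per dimension on [-1, 1], the worst-case reconstruction error is 1/(L-1), which can be too coarse for dexterous control.

```python
import numpy as np

def fsq_round_trip(x, levels):
    """Quantize values in [-1, 1] to `levels` uniform bins per dimension,
    then map back; the rounding error is the reconstruction error."""
    half = (levels - 1) / 2
    return np.round(x * half) / half

rng = np.random.default_rng(0)
actions = rng.uniform(-1, 1, size=(10_000, 7))  # toy 7-DoF action chunks

for levels in (4, 16, 256):
    err = np.abs(fsq_round_trip(actions, levels) - actions).max()
    print(f"{levels:>3} levels -> max abs error ~ {err:.4f}")
```

Fewer levels mean shorter token sequences but coarser reconstructed actions; pushing toward the fine end is exactly the token-count tradeoff the scaling curves mentioned above are about.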
Kyle Stachowicz
Kyle Stachowicz@KyleStachowicz·
@liu730chaoqi Hah...maybe we should have given it a bit more discussion in the original paper (e.g., the version where you transpose the ordering is unsurprisingly terrible)
Chaoqi Liu
Chaoqi Liu@liu730chaoqi·
Yes, I really like FAST’s approach to action ordering—though it only gets a brief mention in the paper. I see OAT as a small step toward better understanding this space. Hopefully, it encourages more careful and principled thinking around how we generate action chunks. 😀
[image]
Kyle Stachowicz@KyleStachowicz

@liu730chaoqi Great writeup! Seems to clean up one of my least favorite parts of FAST (variable-width tokens are awful for decoding, and I suspect for learning signal) while keeping the token ordering that makes it work in the first place! Looking forward to trying it out :)

Kyle Stachowicz retweeted
Chris Paxton
Chris Paxton@chris_j_paxton·
It's interesting (and good) to see the McDonald's worker as the "hard mode" of robot intelligence. Lots of people don't, I think, have a good picture of what work will be hard for robots and what won't. High-mix, chaotic work like this is the dream, but it's hard to pull off
[image]
Packy McCormick@packyM

Spend an hour reading this weekend and I think you’ll know more about robotics than 99% of people, including some people who invest in robotics. notboring.co/p/robot-steps

Kyle Stachowicz
Kyle Stachowicz@KyleStachowicz·
@alz_zyd_ I think it's actually held up fine? The divergence in the slide is from training off-policy; training on-policy (i.e. RL/reasoning) takes you out of this regime completely because it allows correcting past mistakes, and it's empirically what has made long-CoT responses viable.
Kyle Stachowicz
Kyle Stachowicz@KyleStachowicz·
@ID_AA_Carmack I mean, if your code is bottlenecked on indexing operations rather than the billion-parameter matmuls you probably screwed up already, and going from i64 to u32 isn't going to fix it...
John Carmack
John Carmack@ID_AA_Carmack·
Pytorch made the right call standardizing on signed 64 bit indexes. I would probably still be rather pointlessly making case by case decisions to use int32 if it were an option. Some old habits linger.
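The int64-vs-int32 tradeoff in this thread is about memory and bandwidth, not correctness. A minimal sketch (using NumPy as a stand-in for any tensor library; the numbers are illustrative):

```python
import numpy as np

n = 1_000_000
data = np.random.default_rng(0).standard_normal(n)

idx64 = np.arange(n, dtype=np.int64)   # wide, PyTorch-style indices
idx32 = idx64.astype(np.int32)         # half the index-buffer memory

print(idx64.nbytes // idx32.nbytes)    # index buffer is 2x larger with int64
# for in-range indices, gathering with either dtype gives identical results
assert np.array_equal(data[idx64], data[idx32])
```

The point of Kyle's reply above: unless indexing traffic (not the big matmuls) dominates your runtime, the 2x index-buffer saving rarely matters, while a 32-bit index caps you at ~2^31 elements.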
Kyle Stachowicz
Kyle Stachowicz@KyleStachowicz·
@arnie_hacker @drfeifei If you have a world model good enough to replace real-world evals you've probably solved robotics already. Imperfect world models probably give a passable metric, but I'm skeptical that just hill-climbing WM evals will lead to practical gains.
Arnie Ramesh
Arnie Ramesh@arnie_hacker·
Evals for robotics are so far behind. Afaik it's just running your trained model on tasks you define. The BEHAVIOR benchmark from @drfeifei is promising, but models trained on real-world data require testing with real-world observations. Bullish on 3DGS-based simulators for this reason
Jim Fan@DrJimFan

Everyone's freaking out about vibe coding. In the holiday spirit, allow me to share my anxiety on the wild west of robotics. 3 lessons I learned in 2025.

1. Hardware is ahead of software, but hardware reliability severely limits software iteration speed. We've seen exquisite engineering arts like Optimus, e-Atlas, Figure, Neo, G1, etc. Our best AI has not squeezed all the juice out of this frontier hardware. The body is more capable than what the brain can command. Yet babysitting these robots demands an entire operations team. Unlike humans, robots don't heal from bruises. Overheating, broken motors, and bizarre firmware issues haunt us daily. Mistakes are irreversible and unforgiving. My patience was the only thing that scaled.

2. Benchmarking is still an epic disaster in robotics. LLM normies thought MMLU & SWE-Bench were common sense. Hold your 🍺 for robotics. No one agrees on anything: hardware platform, task definition, scoring rubrics, simulator, or real-world setups. Everyone is SOTA, by definition, on the benchmark they define on the fly for each news announcement. Everyone cherry-picks the nicest-looking demo out of 100 retries. We gotta do better as a field in 2026 and stop treating reproducibility and scientific discipline as second-class citizens.

3. VLM-based VLA feels wrong. VLA stands for "vision-language-action" model and has been the dominant approach for robot brains. The recipe is simple: take a pretrained VLM checkpoint and graft an action module on top. But if you think about it, VLMs are hyper-optimized to hill-climb benchmarks like visual question answering. This implies two problems: (1) most parameters in VLMs are for language & knowledge, not for physics; (2) visual encoders are actively tuned to *discard* low-level details, because Q&A only requires high-level understanding. But minute details matter a lot for dexterity. There's no reason for VLA performance to scale as VLM parameters scale. Pretraining is misaligned.

Video world models seem to be a much better pretraining objective for robot policies. I'm betting big on it.

Kyle Stachowicz
Kyle Stachowicz@KyleStachowicz·
@liangpan_t Tokenization can overfit too. This is more a property of the data - you can get multimodal behaviors when you train on much larger data. The paper looks at small-ish sim datasets, but they're right that the main benefit of diffusion is *not* that it picks up all of the modes.
Kyle Stachowicz
Kyle Stachowicz@KyleStachowicz·
@chris_j_paxton This! I think there was a while where it was unclear how much pretraining really helps (e.g. some of the ablations in π0), but since the π0.5 release we've dialed in our pretraining recipe a lot and it's enabled some awesome results (π0.6, olympics tasks, human video transfer).
Kyle Stachowicz
Kyle Stachowicz@KyleStachowicz·
@satpalsr @JieWang_ZJUI It’s hard to make a robust sensor that gives rich touch feedback, but a wrist camera gives you lots of the same local feedback.
Jie Wang
Jie Wang@JieWang_ZJUI·
Very cool emergent capability of human-robot co-training! But I have to point out: we haven't gotten a free lunch of learning from in-the-wild YouTube videos.
1) It's still human augmentation.
2) The secret of VLAs is always the wrist camera.
3) Teleoperators have to shape their hands like grippers, which restricts flexibility and dexterity.
It may work because VLA resolution is low enough (224/448) that hands "look like" grippers to a near-sighted policy. Hand data is more like a trajectory planner for ideal visual behavior. Robot data is still essential to ground actions.
The t-SNE embedding visualization is most exciting! I like how PI presents the correlation of human and robot data here. Looking forward to seeing more contact-rich tasks; excited to see it go one step further with scaling and co-training.
[3 images]
Physical Intelligence@physical_int

We discovered an emergent property of VLAs like π0/π0.5/π0.6: as we scale up pre-training, the model learns to align human videos and robot data! This gives us a simple way to leverage human videos. Once π0.5 knows how to control robots, it can naturally learn from human video.

Kyle Stachowicz
Kyle Stachowicz@KyleStachowicz·
I got *soooo* excited when @simar_kareer showed me this figure for the first time a few months back. Better pretraining directly unlocks better transfer, and not just between robots - humans are just another embodiment. And there's lots more in the pipeline at π!
Physical Intelligence@physical_int

This also shows up in the representations learned by the model. We plot the model’s representations of human and robot images. As pre-training is scaled up, the representation of humans and robots become more aligned: to a scaled-up model, human videos "look" like robot demos.

Kyle Stachowicz
Kyle Stachowicz@KyleStachowicz·
@KyleVedder
$ mv experimental/kyle experimental/kyles
$ mkdir experimental/kylev
looking forward to having you on board soon!
Kyle Vedder
Kyle Vedder@KyleVedder·
Personal Update: After a year at Dyna Robotics I'm joining Physical Intelligence as a Researcher! I'm very proud of my contributions at Dyna, including the DYNA-1 Reward Model and continuous demos. I'm excited to start a new chapter focused on building *self improving* systems for robots -- I think this is the most important challenge in robot learning, and it's still completely unsolved. PS: I'm also going to be at NeurIPS, so come say hi!
[image]