Nick Stracke
@rmsnorm
54 posts

PhD Student at Ommer Lab (Stable Diffusion) Trying to understand worlds and motion...

Munich · Joined July 2024
248 Following · 188 Followers
Pinned Tweet
Nick Stracke @rmsnorm
Video diffusion models learn motion indirectly through pixels. But motion itself is much lower-dimensional. We introduce 64× temporally compressed motion embeddings that directly capture scene dynamics. This enables efficient planning -> 10,000× faster than video models. 🧵👇
9 replies · 47 reposts · 304 likes · 38.6K views
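The thread doesn't share code, but the core idea — 64× temporal compression of motion rather than pixels — can be illustrated with a toy sketch. The real model learns its encoder; the average-pooling scheme and the `compress_motion` helper below are purely illustrative assumptions:

```python
import numpy as np

def compress_motion(tracks: np.ndarray, factor: int = 64) -> np.ndarray:
    """Toy temporal compression of point tracks by mean pooling.

    tracks: (T, N, 2) array of N 2-D point positions over T frames.
    Returns (T // factor, N, 2) pooled "motion embeddings".
    """
    T, N, D = tracks.shape
    T_c = T // factor
    # Drop any remainder frames, then mean-pool each window of `factor` frames.
    pooled = tracks[: T_c * factor].reshape(T_c, factor, N, D).mean(axis=1)
    return pooled

# Example: 128 frames of 8 tracked points compress to 2 temporal slots.
tracks = np.random.rand(128, 8, 2)
emb = compress_motion(tracks)
print(emb.shape)  # (2, 8, 2)
```

The point of the sketch is only the shape arithmetic: a planner searching over the pooled sequence touches 64× fewer time steps than one rolling out every frame, which is where the claimed speedups come from.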
Nick Stracke @rmsnorm
@nurvai_ai You need periodic regrounding, and that's also what we do for LIBERO. There's additionally a translation error from converting tracks into actions that a robot can actually execute, which you have to compensate for as well.
0 replies · 0 reposts · 0 likes · 38 views
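The regrounding idea in this reply is a standard closed-loop pattern: replan from a fresh observation every few steps so that track-to-action translation errors cannot compound over the horizon. Here is a minimal sketch — `observe`, `plan`, and `execute` are hypothetical stand-ins for a real perception / planning / control stack, not the paper's API:

```python
def plan_with_regrounding(observe, plan, execute, horizon=100, reground_every=10):
    """Toy closed-loop control with periodic regrounding.

    Every `reground_every` steps the latest observation replaces the
    predicted state and the plan is recomputed, bounding compounding error.
    """
    state = observe()
    actions = plan(state)
    for t in range(horizon):
        if t > 0 and t % reground_every == 0:
            state = observe()      # reground on the true, observed state
            actions = plan(state)  # replan from the corrected state
        execute(actions[t % reground_every])
    return state

# Demo with counting stubs: over 100 steps with regrounding every 10 steps,
# the planner observes 10 times (once at the start, then every 10th step).
calls = {"observe": 0}
def observe():
    calls["observe"] += 1
    return 0
def plan(state):
    return list(range(10))
executed = []
plan_with_regrounding(observe, plan, executed.append)
print(calls["observe"], len(executed))  # 10 100
```

The tradeoff is the usual one: regrounding more often costs extra perception and replanning calls, but keeps the open-loop segments short enough that accumulated execution error stays small.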
Nurvai - The Data Layer for Physical AI
Really interesting idea. Compressing motion instead of pixels feels like a much more natural representation for planning. Curious how well these motion embeddings hold up in long-horizon tasks where small errors compound. Do they remain stable, or do you still need periodic re-grounding from observations?
1 reply · 0 reposts · 2 likes · 283 views
Nick Stracke retweeted
Miguel Angel Bautista @itsbautistam
Amazing work led by @rmsnorm @KoljaBauer and our collaborators at LMU, to be presented at @CVPR! Personally, I find this question of "what's the right level of abstraction for planning in physical space?" to be very intriguing. Pixels over time are very low SNR (i.e. the argument behind JEPA), but motion/trajectories carry a lot of information while being extremely compressible. I believe there's a lot more to uncover in this direction. Very glad to be part of this one!
Quoting Nick Stracke @rmsnorm (pinned thread above)
1 reply · 3 reposts · 10 likes · 1.3K views
Nick Stracke @rmsnorm
@Frid45 Thanks! 🇪🇺 Supporting European research, I see 👀
0 replies · 0 reposts · 4 likes · 357 views
Nick Stracke retweeted
Kolja Bauer @KoljaBauer
Do we really need pixel generation to model motion? 🤔 We show how directly representing motion in a compact space enables efficient, scalable planning. 10,000× faster than video models, enabling planning and reasoning in open-world and robotics settings. Check it out ⬇️
Quoting Nick Stracke @rmsnorm (pinned thread above)
1 reply · 4 reposts · 29 likes · 3.4K views
Nick Stracke retweeted
Stefan Baumann @StefanABaumann
You don't imagine the future by mentally rendering a movie. You trace how things move -- abstractly, sparsely, step by step. We built a model that does exactly this. It predicts motion, not pixels -- and it's 3,000× faster than video world models. Myriad, accepted at @CVPR 2026
[image attached]
4 replies · 54 reposts · 344 likes · 24.3K views
unsupervised @dooartlabs
@rmsnorm huh! this is really interesting i'm doing something similar, but it's more like generating the entire level at once based on the player's actions in the previous level
1 reply · 0 reposts · 1 like · 17 views
Nick Stracke @rmsnorm
🤔 What if an LLM could edit the game world itself during gameplay? I built a sandbox engine where an LLM directly modifies the world state in real time. New biomes, new blocks, even new minerals added mid-game 🤯 No scripts, no reload. Video below ⬇️
2 replies · 1 repost · 8 likes · 311 views
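The tweet doesn't describe the engine's internals, but the "LLM edits the world state live, no reload" loop can be sketched minimally. Everything here is an assumption for illustration: the JSON-dict reply format, the `apply_world_edit` helper, and the category names are all hypothetical:

```python
import json

def apply_world_edit(world: dict, llm_reply: str) -> dict:
    """Toy live world editing: the LLM replies with a JSON dict of additions
    (e.g. new biomes, blocks, or minerals), which is merged into the running
    world state in place -- no scripts, no reload.
    """
    edit = json.loads(llm_reply)
    for category, items in edit.items():
        # Unknown categories (like "minerals") are created on the fly.
        world.setdefault(category, []).extend(items)
    return world

world = {"biomes": ["plains"], "blocks": ["stone"]}
reply = '{"biomes": ["crystal_cavern"], "minerals": ["luminite"]}'
world = apply_world_edit(world, reply)
print(world["biomes"])  # ['plains', 'crystal_cavern']
```

A real engine would validate the edit before merging it, but the key design point survives even in this sketch: treating the world as mutable data the model patches, rather than code that must be rewritten and reloaded.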