Kolja Bauer

9 posts

@KoljaBauer

ELLIS PhD Student in Generative AI @ Ommer Lab (Stable Diffusion)

Munich · Joined June 2017
162 Following · 32 Followers
Pinned Tweet
Kolja Bauer @KoljaBauer
Do we really need pixel generation to model motion? 🤔 We show how directly representing motion in a compact space enables efficient, scalable planning. 10,000× faster than video models, enabling planning and reasoning in open-world and robotics settings. Check it out ⬇️
Nick Stracke @rmsnorm

Video diffusion models learn motion indirectly through pixels. But motion itself is much lower-dimensional. We introduce 64× temporally compressed motion embeddings that directly capture scene dynamics. This enables efficient planning -> 10,000× faster than video models. 🧵👇

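The pinned thread claims 64× temporally compressed motion embeddings but does not describe the architecture. As a toy illustration of the shape arithmetic only (the frame count, feature dimension, and mean-pooling scheme below are my assumptions, not the paper's learned encoder), per-frame motion vectors can be pooled over non-overlapping 64-frame windows:

```python
import random

# Hypothetical setup: T per-frame motion vectors of dimension d,
# compressed 64x along the time axis.
T, d, ratio = 256, 16, 64
motion = [[random.gauss(0, 1) for _ in range(d)] for _ in range(T)]

def mean_pool(window):
    # Average a list of d-dimensional vectors component-wise.
    return [sum(vec[j] for vec in window) / len(window) for j in range(d)]

# Non-overlapping windows of length `ratio` -> T // ratio motion embeddings.
embeddings = [mean_pool(motion[i:i + ratio]) for i in range(0, T, ratio)]
print(len(embeddings), len(embeddings[0]))  # 4 16
```

Planning then searches over 4 compact embeddings instead of 256 full-resolution frames, which is the kind of reduction behind the quoted speedup claims; the real model would learn this compression rather than average-pool.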
Kolja Bauer retweeted
Miguel Angel Bautista @itsbautistam
Amazing work led by @rmsnorm @KoljaBauer and our collaborators at LMU, to be presented at @CVPR! Personally, I find this question of "what's the right level of abstraction for planning in physical space?" to be very intriguing. Pixels over time are very low SNR (i.e., the argument behind JEPA), but motion/trajectories carry a lot of information while being extremely compressible. I believe there's a lot more to uncover in this direction. Very glad to be part of this one!
Kolja Bauer retweeted
Stefan Baumann @StefanABaumann
You don't imagine the future by mentally rendering a movie. You trace how things move -- abstractly, sparsely, step by step. We built a model that does exactly this. It predicts motion, not pixels -- and it's 3,000× faster than video world models. Myriad, accepted at @CVPR 2026
Kolja Bauer retweeted
Pingchuan Ma @PingchuanMa4
I'm happy to share that I’ll be presenting two first-authored papers at #ICCV2025 🌺 in Honolulu, together with @MingGui725184! 🏝️ (Thread 🧵👇)
Kolja Bauer retweeted
jo.schb @jo_schb
🤔 What if you could generate an entire image using just one continuous token? 💡 It works if we leverage a self-supervised representation! Meet RepTok🦎: A generative model that encodes an image into a single continuous latent while keeping realism and semantics. 🧵👇
Kolja Bauer retweeted
Stefan Baumann @StefanABaumann
🤔 What happens when you poke a scene — and your model has to predict how the world moves in response? We built the Flow Poke Transformer (FPT) to model multi-modal scene dynamics from sparse interactions. It learns to predict the distribution of motion itself 🧵👇
Kolja Bauer retweeted
Felix Krause @felix_m_krause
We cut the cost of training a diffusion model from months of rent to one night out. TREAD matches ImageNet performance of a DiT with 97% fewer A100 hours! No extra components. No extra losses. Training-time only. Inference remains unchanged. Accepted at ICCV2025🌺
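The TREAD tweet only hints at the mechanism ("no extra components, training-time only"). As a hedged guess at what a token-routing training trick could look like, here is a minimal sketch: a random subset of tokens bypasses the middle blocks and is re-inserted at its original positions afterwards, so those blocks process fewer tokens per step. The block contents, routing rate, and re-insertion scheme below are my assumptions, not the paper's actual method.

```python
import random

def block(tokens):
    # Stand-in for a transformer block: a cheap elementwise update,
    # just to make the token flow observable.
    return [t + 1.0 for t in tokens]

def forward_with_routing(tokens, n_blocks=4, keep_rate=0.5, seed=0):
    rng = random.Random(seed)
    # Randomly choose which token positions are routed through the blocks.
    keep = sorted(rng.sample(range(len(tokens)), int(len(tokens) * keep_rate)))
    skipped = {i: tokens[i] for i in range(len(tokens)) if i not in keep}

    routed = [tokens[i] for i in keep]  # only these pay the compute cost
    for _ in range(n_blocks):
        routed = block(routed)

    # Re-insert skipped tokens at their original positions.
    out = list(tokens)
    for pos, val in zip(keep, routed):
        out[pos] = val
    for pos, val in skipped.items():
        out[pos] = val
    return out

out = forward_with_routing([0.0] * 8)
# Kept tokens passed through 4 blocks (value 4.0); skipped tokens are unchanged (0.0).
print(out)
```

At inference one would simply route every token (keep_rate=1.0), which matches the tweet's claim that inference is unchanged while training compute drops with the routing rate.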
Kolja Bauer retweeted
Nick Stracke @rmsnorm
🤔 Why do we extract diffusion features from noisy images? Isn’t that destroying information? Yes, it is - but we found a way to do better. 🚀 Here’s how we unlock better features, no noise, no hassle 🧵👇