jo.schb

29 posts


@jo_schb

PhD student @ CompVis Group, LMU Munich

Joined January 2021
497 Following · 114 Followers
Pinned Tweet
jo.schb@jo_schb·
🤔 What if you could generate an entire image using just one continuous token? 💡 It works if we leverage a self-supervised representation! Meet RepTok🦎: A generative model that encodes an image into a single continuous latent while keeping realism and semantics. 🧵👇
[image]
8 replies · 23 reposts · 109 likes · 16.9K views
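The single-token idea in the pinned tweet can be sketched in a few lines. This is a toy illustration only, not RepTok's actual code: every shape, name, and the linear encoder/decoder stand-ins are assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the RepTok idea (hypothetical shapes, not the paper's code):
# an entire image is summarized by ONE continuous latent vector ("token"),
# and a decoder maps that single token back to pixel space.
H, W, C = 16, 16, 3      # tiny toy image
d_latent = 32            # dimensionality of the single continuous token

def encode(image, W_enc):
    """Collapse the whole image into one d_latent-dim token
    (a linear stand-in for an SSL encoder plus pooling)."""
    return image.reshape(-1) @ W_enc          # (H*W*C,) @ (H*W*C, d) -> (d,)

def decode(token, W_dec):
    """Map the single token back to an image."""
    return (token @ W_dec).reshape(H, W, C)   # (d,) @ (d, H*W*C) -> image

W_enc = rng.normal(size=(H * W * C, d_latent)) / np.sqrt(H * W * C)
W_dec = rng.normal(size=(d_latent, H * W * C)) / np.sqrt(d_latent)

img = rng.random((H, W, C))
z = encode(img, W_enc)        # the "one continuous token"
recon = decode(z, W_dec)

print(z.shape)      # (32,): the whole image as a single latent vector
print(recon.shape)  # (16, 16, 3)
```

The point of the sketch is the bottleneck shape: generation then only needs to model a single d-dimensional vector per image, which is what makes very cheap (e.g. MLP-based) generators plausible.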
jo.schb retweeted
Nick Stracke@rmsnorm·
Video diffusion models learn motion indirectly through pixels. But motion itself is much lower-dimensional. We introduce 64× temporally compressed motion embeddings that directly capture scene dynamics. This enables efficient planning -> 10,000× faster than video models. 🧵👇
9 replies · 48 reposts · 315 likes · 40.6K views
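The "64× temporally compressed" claim above can be illustrated with a toy pooling sketch. All numbers and the mean-pooling choice are assumptions for illustration; the actual embeddings are learned, not pooled.

```python
import numpy as np

# Hypothetical sketch of 64x temporal compression of motion features:
# T per-frame motion vectors are collapsed into T/64 embeddings.
T, d = 256, 8          # 256 frames, 8-dim motion feature per frame (toy sizes)
compression = 64

motion = np.random.default_rng(1).random((T, d))

# Group frames into windows of 64 and pool each window into one embedding.
embeddings = motion.reshape(T // compression, compression, d).mean(axis=1)

print(embeddings.shape)  # (4, 8): 256 frames -> 4 embeddings, 64x fewer steps
```

A planner operating on 4 embeddings instead of 256 frames takes far fewer sequential steps, which is the intuition behind the claimed speedup over rolling out a full video model.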
jo.schb retweeted
Kosta Derpanis@CSProfKGD·
Going to miss lunch times hanging out with the mensa-table tennis crew 🏓
[image]
2 replies · 1 repost · 28 likes · 2.1K views
jo.schb retweeted
Tao HU@vtaohu·
One of the best ways to spot new research trends is to look at which papers get cited the fastest. I recently found rleak.com, which tracks citation rankings across top conferences like AAAI. I also found: DepthFM ranks #7 among the most-cited AAAI papers in 3k🚀
[image]
0 replies · 1 repost · 6 likes · 811 views
jo.schb@jo_schb·
@ma_sc_ We will release our training and inference code soon :)
0 replies · 0 reposts · 2 likes · 145 views
jo.schb retweeted
Pingchuan Ma@PingchuanMa4·
I'm happy to share that I’ll be presenting two first-authored papers at #ICCV2025 🌺 in Honolulu, together with @MingGui725184! 🏝️ (Thread 🧵👇)
1 reply · 7 reposts · 9 likes · 1.1K views
jo.schb retweeted
Miguel Angel Bautista@itsbautistam·
There has been quite a lot of talk recently about SSL representations in generative models. IMHO if you are training an image generative model in latent space you should aim for as much compute efficiency as possible (otherwise what's the point?). The amazing @jo_schb and @MingGui725184 + collaborators at LMU have really cracked this problem with RepTok, please check the thread! A common drawback of most works in this direction (even the most recent ones) is that they show viability for ImageNet only, which has its issues (especially if using DINOv2 features). @jo_schb and @MingGui725184 found that RepTok allows you to compress images so much that you can use a pure MLP-based architecture for the more general T2I problem setting, obtaining really good results while drastically reducing training compute. I am super grateful to have had the chance to advise the team on this one!
jo.schb@jo_schb

🤔 What if you could generate an entire image using just one continuous token? 💡 It works if we leverage a self-supervised representation! Meet RepTok🦎: A generative model that encodes an image into a single continuous latent while keeping realism and semantics. 🧵👇

0 replies · 4 reposts · 21 likes · 3.9K views
jo.schb retweeted
Stefan Baumann@StefanABaumann·
🤔 What happens when you poke a scene — and your model has to predict how the world moves in response? We built the Flow Poke Transformer (FPT) to model multi-modal scene dynamics from sparse interactions. It learns to predict the 𝘥𝘪𝘴𝘵𝘳𝘪𝘣𝘶𝘵𝘪𝘰𝘯 of motion itself 🧵👇
Stefan Baumann tweet media
[image]
5 replies · 15 reposts · 38 likes · 6.4K views
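The key phrase in the FPT tweet is predicting the *distribution* of motion rather than a single flow. A toy sketch of that output format, with every function name, shape, and the two-component mixture being pure assumptions for illustration:

```python
import numpy as np

# Toy illustration of multi-modal motion prediction from a sparse "poke"
# (hypothetical stand-in, not the Flow Poke Transformer itself):
# the model returns a mixture over 2D motion outcomes, not one vector.
rng = np.random.default_rng(2)

def predict_motion_distribution(poke_xy):
    """Return mixture weights and means over possible 2D motions.
    Two toy outcomes: the point follows the poke, or it stays put."""
    means = np.array([poke_xy, [0.0, 0.0]])   # (2, 2)
    weights = np.array([0.7, 0.3])            # mixture probabilities
    return weights, means

def sample_motion(weights, means, n=5):
    """Draw n motion samples from the predicted mixture."""
    idx = rng.choice(len(weights), size=n, p=weights)
    return means[idx] + 0.05 * rng.normal(size=(n, 2))  # add small spread

w, m = predict_motion_distribution(np.array([1.0, 0.5]))
samples = sample_motion(w, m)
print(samples.shape)  # (5, 2): several distinct motion outcomes per poke
```

Returning a distribution lets the same poke yield qualitatively different futures, which a single regressed flow field cannot represent.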