Alexander Pondaven

43 posts

@alexpondaven

Working on controllable video generation. PhD student @UniofOxford @aims_oxford @OxfordTVG @Snap. MEng @Imperialcollege

Santa Monica, CA · Joined October 2023
563 Following · 107 Followers

Pinned Tweet
Alexander Pondaven@alexpondaven·
Introducing ActionParty: the first video world model that controls up to 7 players simultaneously on the same screen across 46 game environments. We tackle the action binding problem in video diffusion, ensuring each player's action is applied to the right subject. 🧵
[media]
Alexander Pondaven reposted
Sumeet Motwani@sumeetrm·
We’re releasing LongCoT, an incredibly hard benchmark to measure long-horizon reasoning capabilities over tens to hundreds of thousands of tokens. LongCoT consists of 2.5K questions across chemistry, math, chess, logic, and computer science. Frontier models score less than 10%🧵
[media]
Cris Lenta@crislenta·
@alexpondaven this is so cool! btw are you around SF? would be nice to have a coffee!
Cris Lenta@crislenta·
@alexpondaven do i understand this correctly that the players are controlled by the world model and not a separate server?

Alexander Pondaven@alexpondaven·
@crislenta Yes, all pixels are generated by the video world model. Each player's actions are fed directly to the diffusion model; no separate game engine or server is involved.
Alexander Pondaven@alexpondaven·
Subject state tokens are modular: more players simply means more tokens concatenated to the sequence and jointly diffused with the video latents. 7 players adds only a 6% overhead. Here we show 8 players at inference, a subject count unseen during training.
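The token-concatenation scheme above can be sketched at the shape level. All sizes below are illustrative guesses (the thread doesn't give ActionParty's real latent or token dimensions); `tokens_per_player` is chosen so the toy numbers land near the quoted ~6% overhead:

```python
# Illustrative sizes only; not ActionParty's actual configuration.
T, H, W = 16, 30, 40                 # latent frames, height, width
video_tokens = T * H * W             # tokens from the video latents
tokens_per_player = 192              # assumed subject-state tokens per player

def sequence_length(num_players: int) -> int:
    """Total diffused sequence: video latents plus one subject-state token
    block per player, so cost grows linearly (and slowly) with players."""
    return video_tokens + num_players * tokens_per_player

base = sequence_length(1)
seven = sequence_length(7)
overhead = (seven - base) / base     # extra tokens from 6 additional players
```

With these assumed sizes, six extra players add under 6% to the sequence length, since the video latents dominate the token count.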
Alexander Pondaven reposted
Ziyi Wu@Dazitu_616·
To build multi-player games with video models, we likely need a map. One challenge here is the action binding problem, which we solve with simple RoPE-based attention biasing. While existing multi-actor models specialize in one game, we generalize to 46 games and diverse actions!
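The tweet doesn't spell out the biasing scheme, but the intuition of position-based binding can be shown with a toy 1-D rotary embedding: give an action token the RoPE position of its subject, and its attention score peaks on the video token at that position. Everything below (dimensions, positions, the single-axis setup) is hypothetical, not the paper's mechanism:

```python
import numpy as np

def rope(x, pos, base=10000.0):
    """Apply a 1-D rotary embedding at positions `pos` to vectors x [n, d]."""
    half = x.shape[-1] // 2
    freqs = base ** (-np.arange(half) / half)          # [half]
    angles = pos[:, None] * freqs[None, :]             # [n, half]
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[..., :half], x[..., half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], -1)

# Toy binding check: three video keys with identical content but different
# player positions; the action-token query is placed at player 2's position.
d = 64
rng = np.random.default_rng(0)
q = rng.normal(size=(1, d))                 # action-token query
k = np.repeat(q, 3, axis=0)                 # same content, three players
player_pos = np.array([5.0, 20.0, 40.0])    # player locations along one axis
q_rot = rope(q, np.array([20.0]))           # bind the action to player 2
k_rot = rope(k, player_pos)
scores = (q_rot @ k_rot.T).ravel()          # attention logits per player
```

Because RoPE preserves inner products when positions match and shrinks them as positions diverge, the middle score is the strict maximum: the action binds to the intended subject.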
Alexander Pondaven@alexpondaven·
Excited to see where this goes. Can we scale action binding to 3D scenes, more complex environments, and open-ended multiplayer worlds? I think we're just scratching the surface of multiplayer video world models. Would love to chat if you're working on similar problems!
Alexander Pondaven reposted
Moayed Haji Ali@moayedhajiali·
Not all pixels are equally hard, but DiTs still allocate compute uniformly across pixels, wasting effort on easy regions. ELIT adds two lightweight cross-attention layers to focus compute where it matters, cutting FID by 53%. ELIT: snap-research.github.io/elit
[media]
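The tweet doesn't detail ELIT's layers, so as a generic illustration of how cross-attention can concentrate compute: a small set of latent queries reads from a large field of patch tokens, and subsequent layers then operate on far fewer tokens. All names and sizes here are made up:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, tokens, Wq, Wk, Wv):
    """One cross-attention layer: a compact query set attends over all patch
    tokens, so downstream cost scales with len(queries), not len(tokens)."""
    Q, K, V = queries @ Wq, tokens @ Wk, tokens @ Wv
    att = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))   # [n_queries, n_tokens]
    return att @ V                                   # [n_queries, d]

rng = np.random.default_rng(0)
d, n_tokens, n_queries = 32, 1024, 64       # illustrative sizes
tokens = rng.normal(size=(n_tokens, d))     # DiT patch tokens
queries = rng.normal(size=(n_queries, d))   # compact latent queries
Wq, Wk, Wv = (rng.normal(size=(d, d)) * d**-0.5 for _ in range(3))
out = cross_attention(queries, tokens, Wq, Wk, Wv)
```

Whether the queries route compute to "hard" regions depends on training; this sketch only shows the shape of the mechanism.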
Alexander Pondaven reposted
Sumeet Motwani@sumeetrm·
🚨How do we improve long-horizon reasoning capabilities by scaling RL with only existing data? Introducing our new paper: "h1: Bootstrapping LLMs to Reason over Longer Horizons via Reinforcement Learning"🫡

> RL on existing datasets saturates very quickly
> Reasoning over complex interdependent problems is incredibly important, but we currently lack enough long-horizon reasoning data
> Long-horizon problems are hard, which means training signal is sparse. We'd need a way to provide dense supervision

Our solution composes existing short-horizon data to form a synthetic curriculum that keeps growing in complexity! This allows us to scale RL on the same dataset while avoiding saturation, with the curriculum acting as dense rewards. At a small scale, we see massive in-domain long-horizon improvements, which transfer to significantly harder benchmarks. Training on composed 6th grade math problems leads to strong gains on AIME! 1/N🤿🧵
[media]
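The composition idea, chaining short problems so each answer feeds the next and horizon length becomes a tunable knob, can be sketched as follows. This is a hypothetical toy construction, not the paper's actual pipeline:

```python
import random

def make_step(_):
    """One short 'problem': an affine op with a known answer."""
    a, b = random.randint(2, 9), random.randint(1, 9)
    return (f"multiply by {a} then add {b}", lambda v, a=a, b=b: a * v + b)

def compose(n_steps, seed=0):
    """Chain n short problems into one long-horizon problem: each step's
    output is the next step's input, so difficulty grows with n_steps."""
    random.seed(seed)
    steps = [make_step(i) for i in range(n_steps)]
    prompt = "Start with 1. Then " + ", then ".join(s[0] for s in steps) + "."
    answer = 1
    for _, f in steps:
        answer = f(answer)          # ground-truth answer for dense reward
    return prompt, answer

prompt, answer = compose(3)
```

Growing `n_steps` over training would play the role of the curriculum: the same short-horizon pool yields ever-longer problems with exactly checkable answers.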
Alexander Pondaven reposted
Matt McGill@MattMcGill_·
One nice thing you can do with an interactive world model: look down and see your footwear ... and find out whether the model understands what puddles are. Genie 3 creation.