Alexander Pondaven

43 posts

@alexpondaven

Working on controllable video generation. PhD student @UniofOxford @aims_oxford @OxfordTVG @Snap. MEng @Imperialcollege

Santa Monica, CA · Joined October 2023
563 Following · 107 Followers

Pinned Tweet
Alexander Pondaven@alexpondaven·
Introducing ActionParty: the first video world model that controls up to 7 players simultaneously on the same screen across 46 game environments. We tackle the action binding problem in video diffusion, ensuring each player's action is applied to the right subject. 🧵
[media]
Alexander Pondaven reposted
Sumeet Motwani@sumeetrm·
We’re releasing LongCoT, an incredibly hard benchmark to measure long-horizon reasoning capabilities over tens to hundreds of thousands of tokens. LongCoT consists of 2.5K questions across chemistry, math, chess, logic, and computer science. Frontier models score less than 10%🧵
[media]
Cris Lenta@crislenta·
@alexpondaven this is so cool! btw are you around SF? would be nice to have a coffee!
Cris Lenta@crislenta·
@alexpondaven do i understand this correctly that the players are controlled by the world model and not a separate server?

Alexander Pondaven@alexpondaven·
@crislenta Yes, all pixels are generated by the video world model. Each player's actions are fed directly to the diffusion model; no separate game engine or server is involved.
Alexander Pondaven@alexpondaven·
Subject state tokens are modular: more players simply means more tokens concatenated to the sequence and jointly diffused with the video latents. 7 players adds only a 6% overhead. Here we show 8 players at inference, a subject count unseen during training.
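The token-concatenation scheme above can be sketched at the shape level. All sizes below are illustrative guesses (the thread doesn't give ActionParty's real latent or token dimensions); `tokens_per_player` is chosen so the toy numbers land near the quoted ~6% overhead:

```python
# Illustrative sizes only; not ActionParty's actual configuration.
T, H, W = 16, 30, 40                 # latent frames, height, width
video_tokens = T * H * W             # tokens from the video latents
tokens_per_player = 192              # assumed subject-state tokens per player

def sequence_length(num_players: int) -> int:
    """Total diffused sequence: video latents plus one subject-state token
    block per player, so cost grows linearly (and slowly) with players."""
    return video_tokens + num_players * tokens_per_player

base = sequence_length(1)
seven = sequence_length(7)
overhead = (seven - base) / base     # extra tokens from 6 additional players
```

With these assumed sizes, six extra players add under 6% to the sequence length, since the video latents dominate the token count.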
Alexander Pondaven reposted
Ziyi Wu@Dazitu_616·
To build multi-player games with video models, we likely need a map. One challenge here is the action binding problem, which we solve with simple RoPE-based attention biasing. While existing multi-actor models specialize in one game, we generalize to 46 games and diverse actions!
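The tweet doesn't spell out the biasing scheme, but the intuition of position-based binding can be shown with a toy 1-D rotary embedding: give an action token the RoPE position of its subject, and its attention score peaks on the video token at that position. Everything below (dimensions, positions, the single-axis setup) is hypothetical, not the paper's mechanism:

```python
import numpy as np

def rope(x, pos, base=10000.0):
    """Apply a 1-D rotary embedding at positions `pos` to vectors x [n, d]."""
    half = x.shape[-1] // 2
    freqs = base ** (-np.arange(half) / half)          # [half]
    angles = pos[:, None] * freqs[None, :]             # [n, half]
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[..., :half], x[..., half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], -1)

# Toy binding check: three video keys with identical content but different
# player positions; the action-token query is placed at player 2's position.
d = 64
rng = np.random.default_rng(0)
q = rng.normal(size=(1, d))                 # action-token query
k = np.repeat(q, 3, axis=0)                 # same content, three players
player_pos = np.array([5.0, 20.0, 40.0])    # player locations along one axis
q_rot = rope(q, np.array([20.0]))           # bind the action to player 2
k_rot = rope(k, player_pos)
scores = (q_rot @ k_rot.T).ravel()          # attention logits per player
```

Because RoPE preserves inner products when positions match and shrinks them as positions diverge, the middle score is the strict maximum: the action binds to the intended subject.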
Alexander Pondaven@alexpondaven·
Excited to see where this goes. Can we scale action binding to 3D scenes, more complex environments, and open-ended multiplayer worlds? I think we're just scratching the surface of multiplayer video world models. Would love to chat if you're working on similar problems!
Alexander Pondaven reposted
Moayed Haji Ali@moayedhajiali·
Not all pixels are equally hard, but DiTs still allocate compute uniformly across pixels, wasting effort on easy regions. ELIT adds two lightweight cross-attention layers to focus compute where it matters, cutting FID by 53%. ELIT: snap-research.github.io/elit
[media]
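The tweet doesn't detail ELIT's layers, so as a generic illustration of how cross-attention can concentrate compute: a small set of latent queries reads from a large field of patch tokens, and subsequent layers then operate on far fewer tokens. All names and sizes here are made up:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, tokens, Wq, Wk, Wv):
    """One cross-attention layer: a compact query set attends over all patch
    tokens, so downstream cost scales with len(queries), not len(tokens)."""
    Q, K, V = queries @ Wq, tokens @ Wk, tokens @ Wv
    att = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))   # [n_queries, n_tokens]
    return att @ V                                   # [n_queries, d]

rng = np.random.default_rng(0)
d, n_tokens, n_queries = 32, 1024, 64       # illustrative sizes
tokens = rng.normal(size=(n_tokens, d))     # DiT patch tokens
queries = rng.normal(size=(n_queries, d))   # compact latent queries
Wq, Wk, Wv = (rng.normal(size=(d, d)) * d**-0.5 for _ in range(3))
out = cross_attention(queries, tokens, Wq, Wk, Wv)
```

Whether the queries route compute to "hard" regions depends on training; this sketch only shows the shape of the mechanism.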
Alexander Pondaven reposted
Sumeet Motwani@sumeetrm·
🚨How do we improve long-horizon reasoning capabilities by scaling RL with only existing data? Introducing our new paper: "h1: Bootstrapping LLMs to Reason over Longer Horizons via Reinforcement Learning"🫡

> RL on existing datasets saturates very quickly
> Reasoning over complex interdependent problems is incredibly important, but we currently lack enough long-horizon reasoning data
> Long-horizon problems are hard, which means training signal is sparse. We'd need a way to provide dense supervision

Our solution composes existing short-horizon data to form a synthetic curriculum that keeps growing in complexity! This allows us to scale RL on the same dataset while avoiding saturation, with the curriculum acting as dense rewards. At a small scale, we see massive in-domain long-horizon improvements, which transfer to significantly harder benchmarks. Training on composed 6th grade math problems leads to strong gains on AIME! 1/N🤿🧵
[media]
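The composition idea, chaining short problems so each answer feeds the next and horizon length becomes a tunable knob, can be sketched as follows. This is a hypothetical toy construction, not the paper's actual pipeline:

```python
import random

def make_step(_):
    """One short 'problem': an affine op with a known answer."""
    a, b = random.randint(2, 9), random.randint(1, 9)
    return (f"multiply by {a} then add {b}", lambda v, a=a, b=b: a * v + b)

def compose(n_steps, seed=0):
    """Chain n short problems into one long-horizon problem: each step's
    output is the next step's input, so difficulty grows with n_steps."""
    random.seed(seed)
    steps = [make_step(i) for i in range(n_steps)]
    prompt = "Start with 1. Then " + ", then ".join(s[0] for s in steps) + "."
    answer = 1
    for _, f in steps:
        answer = f(answer)          # ground-truth answer for dense reward
    return prompt, answer

prompt, answer = compose(3)
```

Growing `n_steps` over training would play the role of the curriculum: the same short-horizon pool yields ever-longer problems with exactly checkable answers.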
Alexander Pondaven reposted
Matt McGill@MattMcGill_·
One nice thing you can do with an interactive world model: look down and see your footwear ... and find out whether the model understands what puddles are. Genie 3 creation.