Anurag Bagchi

18 posts

@Miccooper9

CMU, ex-TikTok AI https://t.co/8BWbkmDWJJ

Joined August 2021
1.3K Following · 197 Followers
Pinned Tweet
Anurag Bagchi@Miccooper9·
[1/6] Ego-centric World Models. We introduce EgoWM, a video world model that simulates EVE-1X humanoid interactions from a single ego-view image plus full-body joint-angle trajectories. It also generalizes effortlessly to extreme out-of-distribution (OOD) domains, including paintings!
12 replies · 46 reposts · 418 likes · 42.9K views
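The pinned thread describes a conditional video model: one ego-view frame plus a full-body joint-angle trajectory in, future frames out. A minimal sketch of that interface, assuming a 25-DoF trajectory as in the thread (all function names here are hypothetical illustrations, not the released EgoWM API; a toy stand-in replaces the real network):

```python
import numpy as np

def rollout_world_model(predict_frames, ego_image, joint_trajectory):
    """Hypothetical EgoWM-style interface: a single ego-view image plus a
    full-body joint-angle trajectory in, a predicted future video out."""
    assert ego_image.ndim == 3          # (H, W, 3) RGB frame
    assert joint_trajectory.ndim == 2   # (T, num_joints), e.g. 25 DoF
    return predict_frames(ego_image, joint_trajectory)

# Toy stand-in "model" so the sketch runs: repeats the input frame T times.
def dummy_predict(image, actions):
    T = actions.shape[0]
    return np.stack([image] * T)

ego = np.zeros((64, 64, 3))
traj = np.zeros((8, 25))                # 8 timesteps, 25 joint angles
video = rollout_world_model(dummy_predict, ego, traj)
print(video.shape)  # (8, 64, 64, 3)
```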
Anurag Bagchi retweeted
Aryan Satpathy@satpathyaryan45·
Excited to share our project, Sim2Reason! Key insight: simulators are an untapped source of cheap supervision for scientific reasoning. LLMs can learn physical reasoning from simulation to improve on real-world benchmarks such as the International Physics Olympiad!
Mihir Prabhudesai@mihirp98

What if AI learned physics the way Newton did – by experiencing it? We built Sim2Reason: train LLMs inside virtual worlds governed by real physics laws, zero human annotation. Result: +5–10% improvement on International Physics Olympiad, zero-shot. 🧵

0 replies · 6 reposts · 19 likes · 2.5K views
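The core idea, a simulator as a free label source, can be illustrated with a toy generator. Everything below is a hypothetical sketch, not the Sim2Reason pipeline: an ideal-projectile simulator emits (question, answer) training pairs with zero human annotation.

```python
import math
import random

def projectile_range(v0, theta_deg, g=9.81):
    """Ideal projectile range on flat ground: R = v0^2 * sin(2*theta) / g."""
    return v0**2 * math.sin(math.radians(2 * theta_deg)) / g

def make_training_example(rng):
    """One simulator-supervised (question, answer) pair, no human labels."""
    v0 = rng.uniform(5, 50)       # launch speed, m/s
    theta = rng.uniform(10, 80)   # launch angle, degrees
    q = (f"A ball is launched at {v0:.1f} m/s at {theta:.1f} degrees "
         f"above the horizontal. What is its range?")
    a = f"{projectile_range(v0, theta):.2f} m"
    return q, a

rng = random.Random(0)
q, a = make_training_example(rng)
```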
Anurag Bagchi retweeted
Vincent Sitzmann@vincesitzmann·
In my recent blog post, I argue that "vision" is only well-defined as part of perception-action loops, and that the conventional view of computer vision - mapping imagery to intermediate representations (3D, flow, segmentation...) is about to go away. vincentsitzmann.com/blog/bitter_le…
43 replies · 164 reposts · 1K likes · 382.4K views
Anurag Bagchi@Miccooper9·
@zhihelu1 Thanks! For world models, or more precisely forward dynamics models (current state + action -> future state), this is the standard formulation. There are many model-based control approaches that can plan and predict actions using such world models.
0 replies · 0 reposts · 2 likes · 382 views
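The reply above is the standard answer: with a forward dynamics model f(state, action) -> next state, a planner searches over candidate actions rather than needing ground-truth trajectories. A minimal random-shooting sketch of that idea, with a toy dynamics and cost (all names illustrative, not from the paper):

```python
import numpy as np

def plan_random_shooting(dynamics, cost, state, horizon=5, n_samples=64, rng=None):
    """Model-based control with a forward dynamics model f(s, a) -> s'.
    Sample random action sequences, roll each out through the model,
    and return the first action of the cheapest sequence."""
    rng = rng or np.random.default_rng(0)
    best_cost, best_action = np.inf, None
    for _ in range(n_samples):
        actions = rng.uniform(-1, 1, size=(horizon, state.shape[0]))
        s, total = state, 0.0
        for a in actions:
            s = dynamics(s, a)
            total += cost(s)
        if total < best_cost:
            best_cost, best_action = total, actions[0]
    return best_action

# Toy dynamics: s' = s + 0.1*a; cost: squared distance to the origin.
dynamics = lambda s, a: s + 0.1 * a
cost = lambda s: float(np.sum(s**2))
a0 = plan_random_shooting(dynamics, cost, np.array([1.0, -1.0]))
```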
zhihe lu@zhihelu1·
@Miccooper9 Great work! One simple concern: the action trajectory is the input for future-frame generation, but in practice we often do not have the ground-truth trajectory. How do we handle this case?
1 reply · 0 reposts · 1 like · 437 views
Anurag Bagchi@Miccooper9·
[6/6] Fine-grained humanoid manipulation. EgoWM enables precise 25-DoF joint-angle manipulation with the EVE-1X humanoid, even at 4× temporal compression (Cosmos-2B). Learn more: Project page: egowm.github.io · Paper: arxiv.org/pdf/2601.15284
0 replies · 1 repost · 18 likes · 1.5K views
Anurag Bagchi@Miccooper9·
[5/6] Temporal compression. Unlike prior work, we preserve full-sequence diffusion and compress actions to the latent temporal resolution. EgoWM achieves 42% better action alignment at a +4 s horizon vs. frame-wise autoregressive NWMs, even with 4× temporal compression (Cosmos-2B).
1 reply · 0 reposts · 13 likes · 1.3K views
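Compressing actions to the latent temporal resolution, as this tweet describes, can be as simple as pooling per-frame actions over the same window the video tokenizer compresses. A hypothetical sketch under that assumption (average pooling; not the actual EgoWM implementation):

```python
import numpy as np

def compress_actions(actions, factor=4):
    """Pool a per-frame action trajectory (T, D) down to the latent
    temporal resolution (T // factor, D) by averaging each window of
    `factor` frames, matching a video latent with factor-x compression."""
    T, D = actions.shape
    assert T % factor == 0, "trajectory length must divide evenly"
    return actions.reshape(T // factor, factor, D).mean(axis=1)

acts = np.arange(32, dtype=float).reshape(16, 2)  # 16 frames, 2-D actions
latent_acts = compress_actions(acts, factor=4)
print(latent_acts.shape)  # (4, 2)
```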
Anurag Bagchi retweeted
Shraman Pramanick@Shramanpramani2·
My role at Meta's SAM team (MSL, previously at FAIR Perception) has been impacted within 3 months of joining after PhD. If you work with multimodal LLMs for grounding or complex reasoning, or have a long-term vision of unified understanding and generation, let's talk. I am on the job market starting immediately. #metalayoffs #FAIR #MSL #SAM
Jiaxun Cui 🐿️@cuijiaxun

Meta has gone crazy on the squid game! Many new PhD NGs are deactivated today (I am also impacted🥲 happy to chat)

26 replies · 26 reposts · 338 likes · 109.7K views
Anurag Bagchi@Miccooper9·
@andrew_n_carr Thanks Andrew! We were also really surprised to see how well this worked. Exciting times ahead.
0 replies · 0 reposts · 1 like · 35 views
Anurag Bagchi@Miccooper9·
[ICCV 25] Refer Everything Model (REM) (1/6) We leverage text-to-video generation models to zero-shot segment any concept in a video using text. REM generalizes to dynamic concepts like smoke, light beams, and more, without ever having seen segmentation masks for these entities.
1 reply · 12 reposts · 93 likes · 10K views
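The thread only states the idea at a high level (a text-to-video backbone repurposed for segmentation), so the sketch below is purely illustrative, not REM's method: it assumes a per-pixel text-video relevance map has already been extracted from a generation backbone, and shows only the final thresholding into binary masks.

```python
import numpy as np

def segment_from_relevance(relevance, threshold=0.5):
    """Hypothetical readout step: turn a per-pixel text-video relevance
    map of shape (T, H, W) into binary per-frame masks by min-max
    normalizing and thresholding."""
    r = (relevance - relevance.min()) / (np.ptp(relevance) + 1e-8)
    return (r > threshold).astype(np.uint8)

# Toy relevance map standing in for a real backbone's output.
rel = np.random.default_rng(0).random((2, 4, 4))  # 2 frames, 4x4 pixels
masks = segment_from_relevance(rel)
```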
Anurag Bagchi@Miccooper9·
(6/6) We’re at the start of the internet-scale "video" era, and the possibilities are exciting. Learn more at refereverything.github.io — our code & model weights are available. Visiting ICCV? Come see our poster on Oct 23 to chat and see results in action!
1 reply · 1 repost · 4 likes · 365 views
Anurag Bagchi@Miccooper9·
(5/6) REM demonstrates how Text-to-Video generation can serve as a powerful pre-training paradigm for downstream video understanding. The days of large-scale, labor-intensive video annotation may soon be behind us — pre-train to generate, fine-tune lightly to understand.
1 reply · 0 reposts · 3 likes · 383 views