Anurag Bagchi

18 posts

@Miccooper9

CMU, ex-TikTok AI https://t.co/8BWbkmDWJJ

Joined August 2021
1.3K Following · 197 Followers
Pinned Tweet
Anurag Bagchi@Miccooper9·
[1/6] Ego-centric World Models. We introduce EgoWM, a video world model that simulates EVE-1X humanoid interactions from a single ego-view image plus full-body joint-angle trajectories. It also generalizes effortlessly to extreme out-of-distribution (OOD) domains, including paintings!
12 replies · 46 reposts · 418 likes · 42.9K views
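The pinned thread describes a conditional video model: one ego-view frame plus a full-body joint-angle trajectory in, future frames out. A minimal sketch of that interface, assuming a 25-DoF trajectory as in the thread (all function names here are hypothetical illustrations, not the released EgoWM API; a toy stand-in replaces the real network):

```python
import numpy as np

def rollout_world_model(predict_frames, ego_image, joint_trajectory):
    """Hypothetical EgoWM-style interface: a single ego-view image plus a
    full-body joint-angle trajectory in, a predicted future video out."""
    assert ego_image.ndim == 3          # (H, W, 3) RGB frame
    assert joint_trajectory.ndim == 2   # (T, num_joints), e.g. 25 DoF
    return predict_frames(ego_image, joint_trajectory)

# Toy stand-in "model" so the sketch runs: repeats the input frame T times.
def dummy_predict(image, actions):
    T = actions.shape[0]
    return np.stack([image] * T)

ego = np.zeros((64, 64, 3))
traj = np.zeros((8, 25))                # 8 timesteps, 25 joint angles
video = rollout_world_model(dummy_predict, ego, traj)
print(video.shape)  # (8, 64, 64, 3)
```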
Anurag Bagchi retweeted
Aryan Satpathy@satpathyaryan45·
Excited to share our project, Sim2Reason! Key insight: simulators are an untapped source of cheap supervision for scientific reasoning. LLMs can learn physical reasoning from simulation to improve on real-world benchmarks such as the International Physics Olympiad!
Mihir Prabhudesai@mihirp98

What if AI learned physics the way Newton did – by experiencing it? We built Sim2Reason: train LLMs inside virtual worlds governed by real physics laws, zero human annotation. Result: +5–10% improvement on International Physics Olympiad, zero-shot. 🧵

0 replies · 6 reposts · 19 likes · 2.5K views
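The core idea, a simulator as a free label source, can be illustrated with a toy generator. Everything below is a hypothetical sketch, not the Sim2Reason pipeline: an ideal-projectile simulator emits (question, answer) training pairs with zero human annotation.

```python
import math
import random

def projectile_range(v0, theta_deg, g=9.81):
    """Ideal projectile range on flat ground: R = v0^2 * sin(2*theta) / g."""
    return v0**2 * math.sin(math.radians(2 * theta_deg)) / g

def make_training_example(rng):
    """One simulator-supervised (question, answer) pair, no human labels."""
    v0 = rng.uniform(5, 50)       # launch speed, m/s
    theta = rng.uniform(10, 80)   # launch angle, degrees
    q = (f"A ball is launched at {v0:.1f} m/s at {theta:.1f} degrees "
         f"above the horizontal. What is its range?")
    a = f"{projectile_range(v0, theta):.2f} m"
    return q, a

rng = random.Random(0)
q, a = make_training_example(rng)
```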
Anurag Bagchi retweeted
Vincent Sitzmann@vincesitzmann·
In my recent blog post, I argue that "vision" is only well-defined as part of perception-action loops, and that the conventional view of computer vision - mapping imagery to intermediate representations (3D, flow, segmentation...) is about to go away. vincentsitzmann.com/blog/bitter_le…
43 replies · 164 reposts · 1K likes · 382.4K views
Anurag Bagchi@Miccooper9·
@zhihelu1 Thanks! For world models, or more precisely forward dynamics models (current state + action -> future state), this is the standard formulation. There are many model-based control approaches that can plan and predict actions using such world models.
0 replies · 0 reposts · 2 likes · 382 views
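The reply above is the standard answer: with a forward dynamics model f(state, action) -> next state, a planner searches over candidate actions rather than needing ground-truth trajectories. A minimal random-shooting sketch of that idea, with a toy dynamics and cost (all names illustrative, not from the paper):

```python
import numpy as np

def plan_random_shooting(dynamics, cost, state, horizon=5, n_samples=64, rng=None):
    """Model-based control with a forward dynamics model f(s, a) -> s'.
    Sample random action sequences, roll each out through the model,
    and return the first action of the cheapest sequence."""
    rng = rng or np.random.default_rng(0)
    best_cost, best_action = np.inf, None
    for _ in range(n_samples):
        actions = rng.uniform(-1, 1, size=(horizon, state.shape[0]))
        s, total = state, 0.0
        for a in actions:
            s = dynamics(s, a)
            total += cost(s)
        if total < best_cost:
            best_cost, best_action = total, actions[0]
    return best_action

# Toy dynamics: s' = s + 0.1*a; cost: squared distance to the origin.
dynamics = lambda s, a: s + 0.1 * a
cost = lambda s: float(np.sum(s**2))
a0 = plan_random_shooting(dynamics, cost, np.array([1.0, -1.0]))
```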
zhihe lu@zhihelu1·
@Miccooper9 Great work! One simple concern: the action trajectory is the input for future-frame generation, but in practice we often do not have the ground-truth trajectory. How do we handle this case?
1 reply · 0 reposts · 1 like · 437 views
Anurag Bagchi@Miccooper9·
[6/6] Fine-grained humanoid manipulation. EgoWM enables precise 25-DoF joint-angle manipulation with the EVE-1X humanoid, even at 4× temporal compression (Cosmos-2B). Learn more: Project page: egowm.github.io · Paper: arxiv.org/pdf/2601.15284
0 replies · 1 repost · 18 likes · 1.5K views
Anurag Bagchi@Miccooper9·
[5/6] Temporal compression. Unlike prior work, we preserve full-sequence diffusion and compress actions to the latent temporal resolution. EgoWM achieves 42% better action alignment at a +4 s horizon vs. frame-wise autoregressive NWMs, even with 4× temporal compression (Cosmos-2B).
1 reply · 0 reposts · 13 likes · 1.3K views
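Compressing actions to the latent temporal resolution, as this tweet describes, can be as simple as pooling per-frame actions over the same window the video tokenizer compresses. A hypothetical sketch under that assumption (average pooling; not the actual EgoWM implementation):

```python
import numpy as np

def compress_actions(actions, factor=4):
    """Pool a per-frame action trajectory (T, D) down to the latent
    temporal resolution (T // factor, D) by averaging each window of
    `factor` frames, matching a video latent with factor-x compression."""
    T, D = actions.shape
    assert T % factor == 0, "trajectory length must divide evenly"
    return actions.reshape(T // factor, factor, D).mean(axis=1)

acts = np.arange(32, dtype=float).reshape(16, 2)  # 16 frames, 2-D actions
latent_acts = compress_actions(acts, factor=4)
print(latent_acts.shape)  # (4, 2)
```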
Anurag Bagchi retweeted
Shraman Pramanick@Shramanpramani2·
My role at Meta's SAM team (MSL, previously at FAIR Perception) has been impacted within 3 months of joining after PhD. If you work with multimodal LLMs for grounding or complex reasoning, or have a long-term vision of unified understanding and generation, let's talk. I am on the job market starting immediately. #metalayoffs #FAIR #MSL #SAM
Jiaxun Cui 🐿️@cuijiaxun

Meta has gone crazy on the squid game! Many new PhD NGs are deactivated today (I am also impacted🥲 happy to chat)

26 replies · 26 reposts · 338 likes · 109.7K views
Anurag Bagchi@Miccooper9·
@andrew_n_carr Thanks Andrew! We were also really surprised to see how well this worked. Exciting times ahead.
0 replies · 0 reposts · 1 like · 35 views
Anurag Bagchi@Miccooper9·
[ICCV 25] Refer Everything Model (REM) (1/6) We leverage text-to-video generation models to zero-shot segment any concept in a video using text. REM generalizes to dynamic concepts like smoke, light beams, and more, without ever having seen segmentation masks for these entities.
1 reply · 12 reposts · 93 likes · 10K views
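The thread only states the idea at a high level (a text-to-video backbone repurposed for segmentation), so the sketch below is purely illustrative, not REM's method: it assumes a per-pixel text-video relevance map has already been extracted from a generation backbone, and shows only the final thresholding into binary masks.

```python
import numpy as np

def segment_from_relevance(relevance, threshold=0.5):
    """Hypothetical readout step: turn a per-pixel text-video relevance
    map of shape (T, H, W) into binary per-frame masks by min-max
    normalizing and thresholding."""
    r = (relevance - relevance.min()) / (np.ptp(relevance) + 1e-8)
    return (r > threshold).astype(np.uint8)

# Toy relevance map standing in for a real backbone's output.
rel = np.random.default_rng(0).random((2, 4, 4))  # 2 frames, 4x4 pixels
masks = segment_from_relevance(rel)
```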
Anurag Bagchi@Miccooper9·
(6/6) We’re at the start of the internet-scale "video" era, and the possibilities are exciting. Learn more at refereverything.github.io — our code & model weights are available. Visiting ICCV? Come see our poster on Oct 23 to chat and see results in action!
1 reply · 1 repost · 4 likes · 365 views
Anurag Bagchi@Miccooper9·
(5/6) REM demonstrates how Text-to-Video generation can serve as a powerful pre-training paradigm for downstream video understanding. The days of large-scale, labor-intensive video annotation may soon be behind us — pre-train to generate, fine-tune lightly to understand.
1 reply · 0 reposts · 3 likes · 383 views