Luke J. Huang

21 posts

@whatthelukh

physics + cs @MIT | prev @appliedcompute, US IPhO gold medalist

Cambridge, MA · Joined February 2021
290 Following · 104 Followers
Pinned Tweet
Luke J. Huang @whatthelukh
We introduce Variance Controlled Policy Optimization (VCPO), a method that adds explicit variance-targeted controls to policy-gradient objectives in off-policy RL, enabling stable, scalable Async RL training.
✨ Seamlessly integrates into common policy-gradient methods like REINFORCE/RLOO/GRPO
🚀 2.5x faster Async RL training while matching Synchronous RL performance
🧠 Robust training stability under highly off-policy settings (at least 128 steps off-policy)
📄 Paper: arxiv.org/abs/2602.17616
🔗 Code: github.com/mit-han-lab/vc…
🧵👇
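For intuition, here is a minimal, hypothetical sketch of what an explicit variance-targeted control on an importance-weighted policy-gradient loss could look like. The function name, the shrink-toward-the-mean rule, and the target value are illustrative assumptions, not VCPO's actual formulation (see the paper for that); the motivation is that as rollouts grow stale, the importance weights' variance, and hence the gradient estimator's variance, can blow up, so bounding it keeps updates stable.

```python
# Hypothetical sketch, NOT the VCPO implementation: an off-policy
# policy-gradient loss whose importance weights are shrunk toward their
# mean whenever their empirical std exceeds an explicit target.
import torch

def variance_controlled_pg_loss(logp_new, logp_old, advantages, target_std=1.0):
    # Importance weights between the current policy and the stale
    # behavior policy that generated the rollouts.
    ratios = torch.exp(logp_new - logp_old)
    std, mean = ratios.std(), ratios.mean()
    if std.item() > target_std:
        # Assumed control rule: rescale the spread of the weights so their
        # std matches the target; the scale is detached so the control
        # itself contributes no extra gradient terms.
        scale = (target_std / std).detach()
        ratios = mean.detach() + (ratios - mean) * scale
    # REINFORCE-style surrogate: minimize negative weighted advantage.
    return -(ratios * advantages).mean()

# Toy usage with random tensors standing in for real rollout statistics.
logp_new = torch.randn(8, requires_grad=True)
logp_old = torch.randn(8)
advantages = torch.randn(8)
variance_controlled_pg_loss(logp_new, logp_old, advantages).backward()
```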
Luke J. Huang @whatthelukh
Wow, really cool that they can get stable FID gradient estimates by decoupling the feature pool from the training batch size! This also seems to have reignited a lot of discourse on whether FID is even a meaningful metric anymore, and honestly I think this work speaks to both sides: it pushes FD lower than ever (sub-0.75, one-step, pixel space!) with generally better quality, while also showing examples of Inception FD severely misranking visual quality. As someone who previously worked on image generation, I've had my own doubts about FID for a while, so hopefully this gets the community to start seriously exploring and adopting multi-representation generation metrics!
Jiawei Yang @JiaweiYang118

Two months ago, I vaguely posted a number: 0.9 FID, one-step, pixel space. Now it is 0.75, and can be even lower. Many wonder how. I thought it might end as a small FID prank: simple and deliberate. It started with one question: can FID be optimized directly, and what does it reveal? Introducing FD-loss.

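For readers wondering what "optimizing FID directly" even means: the Fréchet Distance between two Gaussians fitted to feature batches is a closed-form, differentiable expression, so in principle it can serve as a training loss. Below is a minimal textbook sketch, not the paper's FD-loss; the eigendecomposition-based matrix square root and the eps stabilizer are my assumptions.

```python
# Hypothetical sketch of a differentiable Fréchet Distance between two
# feature batches: the textbook formula written so gradients can flow
# back into the generated features. Not the paper's FD-loss.
import torch

def sqrtm_psd(mat, eps=1e-6):
    # Differentiable square root of a symmetric PSD matrix via eigh.
    vals, vecs = torch.linalg.eigh(mat)
    return vecs @ torch.diag((vals.clamp(min=0) + eps).sqrt()) @ vecs.T

def frechet_distance(feats_a, feats_b, eps=1e-6):
    # FD = ||mu_a - mu_b||^2 + Tr(Ca + Cb - 2 (Ca Cb)^{1/2})
    mu_a, mu_b = feats_a.mean(0), feats_b.mean(0)
    ca = (feats_a - mu_a).T @ (feats_a - mu_a) / (feats_a.shape[0] - 1)
    cb = (feats_b - mu_b).T @ (feats_b - mu_b) / (feats_b.shape[0] - 1)
    # Tr((Ca Cb)^{1/2}) via the symmetric product Ca^{1/2} Cb Ca^{1/2},
    # whose eigenvalues match those of Ca Cb but stay real and nonnegative.
    sa = sqrtm_psd(ca, eps)
    eigvals = torch.linalg.eigh(sa @ cb @ sa)[0]
    tr_sqrt = (eigvals.clamp(min=0) + eps).sqrt().sum()
    return (mu_a - mu_b).pow(2).sum() + ca.trace() + cb.trace() - 2 * tr_sqrt
```

In practice, as the commentary above notes, the mean and covariance would be estimated over a feature pool much larger than the training batch, since small-batch covariance estimates are too noisy to give stable gradients.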
Luke J. Huang @whatthelukh
Excited to give an oral presentation on Locality-Aware Parallel Decoding (LPD) at ICLR! Would love to connect if you're interested in generative models, ML systems, RL, or just want to chat.
- Oral: Session 3B, Friday 10:30 AM
- Poster: Friday 3:15–5:45 PM, Pavilion 3, P3-#710
See you at @iclr_conf! 🇧🇷
Luke J. Huang retweeted
Luke J. Huang @whatthelukh
We believe these variance-targeted controls are key to robustly stable Asynchronous RL at scale, enabling more efficient long-horizon RL training. We'll also share a blog post on interesting implementation details, including implementing OPOB "gradient surgery" with Megatron DP/TP/SP! Collaboration with @zhuoyang_zhang @Shang_mit @huqinghao @songhan_mit (8/8)
Luke J. Huang @whatthelukh
Async RL already achieves its full speedups at fewer than 10 steps off-policy, but we stress-tested far beyond that and found VCPO + Async RL remains stable up to at least 128 steps off-policy. (7/8)
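To make "k steps off-policy" concrete: it means the weights used to generate a rollout lag the learner's weights by up to k optimizer updates. A minimal, hypothetical sketch of that setup follows; the names and the sequential loop structure are illustrative, not the paper's actual async system, where generation would run concurrently.

```python
# Hypothetical sketch of off-policy staleness in async RL: rollouts come
# from a frozen snapshot that lags the learner by up to `staleness` steps.
import copy

def train_async(policy, optimizer, generate_rollouts, loss_fn,
                staleness=128, total_steps=1000):
    behavior = copy.deepcopy(policy)  # frozen actor snapshot
    for step in range(total_steps):
        if step % staleness == 0:
            # Refresh the actor; between refreshes, the data it produces
            # is up to `staleness` optimizer updates behind the learner.
            behavior.load_state_dict(policy.state_dict())
        batch = generate_rollouts(behavior)   # hypothetical rollout fn
        loss = loss_fn(policy, behavior, batch)  # e.g. the VCPO-style loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```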
Luke J. Huang retweeted
Zhuoyang Zhang @zhuoyang_zhang
☀️We also believe this approach enables us to harness diverse data sources—including cross-embodiment and human data—to continually enhance robotic intelligence. Amazing collaboration with @Shang_mit @huqinghao @whatthelukh James Hou Yufei Sun @Yao__Lu @songhan_mit (8/8)
Luke J. Huang retweeted
Zhuoyang Zhang @zhuoyang_zhang
We release ForeAct (accepted to CVPR’26🎉), a world model planner powered by visual foresight for VLAs: efficient, modular, and scalable.
✨ Seamlessly integrates with VLAs by visual augmentation, with no architectural changes required
⚡ Generates high-fidelity 640×480 subgoal images in just 0.33s
🧠 Significantly boosts generalization capability and data efficiency
📄 Paper: arxiv.org/abs/2602.12322…
🔗 Code: github.com/mit-han-lab/fo…
🧵👇
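A rough, hypothetical reading of "integration by visual augmentation": the foresight model proposes a subgoal image, which is handed to the VLA as just another image input, leaving the VLA itself unmodified. Every name below is an assumption for illustration, not ForeAct's API.

```python
# Hypothetical sketch, not ForeAct's API: a world-model planner proposes a
# subgoal image and an unmodified VLA conditions on it as an extra image.
def act_with_foresight(vla, foresight_model, observation, instruction):
    # Predict a visual subgoal, e.g. an image of the desired next state.
    subgoal = foresight_model.predict_subgoal(observation, instruction)
    # "Visual augmentation": no architectural change, the subgoal is simply
    # appended to the VLA's visual context alongside the current observation.
    return vla.predict_action(images=[observation, subgoal], text=instruction)
```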
Luke J. Huang retweeted
Zhuoyang Zhang @zhuoyang_zhang
🚀Check out #LPD, our latest work on accelerating autoregressive image generation. LPD stands for Locality-aware Parallel Decoding.
⚡️13× faster than traditional AR models and at least 3.4× faster than previous parallelized AR models.
GitHub: github.com/mit-han-lab/lpd
🧵1/
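The name hints at the mechanism: decode several tokens per forward pass, but choose each parallel group so its positions are spatially far apart, since nearby image tokens depend on each other most strongly. Here is a toy schedule in that spirit; it is a hypothetical illustration, not LPD's actual algorithm.

```python
# Hypothetical toy schedule, not LPD's algorithm: group grid positions so
# that each group's members are mutually distant, then decode one group
# per forward pass instead of one token per pass.
def locality_aware_schedule(h, w, group_size=8, min_dist=2):
    remaining = [(r, c) for r in range(h) for c in range(w)]
    schedule = []
    while remaining:
        group = []
        for pos in list(remaining):
            # Chebyshev distance keeps positions in a group >= min_dist apart.
            if all(max(abs(pos[0] - q[0]), abs(pos[1] - q[1])) >= min_dist
                   for q in group):
                group.append(pos)
                remaining.remove(pos)
            if len(group) == group_size:
                break
        schedule.append(group)
    return schedule  # each inner list is decoded in one parallel pass

# e.g. a 16x16 token grid: far fewer steps than 256 one-token passes.
steps = locality_aware_schedule(16, 16)
print(len(steps), "parallel decoding steps instead of", 16 * 16)
```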