Min-Hung (Steve) Chen

850 posts

@CMHungSteven

Staff Research Scientist, NVR TW @NVIDIAAI @NVIDIA (Project Lead: DoRA, EoRA, 4D-RGPT) | Ph.D. @GeorgiaTech | Multimodal AI | https://t.co/dKaEzVoTfZ

Taipei City, Taiwan · Joined July 2011

1.6K Following · 2.4K Followers

Pinned Tweet

Min-Hung (Steve) Chen@CMHungSteven·11 May

(1/N) Are you looking for #Vision #Transformer papers in various areas? Check out this list of papers including a broad range of different tasks! github.com/cmhungsteve/Aw… Feel free to share with others😀 @Montreal_AI @machinelearnflx @hardmaru @ak92501 @arankomatsuzaki @omarsar0

170

Min-Hung (Steve) Chen@CMHungSteven·14 Mar

@zixuan_huang Congratulations 🎉 great work!

348

Zixuan Huang@zixuan_huang·14 Mar

Videos are continuous projections of 3D worlds. After training on massive video data, does 3D understanding emerge naturally? Our #CVPR2026 paper finds that frontier video generators acquire surprisingly strong and generalizable 3D understanding, even rivaling specialized 3D experts. Web: vidfm-3d-probe.github.io

211

12K

Min-Hung (Steve) Chen@CMHungSteven·12 Mar

@RHachiuma Congratulations! 🎉🎊🥳

210

Ryo Hachiuma@RHachiuma·12 Mar

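As of March, I have become a Senior Research Scientist. This is not something I achieved on my own; I still rely constantly on the help of our intern students and the colleagues around me, but I will keep working hard, in research and beyond.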

137

9.6K

Min-Hung (Steve) Chen@CMHungSteven·10 Mar

@_akhaliq Really nice work @peiqing001 !

247

AK@_akhaliq·9 Mar

MatAnyone 2 is out on Hugging Face Scaling Video Matting via a Learned Quality Evaluator paper: huggingface.co/papers/2512.11… app: huggingface.co/spaces/Peiqing…

363

34.3K

Min-Hung (Steve) Chen@CMHungSteven·3 Mar

📄 Paper: arXiv: arxiv.org/abs/2511.07299 🙌 Kudos to our amazing @NVIDIAAI @NTHU_TAIWAN team: Ying Cheng, Yu-Ho Lin, @CMHungSteven, @FuEnYang1, Shang-Hong Lai

380

Min-Hung (Steve) Chen@CMHungSteven·3 Mar

Our paper is Oral at @wacv_official THIS WEEK! 🎉🚀🔥 VADER: Towards Causal Video Anomaly Understanding with Relation-Aware Large Language Models Tired of detectors just shouting "🚨anomaly!" with zero insight? 😩 VADER levels up BIG: ✅ Describes exactly what happened ✅ Explains the causal why 🤔 ✅Reasons step-by-step on object dynamics & interactions like a video detective 🕵️✨ Powered by: 🌟CAES — smart keyframe sampling to catch the full causal story 📸 🌟CORE — contrastive encoder for evolving relations, temporal links & volatility ⚡ SOTA on HIVAU-70k & HAWK benchmarks 📈 🌐Project page: vader-vau.github.io See us live at WACV! 🗣️ Oral (Session 8B – Video Rec & Understanding II): Tue Mar 10, 13:30–14:30, AZ Ballroom 7 🖼️ Poster (Session 6): Tue Mar 10, 15:45–17:30, Tucson Ballroom See you in Tucson! 🌵 #ComputerVision #AnomalyDetection #VideoUnderstanding #MultimodalAI #LLM #CausalAI #WACV2026

3.6K

Min-Hung (Steve) Chen retweeted

Zhengzhong Tu@_vztu·25 Feb

𝗔𝗿𝗲 𝗲𝗺𝗯𝗼𝗱𝗶𝗲𝗱 𝗺𝘂𝗹𝘁𝗶-𝗮𝗴𝗲𝗻𝘁 𝘀𝘆𝘀𝘁𝗲𝗺𝘀 𝗿𝗲𝗮𝗱𝘆 𝗳𝗼𝗿 𝘁𝗵𝗲 𝗚𝗲𝗻𝗲𝗿𝗮𝘁𝗶𝘃𝗲-𝗔𝗜 𝗲𝗿𝗮? At CVPR 2026 (Denver), we’re excited to host the 3rd MEIS Workshop: “𝗠𝘂𝗹𝘁𝗶-𝗔𝗴𝗲𝗻𝘁 𝗘𝗺𝗯𝗼𝗱𝗶𝗲𝗱 𝗜𝗻𝘁𝗲𝗹𝗹𝗶𝗴𝗲𝗻𝘁 𝗦𝘆𝘀𝘁𝗲𝗺𝘀 𝗠𝗲𝗲𝘁 𝗚𝗲𝗻𝗲𝗿𝗮𝘁𝗶𝘃𝗲-𝗔𝗜 𝗘𝗿𝗮 — 𝗢𝗽𝗽𝗼𝗿𝘁𝘂𝗻𝗶𝘁𝗶𝗲𝘀, 𝗖𝗵𝗮𝗹𝗹𝗲𝗻𝗴𝗲𝘀, 𝗮𝗻𝗱 𝗙𝘂𝘁𝘂𝗿𝗲𝘀”. We welcome work on foundation models for embodied agents, multi-agent collaboration/decision-making, simulation & benchmarks, and human–agent interaction—with a strong emphasis on robustness, safety, interpretability, and alignment. 🏆 𝗔𝘄𝗮𝗿𝗱𝘀 (𝗖𝗮𝘀𝗵 + 𝗥𝗲𝗰𝗼𝗴𝗻𝗶𝘁𝗶𝗼𝗻) 𝗕𝗲𝘀𝘁 𝗣𝗮𝗽𝗲𝗿 𝗔𝘄𝗮𝗿𝗱: $400 𝗕𝗲𝘀𝘁 𝗣𝗮𝗽𝗲𝗿 𝗥𝘂𝗻𝗻𝗲𝗿-𝗨𝗽: $300 𝗕𝗲𝘀𝘁 𝗗𝗲𝗺𝗼 𝗔𝘄𝗮𝗿𝗱: $300 𝗢𝗿𝗮𝗹 𝗣𝗿𝗲𝘀𝗲𝗻𝘁𝗮𝘁𝗶𝗼𝗻 𝗼𝗽𝗽𝗼𝗿𝘁𝘂𝗻𝗶𝘁𝗶𝗲𝘀 🎙️ 𝗜𝗻𝘃𝗶𝘁𝗲𝗱 𝗦𝗽𝗲𝗮𝗸𝗲𝗿𝘀 (𝗛𝗶𝗴𝗵𝗹𝗶𝗴𝗵𝘁𝘀) Xiaopeng (Shaw) Li (UW–Madison), Siheng Chen (SJTU), Henry Liu (UMich), Bernadette Bucher (UMich), Jiachen Li (UC Riverside), Angela Dai (TU Munich), Bolei Zhou (UCLA), Marco Pavone (Stanford), Kun Zhan (Li Auto), Manabu Tsukada (U of Tokyo). 📅 Important Dates 𝗦𝘂𝗯𝗺𝗶𝘀𝘀𝗶𝗼𝗻 𝗱𝗲𝗮𝗱𝗹𝗶𝗻𝗲: Apr 15, 2026 𝗡𝗼𝘁𝗶𝗳𝗶𝗰𝗮𝘁𝗶𝗼𝗻: May 13, 2026 𝗪𝗼𝗿𝗸𝘀𝗵𝗼𝗽: Jun 3, 2026 (CVPR 2026) 🙏 Sponsored by 𝗔𝘅𝗶𝘀 𝗥𝗼𝗯𝗼𝘁𝗶𝗰𝘀. Organized by Texas A&M University (led by my student Xiangbo Gao), KAIST, HKU, TU Munich, UMich, UW-Madison, UC Riverside, Purdue, JHU, and UCLA. Questions: meis-cvpr-2026@googlegroups.com (or xiangbo@tamu.edu)

8.9K

Min-Hung (Steve) Chen@CMHungSteven·24 Feb

Current Vision-Language Models completely struggle with complex 4D dynamics. We fixed that. 🤯 🚨 Introducing 4D-RGPT: distilling perceptual knowledge directly into LLMs for precise space & time reasoning. 🎉 Excited to share our @NVIDIAAI work has been accepted to #CVPR2026! @CVPR A quick dive into how it works 🧵👇

11.4K

Min-Hung (Steve) Chen@CMHungSteven·27 Feb

@sanskxr02 @NVIDIAAI Thank you!

Sanskar Pandey@sanskxr02·27 Feb

@CMHungSteven @NVIDIAAI Congratulations!

Min-Hung (Steve) Chen@CMHungSteven·27 Feb

@HildeKuehne @BousselhamWalid @PaulGavrikov @akshit_fbd Congratulations 🎉

142

Hilde Kuehne@HildeKuehne·26 Feb

🚀 4 papers accepted to CVPR 2026! Check out: VOLD: arxiv.org/abs/2510.23497 AMoE: arxiv.org/abs/2512.20157 VisualOverload: arxiv.org/abs/2509.25339 TTRV: arxiv.org/abs/2510.06783 Detailed posts follow soon! Big congrats @BousselhamWalid Sofian Caybouti @PaulGavrikov @akshit_fbd

105

Min-Hung (Steve) Chen@CMHungSteven·27 Feb

@shiyi_c98 Great work @shiyi_c98

744

Shiyi Cao@shiyi_c98·26 Feb

Introducing our new work K-Search: LLM Kernel Generation via Co-Evolving Intrinsic World Model — a new paradigm for automated GPU kernel generation, achieving SoTA results. 🔍 Big insight: Traditional methods treat LLMs as stochastic code generators inside heuristic loops — but this misses a key point: LLMs are powerful planners with rich domain priors. 🧠 Core idea: K-Search uses the LLM itself as a co-evolving world model — one that plans + updates beliefs + guides search decisions based on experience. 📌 This decouples high-level strategy (intent) from low-level code implementation, allowing the optimizer to pursue multi-step transformations even when intermediate implementations don’t immediately improve performance. 📈 Key results: 🔥 Our discovered kernels are ~2.10× average speedup vs state-of-the-art evolutionary search across 4 FlashInfer kernels on H100/B200. 🔥 Up to 14.3× gain on complex Mixture-of-Experts (MoE) kernels. 🔥 State-of-the-art performance on GPUMode TriMul (H100) task — beating both automated and human solutions. 🙏 Acknowledgements This work is developed in @BerkeleySky, w/ the amazing @ziming_mao, @profjoeyg, and @istoica05. We thank @DachengLi177, @MayankMish98, @randwalk0, @pgasawa, @fangz_zzu, and @tian_xia_ for helpful discussion and feedback. We also thank the generous compute support from @databricks, @awscloud, @anyscalecompute, @nvidia, @Google, @LambdaAPI, and @MayfieldFund. 👨‍💻 GitHub: github.com/caoshiyi/K-Sea… 📄 arXiv: arxiv.org/pdf/2602.19128…

309

92.9K

Min-Hung (Steve) Chen@CMHungSteven·27 Feb

@Eng_Hemdi Thanks for sharing!!

Min-Hung (Steve) Chen retweeted

Abdullah Hamdi@Eng_Hemdi·27 Feb

4D vision models are the next big thing in computer vision to unlock better world models!

Min-Hung (Steve) Chen@CMHungSteven

3.9K

Min-Hung (Steve) Chen@CMHungSteven·26 Feb

@toniwuest Cool work 😃 congratulations 🎉

Antonia Wüst@toniwuest·26 Feb

Excited to share that our paper "Synthesizing Visual Concepts as Vision-Language Programs" has been accepted to #CVPR2026! 🎉 We propose a novel method that combines VLMs with symbolic program synthesis to learn reliable programs of visual concepts. 🌐 ml-research.github.io/vision-languag…

Antonia Wüst@toniwuest

🚨 New paper alert! We introduce Vision-Language Programs (VLP), a neuro-symbolic framework that combines the perceptual power of VLMs with program synthesis for robust visual reasoning.

Min-Hung (Steve) Chen@CMHungSteven·26 Feb

@bowenwen_me Congrats, @bowenwen_me Really great work 👍

Bowen Wen@bowenwen_me·26 Feb

Our paper has been accepted to CVPR 2026🎊. Code will be released very soon! Stay tuned at github.com/NVlabs/Fast-Fo…

Bowen Wen@bowenwen_me

A new milestone for real-time accurate 3D spatial computing! Introducing ⚡️Fast-FoundationStereo⚡️, a real-time zero-shot stereo depth estimation model that accelerates the original FoundationStereo by >10x with comparable quality. Details in threads 🧵 (1/N)

8.9K

Min-Hung (Steve) Chen@CMHungSteven·26 Feb

@ManlingLi_ @ShijiaYangBron @yunongliu1 @BohanZhai @Chenfeng_X Congrats!

107

Manling Li@ManlingLi_·26 Feb

What is a good caption? We define the quality of a caption by “how well it can support downstream tasks”. Led by @ShijiaYangBron @yunongliu1 @BohanZhai @Chenfeng_X 👇

Shijia Yang@ShijiaYangBron

🎉 CaptionQA is accepted to CVPR 2026! If you care about captioning in real systems, we built CaptionQA to be simple and practical. 📄 Paper: arxiv.org/pdf/2511.21025… 🙌 Welcome to try CaptionQA on your models and share results! #CVPR2026 #MLLM #Benchmark #Captioning

5.5K

Min-Hung (Steve) Chen@CMHungSteven·26 Feb

@SFResearch @CVPR Congratulations 🎉

Salesforce AI Research@SFResearch·26 Feb

Two papers accepted to @CVPR 2026! 🎉 🎖️ Active Video Perception: Iterative Evidence Seeking for Agentic Long Video Understanding 🎖️ Future Optical Flow Prediction Improves Robot Control & Video Generation Learn more 👇 #CVPR2026 #FutureOfAI #EnterpriseAI

1.1K

Min-Hung (Steve) Chen@CMHungSteven·26 Feb

@luke_ch_song Congratulations 🎉 cool work!

116

Chan Hee (Luke) Song@luke_ch_song·26 Feb

🚀 Freshly accepted to CVPR 2026 What if we could train computer-using agents just by watching YouTube? We present Watch & Learn (W&L) -- an inverse-dynamics framework that turns internet videos of humans using computers into learnable UI trajectories at scale. Thread 👇

157

11K

Min-Hung (Steve) Chen@CMHungSteven·26 Feb

@Ayden_Yang_ Congratulations 🎉

xiangpeng yang@Ayden_Yang_·24 Feb

📢 Excited to share that our unified video editing work VideoCoF is accepted to CVPR 2026! 🚀 🎬 TL;DR: We propose a Chain-of-Frames framework for unified video editing with 16× length extrapolation (512+ frames). code: github.com/knightyxp/Vide… #CVPR2026 #ECCV2026 #Seedance

2.3K

Min-Hung (Steve) Chen@CMHungSteven·26 Feb

@ShizunWang Congratulations 🎉

129

Shizun Wang@ShizunWang·25 Feb

🎉 Excited to share that our work PE3R has been accepted to #CVPR2026 ! 🪄 Take 2 - 3 photos with your phone, upload them, wait a few minutes, and then start exploring your 3D world via text! 🤗 Try demo: huggingface.co/spaces/hujiecp… 📖 Read paper: arxiv.org/abs/2503.07507 ✨ Consider giving us an encouraging star: github.com/hujiecpp/PE3R