Min-Hung (Steve) Chen
@CMHungSteven
850 posts
Staff Research Scientist, NVR TW @NVIDIAAI @NVIDIA (Project Lead: DoRA, EoRA, 4D-RGPT) | Ph.D. @GeorgiaTech | Multimodal AI | https://t.co/dKaEzVoTfZ
Taipei City, Taiwan · Joined July 2011
1.6K Following · 2.4K Followers
Zixuan Huang @zixuan_huang
Videos are continuous projections of 3D worlds. After training on massive video data, does 3D understanding emerge naturally? Our #CVPR2026 paper finds that frontier video generators acquire surprisingly strong and generalizable 3D understanding, even rivaling specialized 3D experts. Web: vidfm-3d-probe.github.io
[image]
7 replies · 26 reposts · 211 likes · 12K views
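The tweet doesn't detail the probing setup, but the standard recipe for questions like this is a linear probe on frozen features: keep the video model fixed and train only a small read-out head on a 3D target such as per-patch depth. A minimal sketch under that assumption follows; `backbone`, `DepthProbe`, and the data interface are hypothetical stand-ins, not the paper's code.

```python
# Hypothetical probing sketch: the video model stays frozen and only a
# linear head is trained to read depth out of its features. `backbone`
# stands in for any pretrained video model returning per-patch features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DepthProbe(nn.Module):
    """Linear read-out from frozen features to per-patch depth."""
    def __init__(self, feat_dim: int):
        super().__init__()
        self.head = nn.Linear(feat_dim, 1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, num_patches, feat_dim) -> (batch, num_patches)
        return self.head(feats).squeeze(-1)

def train_probe(backbone, loader, feat_dim, steps=1000, lr=1e-3, device="cuda"):
    """loader yields (video, depth) with depth shaped (batch, num_patches)."""
    probe = DepthProbe(feat_dim).to(device)
    opt = torch.optim.Adam(probe.parameters(), lr=lr)
    backbone.eval()  # frozen: 3D understanding must already be in the features
    for step, (video, depth) in enumerate(loader):
        if step >= steps:
            break
        with torch.no_grad():
            feats = backbone(video.to(device))  # assumed feature interface
        loss = F.l1_loss(probe(feats), depth.to(device))
        opt.zero_grad()
        loss.backward()
        opt.step()
    return probe
```

How well such a probe matches a specialized depth model is then the measure of "emergent" 3D understanding.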
Ryo Hachiuma @RHachiuma
As of March, I have become a Senior Research Scientist. It's less something I accomplished on my own than the result of constant help from intern students and the colleagues around me, but I'll keep working hard, research included.
9 replies · 7 reposts · 137 likes · 9.6K views
Min-Hung (Steve) Chen @CMHungSteven
Our paper is an Oral at @wacv_official THIS WEEK! 🎉🚀🔥
VADER: Towards Causal Video Anomaly Understanding with Relation-Aware Large Language Models
Tired of detectors just shouting "🚨anomaly!" with zero insight? 😩 VADER levels up BIG:
✅ Describes exactly what happened
✅ Explains the causal why 🤔
✅ Reasons step-by-step about object dynamics & interactions, like a video detective 🕵️✨
Powered by:
🌟 CAES — smart keyframe sampling to catch the full causal story 📸
🌟 CORE — a contrastive encoder for evolving relations, temporal links & volatility ⚡
SOTA on the HIVAU-70k & HAWK benchmarks 📈
🌐 Project page: vader-vau.github.io
See us live at WACV!
🗣️ Oral (Session 8B – Video Rec & Understanding II): Tue Mar 10, 13:30–14:30, AZ Ballroom 7
🖼️ Poster (Session 6): Tue Mar 10, 15:45–17:30, Tucson Ballroom
See you in Tucson! 🌵
#ComputerVision #AnomalyDetection #VideoUnderstanding #MultimodalAI #LLM #CausalAI #WACV2026
[3 images]
1 reply · 4 reposts · 57 likes · 3.6K views
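The tweet describes CAES only as "smart keyframe sampling to catch the full causal story". As a rough illustration of the idea, not VADER's actual algorithm, the sketch below scores frames by inter-frame change and blends that with uniform coverage, so frames before an anomaly, where its cause may lie, still get sampled. The function name and scoring rule are assumptions.

```python
# Illustrative keyframe sampler, NOT the actual CAES algorithm: score frames
# by inter-frame pixel change, then blend with uniform coverage so context
# frames before the anomaly remain represented.
import numpy as np

def sample_causal_keyframes(frames: np.ndarray, budget: int = 16) -> list:
    """frames: (T, H, W, C) video array; returns sorted keyframe indices."""
    assert len(frames) >= 2, "need at least two frames to score change"
    diffs = np.abs(np.diff(frames.astype(np.float32), axis=0)).mean(axis=(1, 2, 3))
    scores = np.concatenate([[diffs[0]], diffs])  # one activity score per frame
    probs = scores / (scores.sum() + 1e-8)
    # Blend activity weighting with uniform coverage: pure activity sampling
    # would cluster on the anomaly itself and miss its lead-up (the "cause").
    probs = 0.5 * probs + 0.5 / len(probs)
    probs /= probs.sum()  # guard against floating-point drift
    idx = np.random.choice(len(probs), size=min(budget, len(probs)),
                           replace=False, p=probs)
    return sorted(idx.tolist())
```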
Min-Hung (Steve) Chen reposted
Zhengzhong Tu @_vztu
Are embodied multi-agent systems ready for the Generative-AI era?
At CVPR 2026 (Denver), we're excited to host the 3rd MEIS Workshop: "Multi-Agent Embodied Intelligent Systems Meet Generative-AI Era — Opportunities, Challenges, and Futures". We welcome work on foundation models for embodied agents, multi-agent collaboration/decision-making, simulation & benchmarks, and human–agent interaction—with a strong emphasis on robustness, safety, interpretability, and alignment.
🏆 Awards (Cash + Recognition)
Best Paper Award: $400
Best Paper Runner-Up: $300
Best Demo Award: $300
Oral presentation opportunities
🎙️ Invited Speakers (Highlights)
Xiaopeng (Shaw) Li (UW–Madison), Siheng Chen (SJTU), Henry Liu (UMich), Bernadette Bucher (UMich), Jiachen Li (UC Riverside), Angela Dai (TU Munich), Bolei Zhou (UCLA), Marco Pavone (Stanford), Kun Zhan (Li Auto), Manabu Tsukada (U of Tokyo).
📅 Important Dates
Submission deadline: Apr 15, 2026
Notification: May 13, 2026
Workshop: Jun 3, 2026 (CVPR 2026)
🙏 Sponsored by Axis Robotics. Organized by Texas A&M University (led by my student Xiangbo Gao), KAIST, HKU, TU Munich, UMich, UW–Madison, UC Riverside, Purdue, JHU, and UCLA.
Questions: meis-cvpr-2026@googlegroups.com (or xiangbo@tamu.edu)
[image]
0 replies · 2 reposts · 16 likes · 8.9K views
Min-Hung (Steve) Chen @CMHungSteven
Current Vision-Language Models struggle with complex 4D dynamics. We fixed that. 🤯
🚨 Introducing 4D-RGPT: distilling perceptual knowledge directly into LLMs for precise space & time reasoning.
🎉 Excited to share that our @NVIDIAAI work has been accepted to #CVPR2026! @CVPR
A quick dive into how it works 🧵👇
[GIF]
2 replies · 16 reposts · 81 likes · 11.4K views
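"Distilling perceptual knowledge into LLMs" typically means adding an auxiliary loss that pulls LLM hidden states toward a frozen perception model's features. A hedged sketch of such an objective is below; the linear projection, the cosine form, and `alpha` are illustrative assumptions, not 4D-RGPT's actual recipe.

```python
# Hedged sketch of a perceptual-distillation objective (not 4D-RGPT's recipe):
# an auxiliary term pulls projected LLM hidden states toward features from a
# frozen 4D perception teacher, on top of the usual language-modeling loss.
import torch
import torch.nn.functional as F

def distill_loss(llm_hidden, teacher_feats, lm_loss, proj, alpha=0.5):
    """
    llm_hidden:    (B, T, D_llm) hidden states from chosen LLM layers
    teacher_feats: (B, T, D_t)   frozen 4D perception-teacher features
    proj:          torch.nn.Linear(D_llm, D_t), trained with the LLM
    alpha:         assumed weighting between the two terms
    """
    aligned = proj(llm_hidden)
    # Negative cosine similarity: minimized when LLM states align with teacher.
    perceptual = 1.0 - F.cosine_similarity(aligned, teacher_feats, dim=-1).mean()
    return lm_loss + alpha * perceptual
```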
Shiyi Cao @shiyi_c98
Introducing our new work K-Search: LLM Kernel Generation via Co-Evolving Intrinsic World Model — a new paradigm for automated GPU kernel generation, achieving SoTA results.
🔍 Big insight: traditional methods treat LLMs as stochastic code generators inside heuristic loops, but this misses a key point: LLMs are powerful planners with rich domain priors.
🧠 Core idea: K-Search uses the LLM itself as a co-evolving world model — one that plans, updates beliefs, and guides search decisions based on experience.
📌 This decouples high-level strategy (intent) from low-level code implementation, allowing the optimizer to pursue multi-step transformations even when intermediate implementations don't immediately improve performance.
📈 Key results:
🔥 Our discovered kernels achieve a ~2.10× average speedup over state-of-the-art evolutionary search across 4 FlashInfer kernels on H100/B200.
🔥 Up to 14.3× gain on complex Mixture-of-Experts (MoE) kernels.
🔥 State-of-the-art performance on the GPUMode TriMul (H100) task — beating both automated and human solutions.
🙏 Acknowledgements: this work was developed in @BerkeleySky, w/ the amazing @ziming_mao, @profjoeyg, and @istoica05. We thank @DachengLi177, @MayankMish98, @randwalk0, @pgasawa, @fangz_zzu, and @tian_xia_ for helpful discussion and feedback. We also thank @databricks, @awscloud, @anyscalecompute, @nvidia, @Google, @LambdaAPI, and @MayfieldFund for generous compute support.
👨‍💻 GitHub: github.com/caoshiyi/K-Sea…
📄 arXiv: arxiv.org/pdf/2602.19128…
[2 images]
12 replies · 65 reposts · 309 likes · 92.9K views
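The loop the tweet describes: the LLM proposes a multi-step plan (intent), each step is implemented as kernel code, measured results update the LLM's beliefs, and the next plan is conditioned on those beliefs. A schematic sketch of that loop follows; `llm` and `compile_and_bench` are hypothetical callables, and none of this is the authors' implementation.

```python
# Schematic of the described search loop, with hypothetical callables:
#   llm(prompt: str) -> str            any LLM API
#   compile_and_bench(code) -> float   kernel runtime in ms (lower is better)
# This is one reading of the tweet, not the authors' code.

def k_search(task_desc, llm, compile_and_bench, rounds=10):
    beliefs = "No observations yet."
    best_code, best_time = None, float("inf")
    for _ in range(rounds):
        # 1) High-level intent: a multi-step optimization plan, not code.
        plan = llm(f"Task: {task_desc}\nBeliefs: {beliefs}\n"
                   "Propose a multi-step kernel optimization plan, "
                   "one step per line.")
        code, runtime = None, float("inf")
        for step in plan.splitlines():
            # 2) Low-level implementation of each planned transformation.
            code = llm(f"Apply this step to the kernel: {step}\n"
                       f"Current kernel:\n{code or '(start from reference)'}")
            runtime = compile_and_bench(code)
            # Intermediate steps may regress; the plan is judged only at the
            # end, which lets multi-step transformations pay off later.
        if runtime < best_time:
            best_code, best_time = code, runtime
        # 3) Belief update: the LLM revises its world model from evidence.
        beliefs = llm(f"Beliefs: {beliefs}\nPlan tried:\n{plan}\n"
                      f"Final runtime: {runtime:.3f} ms. Update the beliefs.")
    return best_code, best_time
```

The decoupling shows up in step 2 being allowed to regress: a purely greedy loop would abandon a plan at the first slowdown.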
Min-Hung (Steve) Chen reposted
Antonia Wüst @toniwuest
Excited to share that our paper "Synthesizing Visual Concepts as Vision-Language Programs" has been accepted to #CVPR2026! 🎉 We propose a novel method that combines VLMs with symbolic program synthesis to learn reliable programs of visual concepts. 🌐 ml-research.github.io/vision-languag…
Quoted tweet from Antonia Wüst @toniwuest:

🚨 New paper alert! We introduce Vision-Language Programs (VLP), a neuro-symbolic framework that combines the perceptual power of VLMs with program synthesis for robust visual reasoning.

2 replies · 7 reposts · 59 likes · 7K views
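The vision-language-program idea, as stated: the VLM supplies perception, and a synthesized symbolic program supplies the reliable logical structure. A toy sketch under that reading follows; `vlm_yes_no` and the example concept are invented for illustration, not the paper's synthesis output.

```python
# Toy vision-language program, invented for illustration: the VLM answers
# perceptual yes/no questions, and a symbolic program composes them.
from typing import Callable, Iterable

Predicate = Callable[[object], bool]

def make_predicate(vlm_yes_no: Callable, question: str) -> Predicate:
    """Wrap a VLM query such as 'Is this object a cube?' as a predicate."""
    return lambda obj: vlm_yes_no(image=obj, question=question)

def concept_program(objects: Iterable, is_red: Predicate,
                    is_cube: Predicate) -> bool:
    # Example of a synthesized program: "at least two red cubes are present".
    # The symbolic part (counting, logic) is exact; only perception is neural.
    return sum(1 for o in objects if is_red(o) and is_cube(o)) >= 2
```

The appeal of the split is that the program is inspectable and exact, while the VLM handles only the low-level perceptual calls.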
Manling Li @ManlingLi_
What is a good caption? We define the quality of a caption by how well it can support downstream tasks. Led by @ShijiaYangBron @yunongliu1 @BohanZhai @Chenfeng_X 👇
Quoted tweet from Shijia Yang @ShijiaYangBron:

🎉 CaptionQA is accepted to CVPR 2026! If you care about captioning in real systems, we built CaptionQA to be simple and practical. 📄 Paper: arxiv.org/pdf/2511.21025… 🙌 Welcome to try CaptionQA on your models and share results! #CVPR2026 #MLLM #Benchmark #Captioning

1 reply · 6 reposts · 37 likes · 5.5K views
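Defining caption quality by downstream support suggests a simple harness: generate a caption, answer benchmark questions from the caption alone, and report accuracy. The sketch below is a guess at that shape, not the CaptionQA code; `caption_model`, `qa_llm`, and the item format are assumptions.

```python
# Guessed shape of a utility-based caption metric, not the CaptionQA harness:
# answer each QA item from the caption alone and report accuracy.

def caption_utility(image, qa_items, caption_model, qa_llm) -> float:
    """qa_items: list of {'question': str, 'answer': str} dicts (assumed)."""
    caption = caption_model(image)  # hypothetical captioner interface
    correct = 0
    for item in qa_items:
        pred = qa_llm(f"Caption: {caption}\n"
                      f"Question: {item['question']}\n"
                      "Answer using only the caption.")
        # Exact-match scoring is a simplification; real benchmarks are fuzzier.
        correct += pred.strip().lower() == item["answer"].strip().lower()
    return correct / max(len(qa_items), 1)
```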
Salesforce AI Research @SFResearch
Two papers accepted to @CVPR 2026! 🎉
🎖️ Active Video Perception: Iterative Evidence Seeking for Agentic Long Video Understanding
🎖️ Future Optical Flow Prediction Improves Robot Control & Video Generation
Learn more 👇 #CVPR2026 #FutureOfAI #EnterpriseAI
[image]
2 replies · 1 repost · 17 likes · 1.1K views
Chan Hee (Luke) Song @luke_ch_song
🚀 Freshly accepted to CVPR 2026
What if we could train computer-using agents just by watching YouTube?
We present Watch & Learn (W&L) -- an inverse-dynamics framework that turns internet videos of humans using computers into learnable UI trajectories at scale.
Thread 👇
[image]
4 replies · 24 reposts · 157 likes · 11K views
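An inverse-dynamics model in this setting takes two consecutive screen frames and predicts the UI action between them, which is what lets unlabeled videos be converted into (state, action) trajectories. A minimal sketch is below; the tiny CNN and the discrete action space are illustrative assumptions, not the W&L architecture.

```python
# Illustrative inverse-dynamics model (not the W&L architecture): given two
# consecutive screen frames, predict the discrete UI action between them.
import torch
import torch.nn as nn

class InverseDynamics(nn.Module):
    def __init__(self, feat_dim: int = 256, num_actions: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(  # tiny CNN stand-in for a real encoder
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        self.head = nn.Linear(2 * feat_dim, num_actions)

    def forward(self, frame_t, frame_t1):
        # Both frames: (B, 3, H, W); output: (B, num_actions) action logits.
        z = torch.cat([self.encoder(frame_t), self.encoder(frame_t1)], dim=-1)
        return self.head(z)

def label_video(frames: torch.Tensor, model: InverseDynamics) -> list:
    """frames: (T, 3, H, W). Returns T-1 pseudo-labeled action ids, turning
    a raw video into a (state, action) trajectory for agent training."""
    model.eval()
    with torch.no_grad():
        return [model(frames[i:i + 1], frames[i + 1:i + 2]).argmax(-1).item()
                for i in range(len(frames) - 1)]
```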