Mingyu Ding

22 posts

Mingyu Ding

@dingmyu

Assistant Professor @UNC @unccs | IDEAL@UNC | Dexterous/Loco-Manipulation | #robotics, #embodiedAI, #3Dvision, #foundationmodels.

Joined August 2024
61 Following · 343 Followers
Mingyu Ding retweeted
Huaxiu Yao @HuaxiuYaoML
AutoResearchClaw v0.4.0 is here: almost 10K stars in just over 2 weeks! Now supporting both fully autonomous AND human-AI co-pilot modes; you choose your level of involvement.
What's new:
- 6 intervention modes: full-auto, gate-only, checkpoint, step-by-step, co-pilot, and custom. Same powerful 23-stage pipeline, your level of control.
- Idea Workshop: brainstorm and refine hypotheses with AI before committing to a direction
- Baseline Navigator: review and customize experiment designs before execution
- Paper Co-Writer: draft papers section by section, collaboratively
- SmartPause: the system learns when to pause and ask for your input based on confidence levels
- Cost Guardrails: budget alerts at 50/80/100% so you never get surprised
- Pipeline Branching: explore multiple hypotheses in parallel, compare, and merge the best
Want full automation? It still does that. Want to stay in the loop? Now you can, at exactly the granularity you want.
Try it: github.com/aiming-lab/Aut…
Kudos to the team @JiaqiLiu835914, @richardxp888, @lillianwei423, @StephenQS0710, @Xinyu2ML, @HaoqinT, @jiahengzhang96, @yuyinzhou_cs, @ZhengBerkeley, @cihangxie, @dingmyu, etc.
Huaxiu Yao tweet media
2 replies · 21 reposts · 74 likes · 8.7K views
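The post doesn't show how features like the cost guardrails are wired up; as a minimal sketch of the general idea only (the thresholds match the announcement, but the function and variable names are assumptions, not AutoResearchClaw's actual API):

```python
# Hypothetical sketch of "budget alerts at 50/80/100%"; not AutoResearchClaw's real interface.
BUDGET_USD = 100.0
ALERT_THRESHOLDS = (0.5, 0.8, 1.0)   # 50%, 80%, and 100% of the budget
_already_alerted = set()

def check_budget(spent_usd):
    """Return any newly crossed budget alerts for the current cumulative spend."""
    alerts = []
    for frac in ALERT_THRESHOLDS:
        if spent_usd >= frac * BUDGET_USD and frac not in _already_alerted:
            _already_alerted.add(frac)
            alerts.append(f"Spent ${spent_usd:.2f}: crossed {int(frac * 100)}% of the budget")
    return alerts

print(check_budget(55.0))   # crosses the 50% threshold
print(check_budget(85.0))   # crosses the 80% threshold (50% was already reported)
```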
Mingyu Ding retweeted
Yu Fang @yuffishh
Do Vision-Language-Action Models truly follow your language instructions? We present When Vision Overrides Language: Evaluating and Mitigating Counterfactual Failures in VLAs. VLAs promise to ground language instructions in robot control, yet in practice they often fail to follow language faithfully.
Paper: arxiv.org/abs/2602.17659
Project: vla-va.github.io
Highlights
- Vision shortcuts and counterfactual failures. When given instructions that lack strong scene-specific supervision, VLAs default to well-learned scene-specific behaviors regardless of language intent.
- Counterfactual benchmark. We introduce LIBERO-CF, the first counterfactual benchmark for evaluating language following in VLAs. Our evaluation reveals that counterfactual failures are prevalent yet underexplored across state-of-the-art VLAs.
- Our solution. We propose Counterfactual Action Guidance (CAG), a simple plug-and-play dual-branch inference scheme that strengthens language conditioning without changing pretrained VLA architectures or weights.
- Experiments. CAG is effective across multiple dimensions of language grounding, consistently improving both language grounding and task success on under-observed tasks.
#VLA #Robotics #Vision #Language
1 reply · 26 reposts · 142 likes · 11.1K views
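The thread calls CAG a plug-and-play dual-branch inference scheme but doesn't spell out how the two branches are combined; a minimal sketch of one plausible reading (the guidance rule, the empty-instruction counterfactual branch, and all names below are assumptions, not the paper's method):

```python
import numpy as np

def counterfactual_action_guidance(policy, obs, instruction, scale=2.0):
    """Hypothetical dual-branch inference sketch; the actual CAG rule may differ.
    Run the frozen VLA twice and push the action toward the language-conditioned branch."""
    a_lang = np.asarray(policy(obs, instruction))   # branch conditioned on the real instruction
    a_null = np.asarray(policy(obs, ""))            # counterfactual branch with language removed
    # Classifier-free-guidance-style extrapolation that amplifies the language effect.
    return a_null + scale * (a_lang - a_null)

# Toy stand-in policy: ignores obs and depends only weakly on the language input.
toy_policy = lambda obs, text: np.array([0.1, 0.2, 0.0]) if text else np.array([0.3, 0.2, 0.0])
print(counterfactual_action_guidance(toy_policy, obs=None, instruction="pick up the red block"))
```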
Mingyu Ding retweeted
机器之心 JIQIZHIXIN
Can we build a universal brain for all dexterous robot hands? Zhenyu Wei, Yunchao Yao, and Mingyu Ding from the University of North Carolina at Chapel Hill just tackled this! By creating a "canonical representation," they translate all kinds of dexterous robot hands into a single, unified description and control language. This allows a single AI policy to understand and control them all. The result: policies that instantly generalize to any new robot hand design, achieving an 81.9% zero-shot success rate on unseen hands and opening the door to universal dexterous manipulation.
One Hand to Rule Them All: Canonical Representations for Unified Dexterous Manipulation
Project: zhenyuwei2003.github.io/OHRA/
Paper: arxiv.org/abs/2602.16712
Code: github.com/zhenyuwei2003/…
Our report: mp.weixin.qq.com/s/cp15BVTkxkZM…
#PapersAccepted by Jiqizhixin
0 replies · 3 reposts · 9 likes · 1.4K views
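Neither announcement shows what the canonical representation looks like concretely; below is a deliberately simplified sketch of the underlying idea, mapping differently shaped hands into one fixed-size space (the slot assignment, dimensions, and toy policy are assumptions for illustration, not details from the OHRA paper):

```python
import numpy as np

CANONICAL_DIM = 32   # hypothetical size of the shared canonical hand space

def to_canonical(joint_angles, slot_map):
    """Toy stand-in for a canonical hand representation: scatter each hand's joints into
    fixed slots so hands with different joint counts become vectors of the same size."""
    canon = np.zeros(CANONICAL_DIM)
    for joint_idx, slot in slot_map.items():
        canon[slot] = joint_angles[joint_idx]
    return canon

shared_policy = lambda canon_state: -0.1 * canon_state   # toy controller over the canonical space

hand_a = to_canonical(np.random.rand(16), {i: i for i in range(16)})       # 16-joint hand
hand_b = to_canonical(np.random.rand(22), {i: i + 8 for i in range(22)})   # 22-joint hand
print(shared_policy(hand_a).shape, shared_policy(hand_b).shape)            # same action shape
```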
Mingyu Ding @dingmyu
Found an impersonation account @mingyding pretending to be me. I only have one account. Please do not interact with the fake account and help report it if possible, thanks!
3 replies · 2 reposts · 15 likes · 1.9K views
Mingyu Ding @dingmyu
@cakeyan9 @mingyding Good news: they even credited me with other professors' papers, hope that's true
0 replies · 0 reposts · 0 likes · 14 views
Hongyu Li @Hongyu_Lii
@dingmyu LOL, the fake one is even "verified". X is so broke
1 reply · 0 reposts · 0 likes · 164 views
Mingyu Ding @dingmyu
@h_ravichandar Thanks Harish! Glad you like it. We're excited about mapping different embodiments through latents and the many potential applications
0 replies · 0 reposts · 1 like · 82 views
Harish Ravichandar @h_ravichandar
@dingmyu This is really cool! I love the simplicity of this representation, and the associated latent spectrum across morphologies seems fascinating!
1 reply · 0 reposts · 0 likes · 145 views
Mingyu Ding @dingmyu
Introducing OHRA (One Hand to Rule Them All): a canonical representation that unifies diverse dexterous robot hands into a shared space, enabling cross-hand policy transfer and up to 81.9% zero-shot generalization to unseen morphologies.
Project: zhenyuwei2003.github.io/OHRA
arXiv: 2602.16712
Mingyu Ding tweet media
4 replies · 16 reposts · 86 likes · 7.3K views
Mingyu Ding retweeted
Shoubin Yu @shoubin621
Excited to share AVIC: an analysis and framework for adaptive test-time scaling with world model imagination in visual spatial reasoning.
- Always-on visual imagination is often unnecessary, or even misleading.
- AVIC treats visual imagination as a selective, query-dependent test-time resource, showing that better spatial reasoning comes from deciding when and how much to imagine, not from imagining more.
- Across spatial reasoning and embodied navigation, we get stronger accuracy with far fewer world-model calls and tokens.
[1/6]
3 replies · 38 reposts · 88 likes · 15.8K views
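The thread's core claim is that imagination should be a selective, query-dependent test-time resource rather than always-on; a rough sketch of what such a gate could look like (the confidence-threshold rule and every interface below are assumptions, not AVIC's actual mechanism):

```python
def answer_spatial_query(vlm, world_model, question, image,
                         confidence_threshold=0.7, max_rollouts=4):
    """Hypothetical adaptive-imagination loop: call the world model only while the
    current answer is uncertain, and stop as soon as confidence is high enough."""
    views = [image]
    answer, confidence = vlm(question, views)         # cheap first pass, no imagination
    rollouts = 0
    while confidence < confidence_threshold and rollouts < max_rollouts:
        views.append(world_model(views[-1]))          # imagine one more view, on demand
        answer, confidence = vlm(question, views)     # re-answer with the extra context
        rollouts += 1
    return answer, rollouts                           # easy queries should use zero rollouts

# Toy stand-ins so the sketch runs end to end.
fake_vlm = lambda q, views: ("left of the sofa", 0.5 + 0.2 * len(views))
fake_world_model = lambda view: view
print(answer_spatial_query(fake_vlm, fake_world_model, "Where is the lamp?", image="frame0"))
```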
Mingyu Ding retweeted
Mingyu Ding @dingmyu
@YuXiang_IRVL Thanks Yu for the insightful talk! Really enjoyed it. Just saw this post haha, looking forward to more collaborations
0 replies · 0 reposts · 1 like · 10 views
Yu Xiang @YuXiang_IRVL
I gave a guest lecture in @dingmyu's robot learning class at UNC-Chapel Hill today. Thanks for inviting me! Every time I give a talk, I feel I need to improve my presentation 😅
2 replies · 0 reposts · 13 likes · 904 views
Mingyu Ding retweeted
Yu Fang @yuffishh
Robotic VLA Benefits from Joint Learning with Motion Image Diffusion
We introduce joint learning with motion image diffusion that enhances VLA models with motion reasoning capabilities.
Paper: arxiv.org/abs/2512.18007
Project: vla-motion.github.io
Key Highlights
- Our method seamlessly augments VLA models with motion reasoning capabilities while preserving their real-time inference efficiency.
- We present motion image diffusion using a DiT, providing dense pixel-level dynamic supervision that complements sparse action supervision. We show that optical-flow-based motion images are the most effective representation for joint action-motion learning.
- We enhance π-series VLA models to achieve 97.5% average success on LIBERO and 58.0% on RoboTwin.
#VLA #Robotics #Motion
6 replies · 50 reposts · 458 likes · 25.5K views
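As a hedged illustration of what joint action-motion learning with dense motion supervision could look like in code (the loss weighting, the linear noising schedule, and all interfaces below are assumptions for the sketch, not the paper's recipe):

```python
import torch
import torch.nn.functional as F

def joint_action_motion_loss(vla, motion_dit, batch, motion_weight=0.1):
    """Hypothetical joint objective: sparse action supervision plus dense pixel-level
    supervision from denoising an optical-flow 'motion image' with a DiT."""
    # Branch 1: standard action-prediction loss on the VLA head.
    pred_actions, features = vla(batch["images"], batch["instruction"])
    action_loss = F.mse_loss(pred_actions, batch["actions"])

    # Branch 2: denoise the motion image conditioned on shared VLA features.
    flow = batch["motion_image"]                          # optical-flow image for the clip
    noise = torch.randn_like(flow)
    t = torch.rand(flow.shape[0], device=flow.device).view(-1, 1, 1, 1)
    noisy_flow = (1 - t) * flow + t * noise               # simple linear noising schedule
    motion_loss = F.mse_loss(motion_dit(noisy_flow, t, features), noise)

    return action_loss + motion_weight * motion_loss
```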
Mingyu Ding retweeted
Huaxiu Yao @HuaxiuYaoML
Can agent memory scale without losing reasoning? We're excited to share our latest work, SimpleMem, a principled memory framework for LLM agents built around semantic lossless compression.
- 30× fewer inference tokens
- +26.4% avg F1 (vs Mem0)
- 50.2% faster retrieval (vs Mem0)
Instead of storing raw interaction history or relying on costly iterative reasoning loops, SimpleMem treats memory as a structured, evolving representation whose primary objective is maximizing information density per token.
Paper: arxiv.org/abs/2601.02553
Code: github.com/aiming-lab/Sim…
Website: aiming-lab.github.io/SimpleMem-Page/
Nice work @JiaqiLiu835914, Yaofeng Su, @richardxp888, @lillianwei423, and great collab. w/ @cihangxie, Zeyu Zheng, @dingmyu
53 replies · 138 reposts · 962 likes · 119.2K views
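The post describes memory as a structured, evolving representation optimized for information density per token but gives no implementation details; a toy sketch of that idea (the data structure, scoring, and retrieval below are invented for illustration and are not SimpleMem's design):

```python
from dataclasses import dataclass, field

@dataclass
class DenseMemory:
    """Keep one short canonical statement per fact instead of the raw interaction history."""
    facts: dict = field(default_factory=dict)   # key -> shortest statement seen so far

    def write(self, key: str, statement: str):
        old = self.facts.get(key)
        if old is None or len(statement.split()) < len(old.split()):
            self.facts[key] = statement          # denser phrasing of the same fact wins

    def read(self, query: str, k: int = 3):
        # Crude lexical-overlap retrieval stands in for a real retriever.
        overlap = lambda s: len(set(s.lower().split()) & set(query.lower().split()))
        return sorted(self.facts.values(), key=overlap, reverse=True)[:k]

mem = DenseMemory()
mem.write("deadline", "The user mentioned that their NeurIPS rebuttal is due on Friday.")
mem.write("deadline", "NeurIPS rebuttal due Friday.")    # same information, fewer tokens
print(mem.read("When is the rebuttal due?"))
```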
Yinghao Xu @YinghaoXu1
Life update: I left Stanford in May 2025 to join a robotics startup in China, where I've been working on Embodied AI foundation models. I am thrilled to announce that I'll be joining the CSE Department at HKUST (@hkust) as an Assistant Professor in April 2026. I am actively looking for students interested in Generative AI, 3D Vision, and Robot Learning. I'm deeply grateful to everyone who supported me during this journey, especially my advisors @GordonWetzstein and @zhoubolei, as well as @Jimantha, @haosu_twitr, and Christian Theobalt (@VcaiMpi) for their recommendations. Special thanks to my close friends Yujun Shen, Ceyuan Yang @CeyuanY, Sida Peng @pengsida, Zifan Shi @Vivianszf1, and Mingyu Ding @dingmyu for their constant support! Looking forward to this new chapter and building something great at HKUST!
33 replies · 27 reposts · 652 likes · 48.2K views
Mingyu Ding retweeted
Xin Eric Wang @xwang_lk
There is still a big gap between multimodal foundation models (MFMs) and spatial intelligence: Situated Awareness. New work from UCSB/Yale/Stanford/UMD/Amazon/UCM introduces SAW-Bench, a benchmark for observer-centric spatial reasoning from self-recorded, egocentric video only (no bird's-eye view, no 3D reconstruction). We evaluate 24 SOTA MFMs on six spatial reasoning tasks: spatial memory, affordance, self-localization, relative direction, route shape, and reverse route plan.
Best model: 53.9%
Humans: 91.6% (a 37.7% gap)
Models systematically:
- treat head rotation as translation (camera rotation ≠ movement)
- accumulate errors as trajectories get more complex (multi-turn collapse)
- fail to maintain a stable observer-centric world state
As MFMs move into embodied agents, situated awareness is essential for reliable real-world interaction. We're releasing SAW-Bench to spur progress on observer-centric spatial reasoning.
Chuhan Li @_Chuhan_Li

Human perception is inherently situated: we understand the world relative to our own body, viewpoint, and motion. To deploy multimodal foundation models in embodied settings, we ask: "Can these models reason in the same observer-centric way?"
We study this through SAW-Bench, a novel benchmark for observer-centric situated awareness:
- 786 real-world egocentric videos
- 2,071 human-annotated QA pairs
Across all tasks, we evaluate 24 state-of-the-art MFMs:
- Best model: 53.9%
- Humans: 91.6%
Models systematically:
- confuse head rotation with physical movement
- collapse under multi-turn trajectories
- fail to maintain persistent world-state memory
We see that maintaining a stable observer-centric representation remains challenging. As MFMs are increasingly integrated into embodied agents, situated awareness becomes essential for reliable real-world interaction. We release SAW-Bench and encourage further research toward improving observer-centric reasoning in multimodal foundation models.

2 replies · 9 reposts · 43 likes · 8.4K views
Mingyu Ding retweeted
UNC Computer Science @unccs
UNC CS has added 15 tenure-track and 6 teaching faculty members over the past 4 academic years and has partnered with @UNCSDSS on additional hires! The new additions strengthen our pillar research areas and create collaboration opportunities across the department and campus.
UNC Computer Science tweet media
0 replies · 5 reposts · 10 likes · 1.9K views