Mingyu Ding

22 posts

@dingmyu

Assistant Professor @UNC @unccs | IDEAL@UNC | Dexterous/Loco-Manipulation | #robotics, #embodiedAI, #3Dvision, #foundationmodels.

Joined August 2024
61 Following · 343 Followers
Mingyu Ding reposted
Huaxiu Yao
Huaxiu Yao@HuaxiuYaoML·
🚀 AutoResearchClaw v0.4.0 is here — almost 10K⭐ in just over 2 weeks! Now supporting both fully autonomous AND human-AI co-pilot modes — you choose your level of involvement.

What's new:
🤝 6 intervention modes — full-auto, gate-only, checkpoint, step-by-step, co-pilot, and custom. Same powerful 23-stage pipeline, your level of control.
🧪 Idea Workshop — brainstorm and refine hypotheses with AI before committing to a direction
📊 Baseline Navigator — review and customize experiment designs before execution
✍️ Paper Co-Writer — draft papers section-by-section, collaboratively
🧠 SmartPause — the system learns when to pause and ask for your input based on confidence levels
💰 Cost Guardrails — budget alerts at 50/80/100% so you never get surprised
🔀 Pipeline Branching — explore multiple hypotheses in parallel, compare, and merge the best

Want full automation? It still does that. Want to stay in the loop? Now you can, at exactly the granularity you want.

Try it 👉: github.com/aiming-lab/Aut…

Kudos to the team @JiaqiLiu835914, @richardxp888, @lillianwei423, @StephenQS0710, @Xinyu2ML, @HaoqinT, @jiahengzhang96, @yuyinzhou_cs, @ZhengBerkeley, @cihangxie, @dingmyu, etc.
Huaxiu Yao tweet media
2 replies · 21 reposts · 74 likes · 8.7K views
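The Cost Guardrails feature above (budget alerts at 50/80/100%) boils down to a threshold check. A minimal sketch of that idea — the function name and signature are invented for illustration, not AutoResearchClaw's actual API:

```python
def crossed_alerts(spent: float, budget: float,
                   thresholds=(0.5, 0.8, 1.0)) -> list[int]:
    """Return the alert levels (as percentages) that the current spend
    has already crossed, in ascending order."""
    frac = spent / budget
    return [int(t * 100) for t in thresholds if frac >= t]

# $82 spent against a $100 budget has crossed the 50% and 80% alerts
print(crossed_alerts(82, 100))  # → [50, 80]
```

A real pipeline would track which alerts have already fired so each one triggers only once as spend grows.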
Mingyu Ding reposted
Yu Fang
Yu Fang@yuffishh·
Do Vision-Language-Action Models truly follow your language instructions? We present When Vision Overrides Language: Evaluating and Mitigating Counterfactual Failures in VLAs. They promise to ground language instructions in robot control, yet in practice often fail to follow language faithfully.

📄 Paper: arxiv.org/abs/2602.17659
🌐 Project: vla-va.github.io

💡 Highlights
Vision shortcuts and counterfactual failures. When given instructions that lack strong scene-specific supervision, VLAs default to well-learned scene-specific behaviors regardless of language intent.
Counterfactual benchmark. We introduce LIBERO-CF, the first counterfactual benchmark for evaluating language following in VLAs. Our evaluation reveals that counterfactual failures are prevalent yet underexplored across state-of-the-art VLAs.
Our solution. We propose Counterfactual Action Guidance (CAG), a simple plug-and-play dual-branch inference scheme that strengthens language conditioning without changing pretrained VLA architectures or weights.
Experiments. CAG is effective across multiple dimensions of language grounding, consistently improving both language following and task success on under-observed tasks.

#VLA #Robotics #Vision #Language
1 reply · 26 reposts · 142 likes · 11.1K views
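The tweet doesn't spell out how the dual-branch scheme combines its branches, but a common pattern for strengthening a conditioning signal at inference time is classifier-free-guidance-style extrapolation between a conditioned and a counterfactual prediction. A toy sketch under that assumption — all names are hypothetical, and this is not necessarily the paper's actual formulation:

```python
import numpy as np

def guided_action(a_lang, a_counterfactual, w=1.5):
    """Extrapolate from the counterfactual branch toward the
    language-conditioned branch, amplifying the instruction's
    effect on the predicted action when w > 1."""
    a_l = np.asarray(a_lang, dtype=float)
    a_c = np.asarray(a_counterfactual, dtype=float)
    return a_c + w * (a_l - a_c)

# With w > 1 the output moves past the language branch, away from
# the counterfactual (vision-shortcut) prediction.
```

When the two branches agree, the guidance term vanishes and the action is unchanged, which matches the "plug-and-play, no weight changes" framing.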
Mingyu Ding reposted
机器之心 JIQIZHIXIN
机器之心 JIQIZHIXIN@jiqizhixin·
Can we build a universal brain for all dexterous robot hands? Zhenyu Wei, Yunchao Yao, and Mingyu Ding from the University of North Carolina at Chapel Hill just tackled this!

By creating a "canonical representation," they translate all kinds of dexterous robot hands into a single, unified description and control language. This allows a single AI policy to understand and control them all.

The result: policies that instantly generalize to any new robot hand design, achieving an 81.9% zero-shot success rate on unseen hands and opening the door to universal dexterous manipulation.

One Hand to Rule Them All: Canonical Representations for Unified Dexterous Manipulation
Project: zhenyuwei2003.github.io/OHRA/
Paper: arxiv.org/abs/2602.16712
Code: github.com/zhenyuwei2003/…
Our report: mp.weixin.qq.com/s/cp15BVTkxkZM…

📬 #PapersAccepted by Jiqizhixin
0 replies · 3 reposts · 9 likes · 1.4K views
Mingyu Ding
Mingyu Ding@dingmyu·
Found an impersonation account @mingyding pretending to be me. I only have one account. Please do not interact with the fake account and help report it if possible, thanks!
3 replies · 2 reposts · 15 likes · 1.9K views
Hongyu Li
Hongyu Li@Hongyu_Lii·
@dingmyu LOL, the fake one is even “verified”. X is so broke
1 reply · 0 reposts · 0 likes · 164 views
Mingyu Ding
Mingyu Ding@dingmyu·
@h_ravichandar Thanks Harish! Glad you like it. We’re excited about mapping different embodiments through latents and the many potential applications
0 replies · 0 reposts · 1 like · 82 views
Harish Ravichandar
Harish Ravichandar@h_ravichandar·
@dingmyu This is really cool! I love the simplicity of this representation, and the associated latent spectrum across morphologies seems fascinating!
1 reply · 0 reposts · 0 likes · 145 views
Mingyu Ding
Mingyu Ding@dingmyu·
Introducing OHRA (One Hand to Rule Them All) — a canonical representation that unifies diverse dexterous robot hands into a shared space, enabling cross-hand policy transfer and up to 81.9% zero-shot generalization to unseen morphologies.
🌐 zhenyuwei2003.github.io/OHRA
📄 arXiv: 2602.16712
Mingyu Ding tweet media
4 replies · 16 reposts · 86 likes · 7.3K views
Mingyu Ding reposted
Shoubin Yu
Shoubin Yu@shoubin621·
🚨 Excited to share AVIC — an analysis and framework for adaptive test-time scaling with world-model imagination in visual spatial reasoning.

📉 Always-on visual imagination is often unnecessary, or even misleading.
📈 AVIC treats visual imagination as a selective, query-dependent test-time resource—showing that better spatial reasoning comes from deciding when and how much to imagine, not from imagining more.
➡️ Across spatial reasoning & embodied navigation, we get stronger accuracy with far fewer world-model calls and tokens.

🧵👇 [1/6]
3 replies · 38 reposts · 88 likes · 15.8K views
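"Deciding when and how much to imagine" reads like a confidence-based gate on world-model calls: spend rollouts only where the model is unsure. A toy sketch of such a gate — the rule and names here are invented for illustration, not AVIC's actual criterion:

```python
def imagination_rollouts(answer_confidence: float, max_rollouts: int = 8) -> int:
    """Allocate world-model rollouts inversely to answer confidence:
    zero when the model is already confident, more as confidence drops."""
    if answer_confidence >= 0.9:
        return 0  # answer directly, no imagination needed
    return min(max_rollouts, round((1.0 - answer_confidence) * 10))
```

Under this kind of gate, easy queries cost nothing extra, which is how selective imagination can cut world-model calls and tokens while keeping accuracy.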
Mingyu Ding reposted
Mingyu Ding
Mingyu Ding@dingmyu·
@YuXiang_IRVL Thanks Yu for the insightful talk! Really enjoyed it. Just saw this post, haha. Looking forward to more collaborations!
0 replies · 0 reposts · 1 like · 10 views
Yu Xiang
Yu Xiang@YuXiang_IRVL·
I gave a guest lecture in @dingmyu’s robot learning class at UNC-Chapel Hill today. Thanks for inviting me! Every time I give a talk, I feel I need to improve my presentation😅
2 replies · 0 reposts · 13 likes · 904 views
Mingyu Ding reposted
Yu Fang
Yu Fang@yuffishh·
🤖 Robotic VLA Benefits from Joint Learning with Motion Image Diffusion

We introduce joint learning with motion image diffusion that enhances VLA models with motion reasoning capabilities.

📄 Paper: arxiv.org/abs/2512.18007
🌐 Project: vla-motion.github.io

Key Highlights
🧠 Our method seamlessly augments VLA models with motion reasoning capabilities while preserving their real-time inference efficiency.
🔎 We present motion image diffusion using a DiT, providing dense pixel-level dynamic supervision that complements sparse action supervision. We show that optical-flow-based motion images are the most effective representation for joint action-motion learning.
🎯 We enhance π-series VLA models to achieve 97.5% average success on LIBERO and 58.0% on RoboTwin.

#VLA #Robotics #Motion
6 replies · 50 reposts · 458 likes · 25.5K views
Mingyu Ding reposted
Huaxiu Yao
Huaxiu Yao@HuaxiuYaoML·
🧠 Can agent memory scale without losing reasoning?

🔥 We’re excited to share our latest work, SimpleMem, a principled memory framework for LLM agents built around semantic lossless compression.

📉 30× fewer inference tokens
📈 +26.4% avg F1 (vs Mem0)
⚡ 50.2% faster retrieval (vs Mem0)

Instead of storing raw interaction history 🗂️ or relying on costly iterative reasoning loops 🔁, SimpleMem treats memory as a structured, evolving representation whose primary objective is 🎯 maximizing information density per token.

📄 Paper: arxiv.org/abs/2601.02553
🔗 Code: github.com/aiming-lab/Sim…
📦 Website: aiming-lab.github.io/SimpleMem-Page/

Nice work @JiaqiLiu835914, Yaofeng Su, @richardxp888, @lillianwei423, and great collab. w/ @cihangxie, Zeyu Zheng, @dingmyu
53 replies · 138 reposts · 962 likes · 119.2K views
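"Maximizing information density per token" suggests a selection objective: keep the memory entries that carry the most useful information per token of context. A toy greedy sketch of that objective — the entry format, utility scores, and function name are all invented for illustration, not SimpleMem's actual algorithm:

```python
def select_memories(entries, token_budget):
    """Greedily keep memory entries with the highest utility-per-token
    until the token budget is exhausted.
    Each entry is a dict: {"text", "utility", "tokens"}."""
    ranked = sorted(entries,
                    key=lambda e: e["utility"] / e["tokens"],
                    reverse=True)
    kept, used = [], 0
    for e in ranked:
        if used + e["tokens"] <= token_budget:
            kept.append(e)
            used += e["tokens"]
    return [e["text"] for e in kept]
```

Ranking by utility-per-token rather than raw utility is what makes the context budget go further: a short, dense entry beats a long, marginally more informative one.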
Yinghao Xu
Yinghao Xu@YinghaoXu1·
Life update: I left Stanford in May 2025 to join a robotics startup in China, where I've been working on Embodied AI foundation models. I am thrilled to announce that I’ll be joining the CSE Department at HKUST (@hkust) as an Assistant Professor in April 2026. I am actively looking for students interested in Generative AI, 3D Vision, and Robot Learning.

I’m deeply grateful to everyone who supported me during this journey—especially my advisors @GordonWetzstein and @zhoubolei, as well as @Jimantha, @haosu_twitr, and Christian Theobalt (@VcaiMpi) for their recommendations. Special thanks to my close friends Yujun Shen, Ceyuan Yang @CeyuanY, Sida Peng @pengsida, Zifan Shi @Vivianszf1, and Mingyu Ding @dingmyu for their constant support!

Looking forward to this new chapter and building something great at HKUST!
33 replies · 27 reposts · 652 likes · 48.2K views
Mingyu Ding reposted
Xin Eric Wang
Xin Eric Wang@xwang_lk·
There is still a big gap between multimodal foundation models (MFMs) and spatial intelligence: 𝐒𝐢𝐭𝐮𝐚𝐭𝐞𝐝 𝐀𝐰𝐚𝐫𝐞𝐧𝐞𝐬𝐬.

New work from UCSB/Yale/Stanford/UMD/Amazon/UCM introduces 𝐒𝐀𝐖-𝐁𝐞𝐧𝐜𝐡, a benchmark for observer-centric spatial reasoning from 𝐬𝐞𝐥𝐟-𝐫𝐞𝐜𝐨𝐫𝐝𝐞𝐝, 𝐞𝐠𝐨𝐜𝐞𝐧𝐭𝐫𝐢𝐜 𝐯𝐢𝐝𝐞𝐨 𝐨𝐧𝐥𝐲 (no bird’s-eye view, no 3D reconstruction).

We evaluate 24 SOTA MFMs on six spatial reasoning tasks: 𝒔𝒑𝒂𝒕𝒊𝒂𝒍 𝒎𝒆𝒎𝒐𝒓𝒚, 𝒂𝒇𝒇𝒐𝒓𝒅𝒂𝒏𝒄𝒆, 𝒔𝒆𝒍𝒇-𝒍𝒐𝒄𝒂𝒍𝒊𝒛𝒂𝒕𝒊𝒐𝒏, 𝒓𝒆𝒍𝒂𝒕𝒊𝒗𝒆 𝒅𝒊𝒓𝒆𝒄𝒕𝒊𝒐𝒏, 𝒓𝒐𝒖𝒕𝒆 𝒔𝒉𝒂𝒑𝒆, 𝒓𝒆𝒗𝒆𝒓𝒔𝒆 𝒓𝒐𝒖𝒕𝒆 𝒑𝒍𝒂𝒏.

📉 Best model: 53.9%
🧑 Humans: 91.6% (37.7% gap)

Models systematically:
❌ treat head rotation as translation (camera rotation ≠ movement)
❌ accumulate errors as trajectories get more complex (multi-turn collapse)
❌ fail to maintain a stable observer-centric world state

As MFMs move into embodied agents, situated awareness is essential for reliable real-world interaction. We’re releasing SAW-Bench to spur progress on observer-centric spatial reasoning.
Chuhan Li@_Chuhan_Li

Human perception is inherently situated – we understand the world relative to our own body, viewpoint, and motion. To deploy multimodal foundation models in embodied settings, we ask: “Can these models reason in the same observer-centric way?”

We study this through SAW-Bench, a novel benchmark for observer-centric situated awareness:
- 786 real-world egocentric videos
- 2,071 human-annotated QA pairs

Across all tasks, we evaluate 24 state-of-the-art MFMs:
📉 Best model: 53.9%
🧑 Humans: 91.6%

Models systematically:
❌ Confuse head rotation with physical movement
❌ Collapse under multi-turn trajectories
❌ Fail to maintain persistent world-state memory

👉 We see that maintaining a stable observer-centric representation remains challenging. As MFMs are increasingly integrated into embodied agents, situated awareness becomes essential for reliable real-world interaction. We release SAW-Bench and encourage further research toward improving observer-centric reasoning in multimodal foundation models.

2 replies · 9 reposts · 43 likes · 8.4K views
Mingyu Ding reposted
UNC Computer Science
UNC Computer Science@unccs·
UNC CS has added 15 tenure-track and 6 teaching faculty members over the past 4 academic years and has partnered with @UNCSDSS on additional hires! The new additions strengthen our pillar research areas and create collaboration opportunities across the department and campus.
UNC Computer Science tweet media
0 replies · 5 reposts · 10 likes · 1.9K views