Mingyu Ding

22 posts

@dingmyu

Assistant Professor @UNC @unccs | IDEAL@UNC | Dexterous/Loco-Manipulation | #robotics, #embodiedAI, #3Dvision, #foundationmodels.

Joined August 2024
61 Following · 343 Followers
Mingyu Ding reposted
Huaxiu Yao
Huaxiu Yao@HuaxiuYaoML·
🚀 AutoResearchClaw v0.4.0 is here — almost 10K⭐ in just over 2 weeks! Now supporting both fully autonomous AND human-AI co-pilot modes — you choose your level of involvement.

What's new:
🤝 6 intervention modes — full-auto, gate-only, checkpoint, step-by-step, co-pilot, and custom. Same powerful 23-stage pipeline, your level of control.
🧪 Idea Workshop — brainstorm and refine hypotheses with AI before committing to a direction
📊 Baseline Navigator — review and customize experiment designs before execution
✍️ Paper Co-Writer — draft papers section-by-section, collaboratively
🧠 SmartPause — the system learns when to pause and ask for your input based on confidence levels
💰 Cost Guardrails — budget alerts at 50/80/100% so you never get surprised
🔀 Pipeline Branching — explore multiple hypotheses in parallel, compare, and merge the best

Want full automation? It still does that. Want to stay in the loop? Now you can, at exactly the granularity you want.

Try it 👉: github.com/aiming-lab/Aut…

Kudos to the team @JiaqiLiu835914, @richardxp888, @lillianwei423, @StephenQS0710, @Xinyu2ML, @HaoqinT, @jiahengzhang96, @yuyinzhou_cs, @ZhengBerkeley, @cihangxie, @dingmyu, etc.
Huaxiu Yao tweet media
2 replies · 21 reposts · 74 likes · 8.7K views
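The Cost Guardrails feature above (budget alerts at 50/80/100%) boils down to a threshold check. A minimal sketch of that idea — the function name and signature are invented for illustration, not AutoResearchClaw's actual API:

```python
def crossed_alerts(spent: float, budget: float,
                   thresholds=(0.5, 0.8, 1.0)) -> list[int]:
    """Return the alert levels (as percentages) that the current spend
    has already crossed, in ascending order."""
    frac = spent / budget
    return [int(t * 100) for t in thresholds if frac >= t]

# $82 spent against a $100 budget has crossed the 50% and 80% alerts
print(crossed_alerts(82, 100))  # → [50, 80]
```

A real pipeline would track which alerts have already fired so each one triggers only once as spend grows.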
Mingyu Ding reposted
Yu Fang
Yu Fang@yuffishh·
Do Vision-Language-Action Models truly follow your language instructions? We present When Vision Overrides Language: Evaluating and Mitigating Counterfactual Failures in VLAs. They promise to ground language instructions in robot control, yet in practice often fail to follow language faithfully.

📄 Paper: arxiv.org/abs/2602.17659
🌐 Project: vla-va.github.io

💡 Highlights
Vision shortcuts and counterfactual failures. When given instructions that lack strong scene-specific supervision, VLAs default to well-learned scene-specific behaviors regardless of language intent.
Counterfactual benchmark. We introduce LIBERO-CF, the first counterfactual benchmark for evaluating language following in VLAs. Our evaluation reveals that counterfactual failures are prevalent yet underexplored across state-of-the-art VLAs.
Our solution. We propose Counterfactual Action Guidance (CAG), a simple plug-and-play dual-branch inference scheme that strengthens language conditioning without changing pretrained VLA architectures or weights.
Experiments. CAG is effective across multiple dimensions of language grounding, consistently improving both language following and task success on under-observed tasks.

#VLA #Robotics #Vision #Language
1 reply · 26 reposts · 142 likes · 11.1K views
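The tweet doesn't spell out how the dual-branch scheme combines its branches, but a common pattern for strengthening a conditioning signal at inference time is classifier-free-guidance-style extrapolation between a conditioned and a counterfactual prediction. A toy sketch under that assumption — all names are hypothetical, and this is not necessarily the paper's actual formulation:

```python
import numpy as np

def guided_action(a_lang, a_counterfactual, w=1.5):
    """Extrapolate from the counterfactual branch toward the
    language-conditioned branch, amplifying the instruction's
    effect on the predicted action when w > 1."""
    a_l = np.asarray(a_lang, dtype=float)
    a_c = np.asarray(a_counterfactual, dtype=float)
    return a_c + w * (a_l - a_c)

# With w > 1 the output moves past the language branch, away from
# the counterfactual (vision-shortcut) prediction.
```

When the two branches agree, the guidance term vanishes and the action is unchanged, which matches the "plug-and-play, no weight changes" framing.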
Mingyu Ding reposted
机器之心 JIQIZHIXIN
机器之心 JIQIZHIXIN@jiqizhixin·
Can we build a universal brain for all dexterous robot hands? Zhenyu Wei, Yunchao Yao, and Mingyu Ding from the University of North Carolina at Chapel Hill just tackled this!

By creating a "canonical representation," they translate all kinds of dexterous robot hands into a single, unified description and control language. This allows a single AI policy to understand and control them all.

The result: policies that instantly generalize to any new robot hand design, achieving an 81.9% zero-shot success rate on unseen hands and opening the door to universal dexterous manipulation.

One Hand to Rule Them All: Canonical Representations for Unified Dexterous Manipulation
Project: zhenyuwei2003.github.io/OHRA/
Paper: arxiv.org/abs/2602.16712
Code: github.com/zhenyuwei2003/…
Our report: mp.weixin.qq.com/s/cp15BVTkxkZM…

📬 #PapersAccepted by Jiqizhixin
0 replies · 3 reposts · 9 likes · 1.4K views
Mingyu Ding
Mingyu Ding@dingmyu·
Found an impersonation account @mingyding pretending to be me. I only have one account. Please do not interact with the fake account and help report it if possible, thanks!
3 replies · 2 reposts · 15 likes · 1.9K views
Hongyu Li
Hongyu Li@Hongyu_Lii·
@dingmyu LOL, the fake one is even “verified”. X is so broke
1 reply · 0 reposts · 0 likes · 164 views
Mingyu Ding
Mingyu Ding@dingmyu·
@h_ravichandar Thanks Harish! Glad you like it. We’re excited about mapping different embodiments through latents and the many potential applications
0 replies · 0 reposts · 1 like · 82 views
Harish Ravichandar
Harish Ravichandar@h_ravichandar·
@dingmyu This is really cool! I love the simplicity of this representation, and the associated latent spectrum across morphologies seems fascinating!
1 reply · 0 reposts · 0 likes · 145 views
Mingyu Ding
Mingyu Ding@dingmyu·
Introducing OHRA (One Hand to Rule Them All) — a canonical representation that unifies diverse dexterous robot hands into a shared space, enabling cross-hand policy transfer and up to 81.9% zero-shot generalization to unseen morphologies.
🌐 zhenyuwei2003.github.io/OHRA
📄 arXiv: 2602.16712
Mingyu Ding tweet media
4 replies · 16 reposts · 86 likes · 7.3K views
Mingyu Ding reposted
Shoubin Yu
Shoubin Yu@shoubin621·
🚨 Excited to share AVIC — an analysis and framework for adaptive test-time scaling with world-model imagination in visual spatial reasoning.

📉 Always-on visual imagination is often unnecessary, or even misleading.
📈 AVIC treats visual imagination as a selective, query-dependent test-time resource—showing that better spatial reasoning comes from deciding when and how much to imagine, not from imagining more.
➡️ Across spatial reasoning & embodied navigation, we get stronger accuracy with far fewer world-model calls and tokens.

🧵👇 [1/6]
3 replies · 38 reposts · 88 likes · 15.8K views
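"Deciding when and how much to imagine" reads like a confidence-based gate on world-model calls: spend rollouts only where the model is unsure. A toy sketch of such a gate — the rule and names here are invented for illustration, not AVIC's actual criterion:

```python
def imagination_rollouts(answer_confidence: float, max_rollouts: int = 8) -> int:
    """Allocate world-model rollouts inversely to answer confidence:
    zero when the model is already confident, more as confidence drops."""
    if answer_confidence >= 0.9:
        return 0  # answer directly, no imagination needed
    return min(max_rollouts, round((1.0 - answer_confidence) * 10))
```

Under this kind of gate, easy queries cost nothing extra, which is how selective imagination can cut world-model calls and tokens while keeping accuracy.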
Mingyu Ding reposted
Mingyu Ding
Mingyu Ding@dingmyu·
@YuXiang_IRVL Thanks Yu for the insightful talk! Really enjoyed it. Just saw this post, haha. Looking forward to more collaborations!
0 replies · 0 reposts · 1 like · 10 views
Yu Xiang
Yu Xiang@YuXiang_IRVL·
I gave a guest lecture in @dingmyu’s robot learning class at UNC-Chapel Hill today. Thanks for inviting me! Every time I give a talk, I feel I need to improve my presentation😅
2 replies · 0 reposts · 13 likes · 904 views
Mingyu Ding reposted
Yu Fang
Yu Fang@yuffishh·
🤖 Robotic VLA Benefits from Joint Learning with Motion Image Diffusion

We introduce joint learning with motion image diffusion that enhances VLA models with motion reasoning capabilities.

📄 Paper: arxiv.org/abs/2512.18007
🌐 Project: vla-motion.github.io

Key Highlights
🧠 Our method seamlessly augments VLA models with motion reasoning capabilities while preserving their real-time inference efficiency.
🔎 We present motion image diffusion using a DiT, providing dense pixel-level dynamic supervision that complements sparse action supervision. We show that optical-flow-based motion images are the most effective representation for joint action-motion learning.
🎯 We enhance π-series VLA models to achieve 97.5% average success on LIBERO and 58.0% on RoboTwin.

#VLA #Robotics #Motion
6 replies · 50 reposts · 458 likes · 25.5K views
Mingyu Ding reposted
Huaxiu Yao
Huaxiu Yao@HuaxiuYaoML·
🧠 Can agent memory scale without losing reasoning?

🔥 We’re excited to share our latest work, SimpleMem, a principled memory framework for LLM agents built around semantic lossless compression.

📉 30× fewer inference tokens
📈 +26.4% avg F1 (vs Mem0)
⚡ 50.2% faster retrieval (vs Mem0)

Instead of storing raw interaction history 🗂️ or relying on costly iterative reasoning loops 🔁, SimpleMem treats memory as a structured, evolving representation whose primary objective is 🎯 maximizing information density per token.

📄 Paper: arxiv.org/abs/2601.02553
🔗 Code: github.com/aiming-lab/Sim…
📦 Website: aiming-lab.github.io/SimpleMem-Page/

Nice work @JiaqiLiu835914, Yaofeng Su, @richardxp888, @lillianwei423, and great collab. w/ @cihangxie, Zeyu Zheng, @dingmyu
53 replies · 138 reposts · 962 likes · 119.2K views
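"Maximizing information density per token" suggests a selection objective: keep the memory entries that carry the most useful information per token of context. A toy greedy sketch of that objective — the entry format, utility scores, and function name are all invented for illustration, not SimpleMem's actual algorithm:

```python
def select_memories(entries, token_budget):
    """Greedily keep memory entries with the highest utility-per-token
    until the token budget is exhausted.
    Each entry is a dict: {"text", "utility", "tokens"}."""
    ranked = sorted(entries,
                    key=lambda e: e["utility"] / e["tokens"],
                    reverse=True)
    kept, used = [], 0
    for e in ranked:
        if used + e["tokens"] <= token_budget:
            kept.append(e)
            used += e["tokens"]
    return [e["text"] for e in kept]
```

Ranking by utility-per-token rather than raw utility is what makes the context budget go further: a short, dense entry beats a long, marginally more informative one.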
Yinghao Xu
Yinghao Xu@YinghaoXu1·
Life update: I left Stanford in May 2025 to join a robotics startup in China, where I've been working on Embodied AI foundation models. I am thrilled to announce that I’ll be joining the CSE Department at HKUST (@hkust) as an Assistant Professor in April 2026. I am actively looking for students interested in Generative AI, 3D Vision, and Robot Learning.

I’m deeply grateful to everyone who supported me during this journey—especially my advisors @GordonWetzstein and @zhoubolei, as well as @Jimantha, @haosu_twitr, and Christian Theobalt (@VcaiMpi) for their recommendations. Special thanks to my close friends Yujun Shen, Ceyuan Yang @CeyuanY, Sida Peng @pengsida, Zifan Shi @Vivianszf1, and Mingyu Ding @dingmyu for their constant support!

Looking forward to this new chapter and building something great at HKUST!
33 replies · 27 reposts · 652 likes · 48.2K views
Mingyu Ding reposted
Xin Eric Wang
Xin Eric Wang@xwang_lk·
There is still a big gap between multimodal foundation models (MFMs) and spatial intelligence: 𝐒𝐢𝐭𝐮𝐚𝐭𝐞𝐝 𝐀𝐰𝐚𝐫𝐞𝐧𝐞𝐬𝐬.

New work from UCSB/Yale/Stanford/UMD/Amazon/UCM introduces 𝐒𝐀𝐖-𝐁𝐞𝐧𝐜𝐡, a benchmark for observer-centric spatial reasoning from 𝐬𝐞𝐥𝐟-𝐫𝐞𝐜𝐨𝐫𝐝𝐞𝐝, 𝐞𝐠𝐨𝐜𝐞𝐧𝐭𝐫𝐢𝐜 𝐯𝐢𝐝𝐞𝐨 𝐨𝐧𝐥𝐲 (no bird’s-eye view, no 3D reconstruction).

We evaluate 24 SOTA MFMs on six spatial reasoning tasks: 𝒔𝒑𝒂𝒕𝒊𝒂𝒍 𝒎𝒆𝒎𝒐𝒓𝒚, 𝒂𝒇𝒇𝒐𝒓𝒅𝒂𝒏𝒄𝒆, 𝒔𝒆𝒍𝒇-𝒍𝒐𝒄𝒂𝒍𝒊𝒛𝒂𝒕𝒊𝒐𝒏, 𝒓𝒆𝒍𝒂𝒕𝒊𝒗𝒆 𝒅𝒊𝒓𝒆𝒄𝒕𝒊𝒐𝒏, 𝒓𝒐𝒖𝒕𝒆 𝒔𝒉𝒂𝒑𝒆, 𝒓𝒆𝒗𝒆𝒓𝒔𝒆 𝒓𝒐𝒖𝒕𝒆 𝒑𝒍𝒂𝒏.

📉 Best model: 53.9%
🧑 Humans: 91.6% (37.7% gap)

Models systematically:
❌ treat head rotation as translation (camera rotation ≠ movement)
❌ accumulate errors as trajectories get more complex (multi-turn collapse)
❌ fail to maintain a stable observer-centric world state

As MFMs move into embodied agents, situated awareness is essential for reliable real-world interaction. We’re releasing SAW-Bench to spur progress on observer-centric spatial reasoning.
Chuhan Li@_Chuhan_Li

Human perception is inherently situated – we understand the world relative to our own body, viewpoint, and motion. To deploy multimodal foundation models in embodied settings, we ask: “Can these models reason in the same observer-centric way?”

We study this through SAW-Bench, a novel benchmark for observer-centric situated awareness:
- 786 real-world egocentric videos
- 2,071 human-annotated QA pairs

Across all tasks, we evaluate 24 state-of-the-art MFMs:
📉 Best model: 53.9%
🧑 Humans: 91.6%

Models systematically:
❌ Confuse head rotation with physical movement
❌ Collapse under multi-turn trajectories
❌ Fail to maintain persistent world-state memory

👉 We see that maintaining a stable observer-centric representation remains challenging. As MFMs are increasingly integrated into embodied agents, situated awareness becomes essential for reliable real-world interaction. We release SAW-Bench and encourage further research toward improving observer-centric reasoning in multimodal foundation models.

2 replies · 9 reposts · 43 likes · 8.4K views
Mingyu Ding reposted
UNC Computer Science
UNC Computer Science@unccs·
UNC CS has added 15 tenure-track and 6 teaching faculty members over the past 4 academic years and has partnered with @UNCSDSS on additional hires! The new additions strengthen our pillar research areas and create collaboration opportunities across the department and campus.
UNC Computer Science tweet media
0 replies · 5 reposts · 10 likes · 1.9K views