Yu Fang
@yuffishh

62 posts

CS PhD student @unccs. Embodied AI and Robot Learning. Prev. Research Intern @SFResearch.

Joined September 2022
227 Following · 156 Followers
Yu Fang retweeted
Gedas Bertasius @gberta227
If you're curious about the background that inspires a lot of our group's research on skill learning and video understanding, check out this great piece by UNC Research. It covers some of my journey from being a basketball player to an AI researcher. research.unc.edu/story/reading-…
Yu Fang @yuffishh
🤖 Real-world Experiments
We study different aspects of language grounding. Each scene is designed with three possible tasks: one well-learned in-domain task with sufficient demonstrations, and two under-observed tasks defined by counterfactual instructions. Across object recognition, spatial reasoning, goal targeting, out-of-distribution generalization, and long-horizon reasoning, our proposed CAG consistently improves the performance of π0.5 and reduces counterfactual failures. [5/6]
[image attached]
Yu Fang @yuffishh
Do Vision-Language-Action Models truly follow your language instructions? We present "When Vision Overrides Language: Evaluating and Mitigating Counterfactual Failures in VLAs." VLAs promise to ground language instructions in robot control, yet in practice they often fail to follow language faithfully.
📄 Paper: arxiv.org/abs/2602.17659
🌐 Project: vla-va.github.io
💡 Highlights
Vision shortcuts and counterfactual failures. When given instructions that lack strong scene-specific supervision, VLAs default to well-learned scene-specific behaviors regardless of language intent.
Counterfactual benchmark. We introduce LIBERO-CF, the first counterfactual benchmark for evaluating language following in VLAs. Our evaluation reveals that counterfactual failures are prevalent yet underexplored across state-of-the-art VLAs.
Our solution. We propose Counterfactual Action Guidance (CAG), a simple plug-and-play dual-branch inference scheme that strengthens language conditioning without changing pretrained VLA architectures or weights.
Experiments. CAG is effective across multiple dimensions of language grounding, consistently improving both instruction following and task success on under-observed tasks.
#VLA #Robotics #Vision #Language
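The thread does not give CAG's formula, but "dual-branch inference" suggests a classifier-free-guidance-style combination of a language-conditioned and a language-ablated action prediction. A minimal Python sketch under that assumption; the `policy` callable, the `None` convention for dropping the instruction, and the `guidance_scale` value are all illustrative, not from the paper:

```python
import numpy as np

def guided_action(policy, obs, instruction, guidance_scale=3.0):
    """Dual-branch inference sketch: query the frozen VLA twice and
    extrapolate toward the language-conditioned prediction.

    `policy` is a hypothetical callable mapping (obs, instruction) to
    an action vector; passing None as the instruction stands in for
    the counterfactual, language-ablated branch.
    """
    a_lang = np.asarray(policy(obs, instruction))  # language-conditioned branch
    a_base = np.asarray(policy(obs, None))         # vision-only branch
    # Push the action away from the scene-driven shortcut and toward
    # the language-consistent behavior; no pretrained weights change.
    return a_base + guidance_scale * (a_lang - a_base)

# Toy usage: a stand-in policy that obeys vs. ignores the instruction.
toy = lambda obs, instr: np.ones(7) if instr else np.zeros(7)
print(guided_action(toy, obs=None, instruction="open the drawer"))
```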
Yu Fang retweeted
Kyle Vedder @KyleVedder
my first PI project: we added memory! this is a step-function capabilities unlock: 15-minute-long multi-step tasks in novel environments, controlled by text prompting. having run many of the evals, I legit think this is the GPT-2 moment for robotics
Physical Intelligence @physical_int

We’ve developed a memory system for our models that provides both short-term visual memory and long-term semantic memory. Our approach allows us to train robots to perform long and complex tasks, like cleaning up a kitchen or preparing a grilled cheese sandwich from scratch 👇
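The announcement describes the memory only at a high level: short-term visual plus long-term semantic. A toy Python sketch of one way such a two-tier store could be organized; the class, its fields, and the 16-frame horizon are guesses, not Physical Intelligence's design:

```python
from collections import deque

class RobotMemory:
    """Hypothetical two-tier memory: a bounded buffer of recent frames
    (short-term visual) plus an append-only list of text facts
    (long-term semantic). Structure is an assumption, not PI's."""

    def __init__(self, visual_horizon=16):
        self.visual = deque(maxlen=visual_horizon)  # recent observations
        self.semantic = []                          # durable scene facts

    def observe(self, frame, note=None):
        self.visual.append(frame)      # old frames roll off the buffer
        if note is not None:
            self.semantic.append(note) # e.g. "butter is in the fridge"

    def context(self):
        """Bundle both memories for the policy's next inference call."""
        return list(self.visual), list(self.semantic)
```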

Yu Fang retweeted
Mingyu Ding @dingmyu
Introducing OHRA (One Hand to Rule Them All): a canonical representation that unifies diverse dexterous robot hands into a shared space, enabling cross-hand policy transfer and up to 81.9% zero-shot generalization to unseen morphologies.
🌐 zhenyuwei2003.github.io/OHRA
📄 arXiv: 2602.16712
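The tweet names the idea (a shared canonical hand space with per-morphology mappings) without implementation detail. A toy sketch under that reading; the adapters, `CANON_DIM`, and joint counts are illustrative stand-ins for whatever OHRA actually learns:

```python
import numpy as np

CANON_DIM = 32  # size of the shared canonical hand space (made up)

def make_adapter(num_joints, rng):
    """Random linear stand-in for a learned, hand-specific encoder
    that maps one morphology's joint vector into the canonical space."""
    W = rng.standard_normal((CANON_DIM, num_joints)) / np.sqrt(num_joints)
    return lambda q: W @ q

def shared_policy(z):
    """One policy head that only ever sees canonical coordinates."""
    return np.tanh(z)  # placeholder for the learned action head

rng = np.random.default_rng(0)
adapters = {"allegro": make_adapter(16, rng),  # 16-DoF hand
            "shadow": make_adapter(24, rng)}   # 24-DoF hand

# Cross-hand transfer in this sketch: only the adapter changes per hand.
for hand, num_joints in [("allegro", 16), ("shadow", 24)]:
    action = shared_policy(adapters[hand](np.zeros(num_joints)))
```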
[image attached]
Yu Fang retweeted
Peter Tong @TongPetersb
Train Beyond Language. We bet on the visual world as the critical next step alongside and beyond language modeling. So, we studied building foundation models from scratch with vision. We share our exploration: visual representations, data, world modeling, architecture, and scaling behavior! [1/9]
[image attached]
Yu Fang retweeted
Salesforce AI Research @SFResearch
Introducing FOFPred, a language-driven future optical flow prediction framework that enables improved robot control and video generation. Instead of reacting to motion, FOFPred predicts how motion will evolve, conditioned on natural language.
🌐 Project: fofpred.github.io
📄 Paper: arxiv.org/abs/2601.10781
💻 Code: github.com/SalesforceAIRe…
🤗 Model: huggingface.co/Salesforce/FOF…
🕹️ Demo: fofpred.salesforceresearch.ai
🧵 [1/3]
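To make "language-driven future optical flow prediction" concrete, here is a hypothetical interface sketch; the function name, signature, and (horizon, H, W, 2) output layout are assumptions, not the released API:

```python
import numpy as np

def predict_future_flow(frames, instruction, horizon=8):
    """Hypothetical interface for language-conditioned flow forecasting:
    past RGB frames plus a text instruction in, a (horizon, H, W, 2)
    stack of future per-pixel motion fields out. Zeros stand in for the
    real model so the shapes stay concrete."""
    h, w = frames[-1].shape[:2]
    return np.zeros((horizon, h, w, 2), dtype=np.float32)

past = [np.zeros((224, 224, 3), dtype=np.float32)]
flow = predict_future_flow(past, "slide the pan to the left burner")
# Downstream, predicted flow can condition a controller or a video generator.
```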
Yu Fang retweeted
Wenlong Huang @wenlong_huang
What if we could simulate an *interactive 3D world* from a single image, in the wild, in real time? Introducing PointWorld-1B: a large pre-trained 3D world model that predicts environment dynamics given an RGB-D capture and robot actions.
🌐 point-world.github.io
from @Stanford @nvidia
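At the interface level, an action-conditioned world model is a step function from observation and action to the predicted next observation. A toy rollout loop under that reading; `world_step`, the array shapes, and the 7-DoF action are illustrative, not PointWorld's actual API:

```python
import numpy as np

def world_step(rgbd, action):
    """Hypothetical rollout interface for an action-conditioned 3D world
    model: current RGB-D observation plus a robot action in, predicted
    next observation out. Identity dynamics stand in for the model."""
    return rgbd  # a real model would predict scene dynamics here

obs = np.zeros((480, 640, 4), dtype=np.float32)  # RGB channels + depth
plan = [np.zeros(7, dtype=np.float32)] * 3       # e.g. 7-DoF arm commands
for a in plan:                                   # imagine outcomes before acting
    obs = world_step(obs, a)
```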
Yu Fang @yuffishh
🎬 Qualitative Results
Check out our project page for more results!
[image attached]
Yu Fang @yuffishh
🤖 Robotic VLA Benefits from Joint Learning with Motion Image Diffusion
We introduce joint learning with motion image diffusion that enhances VLA models with motion reasoning capabilities.
📄 Paper: arxiv.org/abs/2512.18007
🌐 Project: vla-motion.github.io
Key Highlights
🧠 Our method seamlessly augments VLA models with motion reasoning capabilities, while preserving their real-time inference efficiency.
🔎 We present motion image diffusion using a DiT, providing dense pixel-level dynamic supervision that complements sparse action supervision. We show that optical-flow-based motion images are the most effective representation for joint action-motion learning.
🎯 We enhance π-series VLA models to achieve 97.5% average success on LIBERO and 58.0% on RoboTwin.
#VLA #Robotics #Motion
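The tweet describes the training signal (sparse action supervision plus dense diffusion supervision on motion images) but not the exact objective. A minimal sketch of one plausible combined loss; the epsilon-prediction form and the 0.1 weight are assumptions, not the paper's reported setup:

```python
import numpy as np

def joint_loss(pred_actions, gt_actions, pred_noise, true_noise,
               motion_weight=0.1):
    """Sketch of a joint objective: action regression plus a dense,
    diffusion-style denoising loss on optical-flow "motion images"."""
    # Sparse supervision: regress the predicted action chunk.
    action_loss = np.mean((np.asarray(pred_actions) - np.asarray(gt_actions)) ** 2)
    # Dense supervision: standard epsilon-prediction diffusion loss
    # on the motion-image branch (the DiT's noise estimate).
    motion_loss = np.mean((np.asarray(pred_noise) - np.asarray(true_noise)) ** 2)
    return action_loss + motion_weight * motion_loss
```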