Yu Fang
@yuffishh

62 posts

CS PhD student @unccs. Embodied AI and Robot Learning. Prev. Research Intern @SFResearch.

Joined September 2022
227 Following · 156 Followers
Yu Fang retweeted
Gedas Bertasius @gberta227
If you're curious about the background that inspires a lot of our group's research on skill learning and video understanding, check out this great piece by UNC Research. It covers some of my journey from being a basketball player to an AI researcher. research.unc.edu/story/reading-…
Yu Fang @yuffishh
🤖 Real-world Experiments
We study different aspects of language grounding. Each scene is designed with three possible tasks: one well-learned in-domain task with sufficient demonstrations, and two under-observed tasks defined by counterfactual instructions. Across object recognition, spatial reasoning, goal targeting, out-of-distribution generalization, and long-horizon reasoning, our proposed CAG consistently improves the performance of π0.5 and reduces counterfactual failures. [5/6]
[image attached]
Yu Fang @yuffishh
Do Vision-Language-Action Models truly follow your language instructions? We present "When Vision Overrides Language: Evaluating and Mitigating Counterfactual Failures in VLAs." VLAs promise to ground language instructions in robot control, yet in practice they often fail to follow language faithfully.
📄 Paper: arxiv.org/abs/2602.17659
🌐 Project: vla-va.github.io
💡 Highlights
Vision shortcuts and counterfactual failures. When given instructions that lack strong scene-specific supervision, VLAs default to well-learned scene-specific behaviors regardless of language intent.
Counterfactual benchmark. We introduce LIBERO-CF, the first counterfactual benchmark for evaluating language following in VLAs. Our evaluation reveals that counterfactual failures are prevalent yet underexplored across state-of-the-art VLAs.
Our solution. We propose Counterfactual Action Guidance (CAG), a simple plug-and-play dual-branch inference scheme that strengthens language conditioning without changing pretrained VLA architectures or weights.
Experiments. CAG is effective across multiple dimensions of language grounding, consistently improving both instruction following and task success on under-observed tasks.
#VLA #Robotics #Vision #Language
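The thread does not give CAG's formula, but "dual-branch inference" suggests a classifier-free-guidance-style combination of a language-conditioned and a language-ablated action prediction. A minimal Python sketch under that assumption; the `policy` callable, the `None` convention for dropping the instruction, and the `guidance_scale` value are all illustrative, not from the paper:

```python
import numpy as np

def guided_action(policy, obs, instruction, guidance_scale=3.0):
    """Dual-branch inference sketch: query the frozen VLA twice and
    extrapolate toward the language-conditioned prediction.

    `policy` is a hypothetical callable mapping (obs, instruction) to
    an action vector; passing None as the instruction stands in for
    the counterfactual, language-ablated branch.
    """
    a_lang = np.asarray(policy(obs, instruction))  # language-conditioned branch
    a_base = np.asarray(policy(obs, None))         # vision-only branch
    # Push the action away from the scene-driven shortcut and toward
    # the language-consistent behavior; no pretrained weights change.
    return a_base + guidance_scale * (a_lang - a_base)

# Toy usage: a stand-in policy that obeys vs. ignores the instruction.
toy = lambda obs, instr: np.ones(7) if instr else np.zeros(7)
print(guided_action(toy, obs=None, instruction="open the drawer"))
```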
Yu Fang retweeted
Kyle Vedder @KyleVedder
my first PI project: we added memory! this is a step-function capabilities unlock: 15-minute-long multi-step tasks in novel environments, controlled by text prompting. having run many of the evals, I legit think this is the GPT-2 moment for robotics
Physical Intelligence @physical_int

We’ve developed a memory system for our models that provides both short-term visual memory and long-term semantic memory. Our approach allows us to train robots to perform long and complex tasks, like cleaning up a kitchen or preparing a grilled cheese sandwich from scratch 👇
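The announcement describes the memory only at a high level: short-term visual plus long-term semantic. A toy Python sketch of one way such a two-tier store could be organized; the class, its fields, and the 16-frame horizon are guesses, not Physical Intelligence's design:

```python
from collections import deque

class RobotMemory:
    """Hypothetical two-tier memory: a bounded buffer of recent frames
    (short-term visual) plus an append-only list of text facts
    (long-term semantic). Structure is an assumption, not PI's."""

    def __init__(self, visual_horizon=16):
        self.visual = deque(maxlen=visual_horizon)  # recent observations
        self.semantic = []                          # durable scene facts

    def observe(self, frame, note=None):
        self.visual.append(frame)      # old frames roll off the buffer
        if note is not None:
            self.semantic.append(note) # e.g. "butter is in the fridge"

    def context(self):
        """Bundle both memories for the policy's next inference call."""
        return list(self.visual), list(self.semantic)
```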

Yu Fang retweeted
Mingyu Ding @dingmyu
Introducing OHRA (One Hand to Rule Them All): a canonical representation that unifies diverse dexterous robot hands into a shared space, enabling cross-hand policy transfer and up to 81.9% zero-shot generalization to unseen morphologies.
🌐 zhenyuwei2003.github.io/OHRA
📄 arXiv: 2602.16712
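The tweet names the idea (a shared canonical hand space with per-morphology mappings) without implementation detail. A toy sketch under that reading; the adapters, `CANON_DIM`, and joint counts are illustrative stand-ins for whatever OHRA actually learns:

```python
import numpy as np

CANON_DIM = 32  # size of the shared canonical hand space (made up)

def make_adapter(num_joints, rng):
    """Random linear stand-in for a learned, hand-specific encoder
    that maps one morphology's joint vector into the canonical space."""
    W = rng.standard_normal((CANON_DIM, num_joints)) / np.sqrt(num_joints)
    return lambda q: W @ q

def shared_policy(z):
    """One policy head that only ever sees canonical coordinates."""
    return np.tanh(z)  # placeholder for the learned action head

rng = np.random.default_rng(0)
adapters = {"allegro": make_adapter(16, rng),  # 16-DoF hand
            "shadow": make_adapter(24, rng)}   # 24-DoF hand

# Cross-hand transfer in this sketch: only the adapter changes per hand.
for hand, num_joints in [("allegro", 16), ("shadow", 24)]:
    action = shared_policy(adapters[hand](np.zeros(num_joints)))
```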
[image attached]
Yu Fang retweeted
Peter Tong @TongPetersb
Train Beyond Language. We bet on the visual world as the critical next step alongside and beyond language modeling. So, we studied building foundation models from scratch with vision. We share our exploration: visual representations, data, world modeling, architecture, and scaling behavior! [1/9]
[image attached]
Yu Fang retweeted
Salesforce AI Research @SFResearch
Introducing FOFPred, a language-driven future optical flow prediction framework that enables improved robot control and video generation. Instead of reacting to motion, FOFPred predicts how motion will evolve, conditioned on natural language.
🌐 Project: fofpred.github.io
📄 Paper: arxiv.org/abs/2601.10781
💻 Code: github.com/SalesforceAIRe…
🤗 Model: huggingface.co/Salesforce/FOF…
🕹️ Demo: fofpred.salesforceresearch.ai
🧵 [1/3]
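To make "language-driven future optical flow prediction" concrete, here is a hypothetical interface sketch; the function name, signature, and (horizon, H, W, 2) output layout are assumptions, not the released API:

```python
import numpy as np

def predict_future_flow(frames, instruction, horizon=8):
    """Hypothetical interface for language-conditioned flow forecasting:
    past RGB frames plus a text instruction in, a (horizon, H, W, 2)
    stack of future per-pixel motion fields out. Zeros stand in for the
    real model so the shapes stay concrete."""
    h, w = frames[-1].shape[:2]
    return np.zeros((horizon, h, w, 2), dtype=np.float32)

past = [np.zeros((224, 224, 3), dtype=np.float32)]
flow = predict_future_flow(past, "slide the pan to the left burner")
# Downstream, predicted flow can condition a controller or a video generator.
```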
Yu Fang retweeted
Wenlong Huang @wenlong_huang
What if we could simulate an *interactive 3D world* from a single image, in the wild, in real time? Introducing PointWorld-1B: a large pre-trained 3D world model that predicts environment dynamics given an RGB-D capture and robot actions.
🌐 point-world.github.io
from @Stanford @nvidia
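At the interface level, an action-conditioned world model is a step function from observation and action to the predicted next observation. A toy rollout loop under that reading; `world_step`, the array shapes, and the 7-DoF action are illustrative, not PointWorld's actual API:

```python
import numpy as np

def world_step(rgbd, action):
    """Hypothetical rollout interface for an action-conditioned 3D world
    model: current RGB-D observation plus a robot action in, predicted
    next observation out. Identity dynamics stand in for the model."""
    return rgbd  # a real model would predict scene dynamics here

obs = np.zeros((480, 640, 4), dtype=np.float32)  # RGB channels + depth
plan = [np.zeros(7, dtype=np.float32)] * 3       # e.g. 7-DoF arm commands
for a in plan:                                   # imagine outcomes before acting
    obs = world_step(obs, a)
```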
Yu Fang @yuffishh
🎬 Qualitative Results
Check out our project page for more results!
[image attached]
Yu Fang @yuffishh
🤖 Robotic VLA Benefits from Joint Learning with Motion Image Diffusion
We introduce joint learning with motion image diffusion that enhances VLA models with motion reasoning capabilities.
📄 Paper: arxiv.org/abs/2512.18007
🌐 Project: vla-motion.github.io
Key Highlights
🧠 Our method seamlessly augments VLA models with motion reasoning capabilities, while preserving their real-time inference efficiency.
🔎 We present motion image diffusion using a DiT, providing dense pixel-level dynamic supervision that complements sparse action supervision. We show that optical-flow-based motion images are the most effective representation for joint action-motion learning.
🎯 We enhance π-series VLA models to achieve 97.5% average success on LIBERO and 58.0% on RoboTwin.
#VLA #Robotics #Motion
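The tweet describes the training signal (sparse action supervision plus dense diffusion supervision on motion images) but not the exact objective. A minimal sketch of one plausible combined loss; the epsilon-prediction form and the 0.1 weight are assumptions, not the paper's reported setup:

```python
import numpy as np

def joint_loss(pred_actions, gt_actions, pred_noise, true_noise,
               motion_weight=0.1):
    """Sketch of a joint objective: action regression plus a dense,
    diffusion-style denoising loss on optical-flow "motion images"."""
    # Sparse supervision: regress the predicted action chunk.
    action_loss = np.mean((np.asarray(pred_actions) - np.asarray(gt_actions)) ** 2)
    # Dense supervision: standard epsilon-prediction diffusion loss
    # on the motion-image branch (the DiT's noise estimate).
    motion_loss = np.mean((np.asarray(pred_noise) - np.asarray(true_noise)) ** 2)
    return action_loss + motion_weight * motion_loss
```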