Furong Huang

2.1K posts


@furongh

Associate professor of @umdcs @umiacs @ml_umd at UMD. Researcher in #AI/#ML, AI #Alignment, #RLHF, #Trustworthy ML, #EthicalAI, AI #Democratization, AI for ALL.

College Park, MD · Joined September 2010
2.6K Following · 10.3K Followers
Pinned Tweet
Furong Huang@furongh·
Last month we booted up our robotics lab (see the boot-up story here 👉 x.com/furongh/status…). Today: TraceGen is out, our first product.

We've been chasing a simple idea with big consequences: the small-data problem in robotics is really a variation problem. Different bodies, cameras, and scenes fragment experience. So rather than learn the look of the world, learn the shared, scene-centric 3D structure of motion: the where + how that transfers across embodiments.

This is why we train on web-scale video across embodiments (human↔robot, robot↔robot; new cameras, new scenes) to build a transferable motion prior. Focus on the geometry that matters for manipulation; treat appearance as incidental. Variation stops being a tax and becomes fuel.

Alongside the release, we're opening TraceForge, our dataset and tooling for working in this "trace" view, so others can reuse in-the-wild video without wrestling with pixels or prose.

If our vision for physical intelligence resonates with you (structure over surface, reuse over recollect), we'd love feedback, bug reports, and collaborations.

🌐 Website: tracegen.github.io
🔧 Data and Tooling: TraceForge and TraceGen
📄 Paper: arxiv.org/abs/2511.21690

See more details here: x.com/JayLEE_0301/st…

#EmbodiedAI #RobotLearning #WorldModels #CrossEmbodiment
Furong Huang@furongh

I’m so lucky to have such amazing students! 🤩 🦾🧑‍🎓

4 replies · 11 reposts · 150 likes · 29.9K views
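The "trace" view described in the pinned tweet can be sketched as motion geometry without appearance. This is an illustrative toy, not TraceGen's actual data format; the `Trace` class, `embodiment_agnostic_distance`, and all array shapes are hypothetical:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Trace:
    """A scene-centric motion trace: 3D keypoint positions over time.

    points: (T, K, 3) array -- T timesteps, K tracked points, xyz in a
    shared scene frame. Appearance (pixels) is deliberately absent, so
    traces from human video and robot video share one representation.
    """
    points: np.ndarray

    def velocities(self) -> np.ndarray:
        """Finite-difference motion between consecutive timesteps, (T-1, K, 3)."""
        return np.diff(self.points, axis=0)

def embodiment_agnostic_distance(a: Trace, b: Trace) -> float:
    """Compare two traces by motion geometry alone: mean pointwise error
    after removing each trace's starting offset."""
    a0 = a.points - a.points[0:1]   # express both traces relative to their start
    b0 = b.points - b.points[0:1]
    return float(np.mean(np.linalg.norm(a0 - b0, axis=-1)))

# Toy example: a human hand and a robot gripper tracing the same arc
# in different parts of the scene (constant offset).
t = np.linspace(0, 1, 5)
arc = np.stack([np.stack([t, t**2, 0 * t], axis=-1)], axis=1)  # (5, 1, 3)
human = Trace(arc)
robot = Trace(arc + 1.0)  # same motion, shifted scene frame
print(embodiment_agnostic_distance(human, robot))  # 0.0: identical motion
```

The design point mirrors the tweet: once appearance is stripped and the start offset removed, human and robot executions of the same motion become directly comparable.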
Furong Huang@furongh·
AI agents can act. But they still struggle to judge their own actions. Most training teaches agents what to do, not why one action is better than another. We propose ACT – Agentic Critical Training, a new way to train agents to evaluate decisions. 🧵 Project: attention-is-all-i-need.github.io/ACT/ Paper: arxiv.org/abs/2603.08706
26 replies · 56 reposts · 259 likes · 13.5K views
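The distinction drawn in the ACT thread ("what to do" vs. "why one action is better than another") is the classic pairwise-preference setup. A minimal sketch, assuming a Bradley–Terry style objective; the actual ACT objective may differ, and `pairwise_critic_loss` is a hypothetical name:

```python
import math

def pairwise_critic_loss(score_better: float, score_worse: float) -> float:
    """Bradley-Terry style loss: push the critic to score the preferred
    action above the rejected one, via -log(sigmoid(s_better - s_worse))."""
    margin = score_better - score_worse
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# The loss shrinks as the critic separates good from bad actions:
print(round(pairwise_critic_loss(2.0, 0.0), 3))  # 0.127 -- correct ranking, small loss
print(round(pairwise_critic_loss(0.0, 2.0), 3))  # 2.127 -- inverted ranking, large loss
```

Training on pairs like this rewards the judgment ("this action beats that one") rather than only the demonstration ("do this action"), which is the gap the tweet points at.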
Furong Huang@furongh·
Excited that our project HomeGraph in collaboration with @tomgoldsteincs has been selected for the NVIDIA Academic Grant Program! We’re building tool-native GR00T humanoid robots using a unified scene–skill graph for long-horizon household autonomy. Grateful for NVIDIA’s support with RTX PRO 6000 Blackwell GPUs and Jetson AGX Thor to push this research forward. Special shoutout to @ruijie_zheng12, now at NVIDIA’s GEAR Lab, who played a major role in early GR00T work while in my lab. Excited to continue collaborating. Looking forward to building on our partnership with @DrJimFan and @yukez and advancing the future of generalist robotics. Research supported by the NVIDIA Academic Grant Program. #NVIDIAGrant @NVIDIAAIDev
3 replies · 9 reposts · 53 likes · 4K views
Furong Huang@furongh·
[8/n] As agents become more autonomous, this capability becomes essential: • detecting mistakes • comparing strategies • revising plans These are fundamentally judgment problems.
1 reply · 1 repost · 2 likes · 400 views
Furong Huang retweeted
UMD Department of Computer Science
🤖 @UofMaryland CS Ph.D. student Seungjae “Jay” Lee is studying how large-scale human data can help robots perform everyday household tasks. His work explores ways to improve reliability as robots move from lab demonstrations to real homes. Read more: go.umd.edu/Lee-3-2026
0 replies · 1 repost · 4 likes · 670 views
Dylan Sam@dylanjsam·
I defended my PhD thesis! Also, a very (~4 month) late life update, but I've joined @OpenAI to work on safety research and pretraining safer language models! 📈 Thank you to my advisor @zicokolter and my committee: Matt Fredrikson, @andrew_ilyas, and @furongh! 🙏
25 replies · 8 reposts · 218 likes · 20.5K views
Furong Huang@furongh·
Back when working on FLARE: Robot Learning with Implicit World Modeling 📄 x.com/furongh/status…

We realized something important: 👉 Co-training is not just a trick. It's a scaling law for robotics.

By aligning latent future representations, FLARE showed that mixing robot demonstrations with human egocentric video unlocks surprising generalization, even to unseen objects with minimal robot data.

That insight stayed with us. Now Ruijie has graduated from our lab and joined NVIDIA GEAR Lab, one of the frontier labs in modern robotics. And they're taking this idea further.

Why is co-training powerful?
• Robot data provides precise action grounding
• Human video provides massive visual diversity
• Latent alignment bridges embodiment gaps

You don't need perfect action labels. You need the right representation. The next generation of VLAs will not just react; they will anticipate.

Proud former-advisor moment 🚀

#Robotics #WorldModels #VLA #EmbodiedAI #DiffusionModels
Ruijie Zheng@ruijie_zheng12

Proud to introduce EgoScale: We pretrained a GR00T VLA model on 20K+ hours of egocentric human video and discovered that robot dexterity can be scaled, not with more robots, but with more human data. A thread on 🧵what we learned. 👇

1 reply · 8 reposts · 70 likes · 10.1K views
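The co-training recipe in the FLARE thread (robot demos carry action labels; human egocentric video contributes only latent future alignment) can be sketched as a two-term objective. This is an illustrative toy, not FLARE's implementation; all names and shapes (`cotraining_loss`, 16-d latents, 7-d actions) are assumptions:

```python
import numpy as np

def latent_alignment_loss(pred_future_latent, target_future_latent):
    """Align the policy's predicted future latent with the encoder's actual
    future latent (implicit world modeling). Works for both robot and human
    clips -- no action labels needed."""
    return float(np.mean((pred_future_latent - target_future_latent) ** 2))

def cotraining_loss(batch, w_align=0.5):
    """Mix the two data sources:
      - robot demos: action imitation + latent alignment
      - human video: latent alignment only (no actions to imitate)
    """
    align = latent_alignment_loss(batch["pred_future"], batch["true_future"])
    if batch.get("actions") is not None:          # robot data carries action labels
        bc = float(np.mean((batch["pred_actions"] - batch["actions"]) ** 2))
        return bc + w_align * align
    return w_align * align                        # human egocentric video

rng = np.random.default_rng(0)
robot_batch = {
    "pred_future": rng.normal(size=(8, 16)), "true_future": rng.normal(size=(8, 16)),
    "pred_actions": rng.normal(size=(8, 7)), "actions": rng.normal(size=(8, 7)),
}
human_batch = {
    "pred_future": rng.normal(size=(8, 16)), "true_future": rng.normal(size=(8, 16)),
    "actions": None,
}
print(cotraining_loss(robot_batch), cotraining_loss(human_batch))
```

The point of the structure is the one the tweet makes: the alignment term is the bridge that lets unlabeled human video contribute gradient signal alongside scarce robot data.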
Furong Huang retweeted
Amrit Singh Bedi@amritsinghbedi3·
🧵 Reasoning models may be easier to jailbreak. But safety recovery is easier than you think - just a few steering steps away. Check our new results and insights arxiv.org/pdf/2602.11096
Souradip Chakraborty @ Neurips 2025@SOURADIPCHAKR18

🚫 #Reasoningmodels improve AI capabilities (IMO, Olympiad), but degrade #Safety #Alignment ❓ Are we doomed? 📢 Safety recovery is easier than you think (just a few steering steps away) Surprisingly simple safety recovery maintaining utility of MLRMs: arxiv.org/pdf/2602.11096

1 reply · 6 reposts · 10 likes · 1.4K views
Furong Huang@furongh·
MomaGraph has been selected as #ICLR2026 Oral! Kudos to all the co-authors!
Furong Huang@furongh

This is a really crisp articulation of what "embodied intelligence" has been missing: a task-faithful interface between pixels and plans.

For years we have argued about end-to-end policies vs. modular pipelines, VLM planners vs. classical task planning, "3D scene understanding" vs. "affordances". But the real bottleneck is simpler: **Robots fail in homes not because they can't see, but because they can't commit to the right structure of what they saw.**

**Why this matters**
A household is not a static 3D reconstruction problem. It is a stateful and interactive world:
• "Where" matters (spatial relations, occlusions, reachability),
• "How" matters (affordances, parts, functional constraints),
• and "What changed" matters most (open/closed, filled/empty, on/off, moved/blocked).

Most existing "scene graphs" choose one axis:
• spatial graphs: geometry-rich, action-poor
• functional graphs: affordance-rich, geometry-weak

MomaGraph's key move is to unify both and make state first-class, with part-level interactive nodes. That's not just a better representation: it's the right abstraction layer for embodied reasoning.

**Graph-then-Plan is a field-defining direction**
The "Graph-then-Plan" paradigm is more than a technique; it's a thesis: stop asking a VLM to hallucinate a plan directly from pixels. Force it to externalize the relevant world model first. This is exactly how we make VLM-based agents:
• more grounded (reduce free-form hallucination),
• more auditable (the graph is inspectable, editable, debuggable),
• more composable (graphs can be reused across tasks, skills, and time),
• more trainable (reward the intermediate structure, not just the final answer).

I also like the RL angle (MomaGraph-R1 on top of a 7B VLM): it suggests a practical recipe for future embodied foundation models:
1. learn a structured latent that matches the environment's causal affordances
2. learn planning on top of that latent
3. evaluate both separately, then jointly

**Datasets and benchmarks are the leverage**
Releasing MomaGraph-Scenes + MomaGraph-Bench is arguably as important as the model:
• If we want progress, we need standardized targets for what structure is "correct" in a household.
• The six capability axes (from fine-grained affordance reasoning to long-horizon decomposition) are exactly the right shape of benchmark for embodied VLMs.

**The big picture**
If we zoom out, this is part of a broader convergence: embodied AI is becoming representation learning again, but not "representation" as a hidden vector. Representation as a contract:
• between perception and action,
• between language and physics,
• between what the agent believes now and what it will remember later.

In that view, MomaGraph is a step toward a future where robots carry persistent, state-aware, task-conditioned world models that can be updated, queried, and reasoned over, not just prompted.

Very excited to see where this goes, especially as we push toward:
• temporal graphs (state updates as events),
• uncertainty-aware graphs (confidence as a first-class signal),
• active perception (ask-for-views to resolve graph ambiguity),
• and lifelong memory (graphs as the substrate of agent memory in real homes).

Kudos to the team: this feels like the kind of work that doesn't just improve a leaderboard, it clarifies the roadmap.

1 reply · 5 reposts · 29 likes · 4.7K views
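The "graph-then-plan" idea in the MomaGraph thread, a state-first scene graph with part-level interactive nodes that a planner reads off instead of raw pixels, can be sketched in a few lines. This is a toy illustration with hypothetical names (`Node`, `SceneGraph`, `plan_fetch`), not MomaGraph's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A part-level interactive node: state is first-class, not an afterthought."""
    name: str
    affordances: list                           # how it can be acted on, e.g. ["open", "close"]
    state: dict = field(default_factory=dict)   # e.g. {"open": False}

@dataclass
class SceneGraph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)   # (subject, relation, object) triples

    def add(self, node: Node):
        self.nodes[node.name] = node

    def relate(self, subj: str, rel: str, obj: str):
        self.edges.append((subj, rel, obj))

    def plan_fetch(self, target: str):
        """Graph-then-plan, toy version: to fetch `target`, first open any
        closed container that holds it. The plan is read off the graph's
        relations and states, never hallucinated from pixels."""
        steps = []
        for subj, rel, obj in self.edges:
            if rel == "inside" and subj == target:
                container = self.nodes[obj]
                if "open" in container.affordances and not container.state.get("open", False):
                    steps.append(f"open {obj}")
        steps.append(f"grasp {target}")
        return steps

g = SceneGraph()
g.add(Node("mug", affordances=["grasp"]))
g.add(Node("cabinet_door", affordances=["open", "close"], state={"open": False}))
g.relate("mug", "inside", "cabinet_door")
print(g.plan_fetch("mug"))   # ['open cabinet_door', 'grasp mug']
```

Because the graph is an explicit object, the properties the thread lists follow directly: it is inspectable and editable (auditable), reusable across tasks (composable), and the intermediate structure itself can be rewarded during training.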