Max Fu

205 posts


@letian_fu

scaling robotics. Intern @NVIDIA. PhD student @UCBerkeley @berkeley_ai. Prev @Apple @autodesk

Berkeley, CA · Joined August 2012
652 Following · 1.4K Followers
Pinned Tweet
Max Fu@letian_fu·
Robotics: coding agents’ next frontier. So how good are they? We introduce CaP-X: an open-source framework and benchmark for coding agents, where they write code for robot perception and control, execute it on sim and real robots, observe the outcomes, and iteratively improve code reliability. From @NVIDIA @Berkeley_AI @CMU_Robotics @StanfordAILab capgym.github.io 🧵
19 replies · 128 reposts · 626 likes · 151.8K views
Max Fu reposted
NVIDIA Robotics@NVIDIARobotics·
Early access to NVIDIA Isaac GR00T N1.7 is here 🎉 — an open, commercially licensed vision-language-action foundation model for humanoid robots, built for real-world deployment. 🤗 Read the @huggingface blog: nvda.ws/4cBx63F 🤖 Models: nvda.ws/3OJVMyY 💻 Github: nvda.ws/4mz8WLP
16 replies · 98 reposts · 605 likes · 68.1K views
Max Fu reposted
David McAllister@davidrmcall·
We developed a simple, sample-efficient online RL technique for post-training image generation models. We see it as a possible steerable alternative to CFG, driven by any scalar reward, including human preference.
7 replies · 33 reposts · 292 likes · 30.1K views
Max Fu reposted
Long Lian@LongTonyLian·
Our parallel reasoning project ThreadWeaver is now open-sourced 🎉! Check out our Data Gen/SFT/RL recipe at github.com/facebookresear… In case you don't know, ThreadWeaver 🧵⚡️ is the first parallel reasoning method to achieve comparable reasoning performance to widely-used sequential long-CoT LLMs, with up to 3x speedup across 6 challenging tasks.
AK@_akhaliq

ThreadWeaver Adaptive Threading for Efficient Parallel Reasoning in Language Models

0 replies · 23 reposts · 129 likes · 53.7K views
Max Fu reposted
Stephen James@stepjamUK·
Frontier language models can pass law exams. They can write production code. But ask them to write a program that controls a real robot, and they still fall short of a human expert.

That's the core finding from CaP-X, a new framework from NVIDIA, UC Berkeley, Stanford, and CMU that systematically benchmarks coding agents for robot manipulation.

The underlying idea is not new. Code as Policy has been around since 2022/2023, and it is best understood as a modern evolution of Task and Motion Planning, a classical robotics paradigm where engineers manually decompose high-level goals into structured programs combining perception, planning, and control. What has changed is that instead of a human writing that code, a language model does it. It works well when the abstractions are high-level. It degrades significantly when models have to reason at the level human engineers actually work at: raw perception outputs, IK solvers, collision constraints.

Here is what the research actually shows:

The abstraction gap is real. Performance drops as you move from high-level primitives to low-level APIs, not because the models lack intelligence, but because the scaffolding disappears.

Multi-turn feedback recovers most of that loss. Multi-turn feedback with execution traces and structured observations dramatically improves performance. Raw images alone actually hurt.

RL on a small model transfers zero-shot to the real world. A 7B model fine-tuned with RL in simulation transfers zero-shot to a real Franka robot by reasoning over structured APIs.

The takeaway is simple. The bottleneck is not model size. It is the feedback loop, the abstraction layer, and the system around the model.

Credit: @letian_fu, Justin Yu, Karim El-Refai, Ethan Kou, @HaoruXue, @DrJimFan, and the full team across @nvidia, @UCBerkeley, @Stanford, and @CMU_Robotics. And of course @AGIBOTofficial for providing the hardware in the attached video!
What do you think is holding Code as Policy back from production deployment? Paper link in comments.
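The perceive-act-observe-recover loop described in the thread can be sketched as the kind of program a coding agent would write against structured APIs. This is a toy illustration only: the names (MockRobot, perceive, move_to, grasp, holding) are hypothetical stand-ins, not the CaP-X interface.

```python
class MockRobot:
    """Toy stand-in for a robot API; fails the first grasp, then succeeds."""
    def __init__(self):
        self.grasp_attempts = 0
        self._holding = False

    def perceive(self, name):
        # Structured perception output (a pose), not raw pixels.
        return (0.4, 0.0, 0.05)

    def move_to(self, pose, lift=0.0):
        pass  # low-level controller would track this waypoint

    def grasp(self):
        self.grasp_attempts += 1
        self._holding = self.grasp_attempts > 1  # first attempt slips

    def release(self):
        self._holding = False

    def holding(self):
        return self._holding


def pick(robot, obj, max_attempts=3):
    """Agent-written policy sketch: perceive, act, check the outcome,
    and branch into recovery on failure instead of giving up."""
    for _ in range(max_attempts):
        pose = robot.perceive(obj)      # structured observation
        robot.move_to(pose, lift=0.1)   # approach from above
        robot.move_to(pose)
        robot.grasp()
        if robot.holding():             # execution feedback closes the loop
            return True
        robot.release()                 # recovery branch, then retry
    return False
```

The point of the sketch is the structure, not the primitives: the success check and retry branch are exactly the multi-turn feedback that the benchmark finds recovers most of the lost performance.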
4 replies · 3 reposts · 22 likes · 5.2K views
Max Fu reposted
Kush Hari@KushtimusPrime·
Our new work, STITCH 2.0, can perform consecutive running sutures to close a sample wound with the da Vinci robot.
7 replies · 15 reposts · 59 likes · 25.1K views
Max Fu@letian_fu·
I resonate with this framing a lot: robotics should be treated primarily as part of the pretraining problem, not a post-training or mid-training one. From roughly 2021 to 2024, some of the most exciting progress in robotics came from this lens: first through visual pretraining on egocentric human data (MVP, R3M, VC-1, etc.), then through robot trajectory pretraining (RPT, ICRT, etc.). Part of why these directions became quieter was not that the framing stopped being useful, but that VLM backbones began to dominate: they were pretrained on much larger-scale data and therefore offered stronger representations out of the box. In that sense, VLMs were a very useful scaffold in the low-robot-data regime, but that scaffold may become less central as robotics data scales.
Pete Florence@peteflorence

x.com/i/article/2041…

3 replies · 2 reposts · 42 likes · 3.2K views
Max Fu reposted
Generalist@GeneralistAI·
Introducing GEN-1. Our latest milestone in scaling robot learning. We believe it to be the first general-purpose AI model to master simple physical tasks. 99% success rates, 3x faster speeds, adapts in real time to unexpected scenarios, w/ only 1 hour of robot data. More🧵👇
50 replies · 282 reposts · 1.7K likes · 363.3K views
Max Fu reposted
Max Fu@letian_fu·
I think the key is that the LLM does not need to generate joint-level actions at high frequency. Low-level feedback control and fast perception primitives can run independently at a high frequency, while the LLM replans at a slower rate. The agent writes code that uses those primitives to specify waypoints, branching logic, and recovery behavior based on perceptual outputs, drawing on the model's understanding of the task and how the robot should behave. In that sense, code-as-policy can operate at a higher level than raw motor control while still being highly reactive, and in some cases can be even faster/more reactive than a VLA (e.g., dynamically updating stiffness/impedance based on force feedback and the current subtask).
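The two-rate structure described above can be sketched in a few lines. This is a minimal illustration under assumed names: `plan` stands in for the slow LLM-rate replanner and `track` for the fast control-rate feedback step; neither is a real CaP-X function.

```python
def control_loop(plan, track, ticks=100, replan_every=25):
    """Two-rate loop: replan at LLM rate, track at control rate.

    plan(state)        -> target   (called once per replan interval)
    track(state, goal) -> state    (called every tick)
    """
    state = 0.0
    target = None
    for t in range(ticks):
        if t % replan_every == 0:
            target = plan(state)      # slow: LLM-rate replanning
        state = track(state, target)  # fast: feedback control every tick
    return state
```

With a proportional tracker (`track = lambda s, g: s + 0.1 * (g - s)`) the state converges to the planned target even though the planner runs only once every 25 ticks, which is the point: reactivity lives in the fast loop, deliberation in the slow one.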
1 reply · 0 reposts · 6 likes · 406 views
amv@aryanmadhaverma·
had tried stitching IK functions as tool calls to LLM agents a few months back, where the agent plans these tool executions based on the vague tasks it has been given. happy to see it formalize into an actual system of operation for robots.

have a few doubts though. maybe I'm thinking wrong. VLA policies output actions at 10+ Hz; coding agents won't match this. even if you host on Cerebras or Groq at 1000+ tokens/sec, continuous reactive control needs more than fast generation. the agent would need to ingest current joint state, distance to target, and sensor feedback, run a validation phase, and potentially replan, all before the next action step. that loop is architecturally too slow for the kind of realtime adjustment that motor policies handle natively. think pouring water into a glass where the water pours slightly outside the glass: you need a micro wrist correction in milliseconds. you can't pause to reflect and rewrite code. the reflect-rewrite-execute loop is wrong for continuous control. we probably need this loop baked into a VLA that itself acts as a tool for agentic planners, where a hybrid agentic control system makes sense.

CaP RL is the coolest part: the idea that an RL-trained coding agent could one-shot a controller for a task. if that generalizes across embodiments and linkages, not just task-specific but for the whole robot, that would be REALLY COOL. we're not there yet (right now it's trained on clean ground-truth state and specific tasks); real perception and control will be noisy, and constructing the reward and attributing the error to the correct input will be a good problem to solve.

really bullish on LLM agents for high-level task decomposition, long-horizon planning, and zero-shotting new instructions which can be autocorrected later
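The hybrid architecture proposed in this reply can be sketched as follows. All names here are illustrative assumptions: the agent handles slow task decomposition, and each subtask is delegated to a reactive skill (standing in for a VLA) that hides its own millisecond-scale correction loop behind a plain function call.

```python
def run_task(decompose, skills, task):
    """Hybrid sketch: slow agentic planning over fast reactive skills.

    decompose(task)    -> [(skill_name, goal), ...]   (agent-rate)
    skills[name](goal) -> bool success                (fast loop hidden inside)
    """
    for name, goal in decompose(task):
        if not skills[name](goal):
            return False  # surface the failure back to the planner
    return True
```

The interface does the work: the planner never sees joint states or control frequencies, only subtask goals and success/failure, which is what makes the slow reflect-and-replan loop acceptable at this level.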
Max Fu@letian_fu
2 replies · 0 reposts · 9 likes · 1.9K views
Max Fu reposted
Wenli Xiao@_wenlixiao·
One thing I learned working on CaP-X: today's VLMs already have rich zero-shot capabilities that we roboticists keep losing when we distill them into VLAs. Giving models the right robotic MCP/CLI + harness engineering + test-time compute recovers a surprising amount of that. Maybe the next leap in robot deployment isn't a bigger policy -> it's a better coding agent. 🦞
Max Fu@letian_fu
1 reply · 7 reposts · 65 likes · 8.9K views
Max Fu reposted
Wenlong Huang@wenlong_huang·
Excited to see techniques we developed back in 2021/2022 remain at the frontier for generalization, now with the latest LLMs/VLMs: task decomposition (LMs as zero-shot planners), structured environment feedbacks (Inner Monologue), and hierarchical code generation (Code as Policies). Would be very interesting time to revisit how test-time planning can generate novel behaviors with task representations synthesized by more powerful LLMs/VLMs (e.g., potential maps or constraints from VoxPoser & ReKep).
Max Fu@letian_fu
1 reply · 6 reposts · 67 likes · 6.1K views