Pietro Vitiello
@pitvit_

160 posts

PhD student in the Robot Learning Lab at @imperialcollege

London · Joined July 2023
483 Following · 265 Followers
Pietro Vitiello reposted
Alper Canberk @alpercanbe
i was visiting a hackathon where 80+ participants were training pi0/0.5, gr00t, smolvla, ACT, DP, etc. on lerobot arms. the best and most sample-efficient policies were trained *from scratch*. we still do not have an open-source x-embodied GPT-2, but i'm hopeful for this year
16 replies · 13 reposts · 332 likes · 89K views
Pietro Vitiello reposted
Sharpa @SharpaRobotics
Here’s the windmill assembly demo we showed at CES 2026 — the one no one saw coming. North executes a fully autonomous, long-horizon dexterous sequence with sustained hand–eye–tactile coordination and assembly-level precision enabled by tactile feedback. It’s also robust to disturbance: you can reposition the objects, and North will still identify them and recover the task. This is powered by CraftNet (VTLA) — using tactile feedback to continuously fine-tune the last-millimeter interaction, enabling reliable execution across 30+ steps. Read more about CraftNet: sharpa.com/blogs/news/sha… #Sharpa #SharpaWave #SharpaNorth #CraftNet #System0 #CES2026
46 replies · 187 reposts · 932 likes · 87.4K views
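"Using tactile feedback to continuously fine-tune the last-millimeter interaction" presumably means something like a tactile servo: the vision-based policy gets the part close, then a fast inner loop corrects the residual misalignment from touch signals. A toy sketch of that idea only; Sharpa's CraftNet/VTLA internals are not public, and every name and number below is illustrative.

```python
# Toy tactile-servo loop: proportional correction of the residual
# peg-in-hole misalignment from a noisy touch-based error estimate.
import numpy as np

rng = np.random.default_rng(2)

def tactile_error(true_offset):
    # Stand-in for a tactile sensor: noisy reading of the residual
    # misalignment between part and target (in mm).
    return true_offset + 0.02 * rng.normal(size=2)

offset = np.array([0.8, -0.5])   # vision leaves ~1 mm of error
GAIN = 0.5                        # proportional correction per tick

for tick in range(20):
    offset = offset - GAIN * tactile_error(offset)

print("residual misalignment (mm):", np.round(offset, 3))
```

The point of the inner loop is that its accuracy floor is set by the tactile noise, not by camera calibration, which is one plausible reading of "assembly-level precision enabled by tactile feedback".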
Pietro Vitiello reposted
Jim Fan @DrJimFan
I'm on a singular mission to solve the Physical Turing Test for robotics. It's the next, or perhaps THE last grand challenge of AI. Super-intelligence in text strings will win a Nobel prize before we have chimpanzee-intelligence in agility & dexterity. Moravec's paradox is a curse to be broken, a wall to be torn down. Nothing can stand between humanity and exponential physical productivity on this planet, and perhaps some day on planets beyond.

We started a small lab at NVIDIA and grew to 30 strong very recently. The team punches way above its weight. Our research footprint spans foundation models, world models, embodied reasoning, simulation, whole-body control, and many flavors of RL - basically the full stack of robot learning.

This year, we launched:
- GR00T VLA (vision-language-action) foundation models: open-sourced N1 in Mar, N1.5 in June, and N1.6 this month;
- GR00T Dreams: video world model for scaling synthetic data;
- SONIC: humanoid whole-body control foundation model;
- RL post-training for VLAs and RL recipes for sim2real.

These wouldn't have been possible without the numerous collaborating teams at NVIDIA, strong leadership support, and coauthors from university labs. Thank you all for believing in the mission. Thread on the gallery of milestones:
112 replies · 220 reposts · 2.2K likes · 401.4K views
Pietro Vitiello reposted
Dr Singularity @Dr_Singularity
Robots can now learn 1,000 manipulation tasks in a day, and we're still in 2025.

"Researchers at the Robot Learning Lab at Imperial College London recently developed a new imitation learning approach that could allow robots to successfully learn new tasks faster and without requiring substantial training data."

"Using this method, which was introduced in a paper published in Science Robotics, they were able to train a robotic arm to complete 1,000 different tasks in a single day."
10 replies · 34 reposts · 155 likes · 12.1K views
Pietro Vitiello reposted
Thomas Kipf @tkipf
So excited to finally talk about this work! Veo is a surprisingly strong world simulator. We fine-tuned Veo on action-conditioned, multi-view robotics data. Key result: running a policy in the world model is strongly correlated with real-world results.

A few important take-aways:
1) Veo Robotics models real-world physics and robot interactions
2) The base model's world knowledge is retained after fine-tuning and can model OOD scenarios not seen in the robotics data
3) The world model can be used to score task success or failure for a given policy
4) This proves useful for predictive red teaming: simulate dangerous or rare scenarios that would be difficult or irresponsible to execute on the real robot, and judge its performance

I couldn't be more excited about where generalist video models are headed.
Anirudha Majumdar@Majumdar_Ani

Generalist robots need a generalist evaluator. But how do you test safety without breaking things? 💥 🌎 Introducing our new work from @GoogleDeepMind: Evaluating Gemini Robotics Policies in a Veo World Simulator veo-robotics.github.io 🧵👇

15 replies · 29 reposts · 227 likes · 44.4K views
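The evaluation loop Kipf describes reduces to: roll the policy out inside the learned world model instead of on the robot, then score the imagined rollout. A minimal sketch of that loop, where every class is a hypothetical stand-in (the Veo-based simulator and the success scorer are not public APIs):

```python
# Evaluate a policy by rolling it out in a learned world model and
# scoring the final imagined state, instead of deploying on hardware.
import numpy as np

rng = np.random.default_rng(0)

class WorldModel:
    """Stand-in for the action-conditioned video world model."""
    def __init__(self):
        self.state = rng.normal(size=8)  # latent "frame" of the scene

    def step(self, action: np.ndarray) -> np.ndarray:
        # A real model would predict the next video frames; here, toy
        # dynamics so the loop runs end to end.
        self.state = 0.9 * self.state + 0.1 * action + 0.01 * rng.normal(size=8)
        return self.state

class Policy:
    """Stand-in for the robot policy under evaluation."""
    def act(self, obs: np.ndarray) -> np.ndarray:
        return -0.5 * obs  # drive the latent scene toward the goal (origin)

def task_success(obs: np.ndarray) -> bool:
    # Stand-in for the success scorer (e.g. a model judging final frames).
    return bool(np.linalg.norm(obs) < 0.5)

def evaluate(policy: Policy, episodes: int = 20, horizon: int = 50) -> float:
    wins = 0
    for _ in range(episodes):
        wm = WorldModel()
        obs = wm.state
        for _ in range(horizon):
            obs = wm.step(policy.act(obs))
        wins += task_success(obs)
    return wins / episodes

print(f"success rate estimated in the world model: {evaluate(Policy()):.2f}")
```

Take-away 4) in the tweet then corresponds to seeding WorldModel with rare or dangerous initial states you would never stage on hardware.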
Pietro Vitiello reposted
Ted Xiao @xiao_ted
Scalable evaluation is one of the most challenging problems in robotics today. Expensive + slow real-world deployments have been the only gold standard. But now, generative video models can provide *predictive and scalable* evaluation signal for real-world robots!🌎
Anirudha Majumdar@Majumdar_Ani

Generalist robots need a generalist evaluator. But how do you test safety without breaking things? 💥 🌎 Introducing our new work from @GoogleDeepMind: Evaluating Gemini Robotics Policies in a Veo World Simulator veo-robotics.github.io 🧵👇

1 reply · 4 reposts · 57 likes · 5.3K views
Pietro Vitiello reposted
Chris Paxton @chris_j_paxton
Okay, let's talk about this. What happens:
- the operator forgets to "clutch out" and tell the robot to stop tracking his hands
- the robot's hands go up to its head
- the robot doesn't move its legs to stay balanced, which seems to imply it's not doing whole-body control like we would expect
- he turns off the robot while it's in this unbalanced state, and its arms go to a neutral position close to where the video started -- but because the arms are out of position, it makes a huge, fast movement and absolutely smashes that water bottle, sending water spraying everywhere

A lot of people are blaming the operator here, but I think forgetting to press a button is, again, pretty human, and you could build a much more robust system here. No reason imo this robot should have fallen.
77 replies · 34 reposts · 747 likes · 219.8K views
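Chris's last bullet points at one concrete fix: never let a mode switch (teleop to off/neutral) command a large instantaneous jump. A minimal sketch of such an interlock, purely illustrative and not any vendor's actual controller: on every tick, the commanded joint target is rate-limited toward the goal, whatever mode change just happened.

```python
# Rate-limit the return-to-neutral motion so a mode switch can never
# produce the fast arm sweep described in the incident above.
import numpy as np

MAX_STEP = 0.02  # max joint motion per control tick (rad), a safety limit

def rate_limited_target(current: np.ndarray, goal: np.ndarray) -> np.ndarray:
    """Move toward `goal`, but never more than MAX_STEP per joint per tick."""
    return current + np.clip(goal - current, -MAX_STEP, MAX_STEP)

# Arms were left raised and out of position; the operator hits "off",
# which sets the goal to the neutral pose.
q = np.array([1.4, 0.9, -1.2])   # current joint angles (rad)
neutral = np.zeros(3)             # shutdown/neutral pose

ticks = 0
while not np.allclose(q, neutral):
    q = rate_limited_target(q, neutral)
    ticks += 1

print(f"reached neutral in {ticks} ticks; no tick moved more than {MAX_STEP} rad")
```

A real system would also keep the balance controller active during the transition, but even this clamp alone removes the "huge, fast movement" failure mode.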
Pietro Vitiello @pitvit_
That’s exactly right: the infrastructure for handling data, training models and setting up robots is immense, and the more ambitious the project, the worse it gets. Amazing initiative from @Neuracore_AI and @stepjamUK!
Stephen James@stepjamUK

𝗧𝗵𝗶𝘀 𝗶𝘀 𝗲𝘅𝗮𝗰𝘁𝗹𝘆 𝘄𝗵𝘆 𝘄𝗲'𝗿𝗲 𝗴𝗶𝘃𝗶𝗻𝗴 𝗳𝗿𝗲𝗲 𝗮𝗰𝗰𝗲𝘀𝘀 𝘁𝗼 𝗮𝗰𝗮𝗱𝗲𝗺𝗶𝗮.

Imperial’s Robot Learning Lab published some remarkable work last month: teaching a robot 1,000 manipulation tasks in 24 hours from a single demonstration each. What the paper doesn’t spotlight is the hidden cost behind that result: 5,650+ real-world rollouts and the massive infrastructure required to support them - data collection, sensor sync, format wrangling, storage, and training.

At Neuracore, we see brilliant researchers spending months building pipelines instead of advancing what robots can learn. We see PhD students writing their 15th format converter instead of running their next experiment.

𝗧𝗵𝗮𝘁’𝘀 𝘄𝗵𝘆 𝘄𝗲’𝗿𝗲 𝗼𝗳𝗳𝗲𝗿𝗶𝗻𝗴 𝗳𝗿𝗲𝗲 𝗽𝗹𝗮𝘁𝗳𝗼𝗿𝗺 𝗮𝗰𝗰𝗲𝘀𝘀 𝘁𝗼 𝗮𝗰𝗮𝗱𝗲𝗺𝗶𝗰 𝗿𝗼𝗯𝗼𝘁𝗶𝗰𝘀 𝗹𝗮𝗯𝘀. Not as a giveaway, but rather 𝗮𝘀 𝗮𝗻 𝗶𝗻𝘃𝗲𝘀𝘁𝗺𝗲𝗻𝘁 𝗶𝗻 𝘁𝗵𝗲 𝗲𝗰𝗼𝘀𝘆𝘀𝘁𝗲𝗺:
• Researchers focus on algorithms, not infrastructure
• Breakthroughs happen faster when data → policy takes days, not months
• Students graduate with battle-tested workflows
• Strong academic validation accelerates industry adoption

Work like this shows the future of robot learning: extreme data efficiency. Our job is to provide the infrastructure that makes discovering the next MT3 frictionless. If your lab is working on imitation learning, manipulation, or embodied AI, let’s talk.

Credit: @imperialcollege @Kamil__Dre @pitvit_ @vitalisvos19 @Ed__Johns
Paper: robot-learning.uk/learning-1000-…

0 replies · 0 reposts · 1 like · 165 views
Pietro Vitiello reposted
Jiaming Tang @jmtang42
Even large VLAs can play ping-pong in real time! 🏓⚡️

In practice, VLAs struggle with fast, dynamic tasks:
• slow reactions, jittery actions.
• demos often shown at 5-10× speed to look “smooth”.

We introduce VLASH:
• future-state-aware asynchronous inference with >30Hz inference frequency for PI0.5
• drop-in to existing VLAs with no extra overhead
• enables PI0.5 / PI0 to play ping-pong and other highly dynamic tasks in real time

📄 Paper: arxiv.org/abs/2512.01031
🔧 Code: github.com/mit-han-lab/vl…
18 replies · 82 reposts · 441 likes · 70.4K views
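A guess at what "future-state-aware asynchronous inference" means, reconstructed from the tweet alone rather than from the released code (see the linked repo): one inference call spans several control steps, so the model is queried ahead of time on the state the robot is *predicted* to be in when the new action chunk arrives, not on the current state. All names and numbers below are illustrative.

```python
# Toy asynchronous-inference loop: while the current action chunk is
# still executing, the next chunk is requested, conditioned on the state
# predicted at the moment the new chunk will take over.
import numpy as np

CTRL_DT = 0.01          # 100 Hz low-level control
LATENCY_STEPS = 3       # inference takes ~3 control steps
CHUNK = 8               # actions returned per inference call

def dynamics(state, action):
    return state + CTRL_DT * action  # toy integrator robot

def predict_future_state(state, pending_actions):
    # Roll the already-queued actions through the (approximate) dynamics
    # to estimate where the robot will be when inference finishes.
    for a in pending_actions:
        state = dynamics(state, a)
    return state

def policy(state):
    # Stand-in for the VLA: a chunk of actions steering the state to zero.
    return [-state for _ in range(CHUNK)]

state = np.array([1.0, -2.0])
queue = policy(state)            # first chunk, computed synchronously
next_chunk = None
for t in range(100):
    if len(queue) == LATENCY_STEPS:
        # Launch inference now so the result lands exactly when the queue
        # empties; condition it on the predicted state at that moment.
        # (In the real system this call would run in a parallel thread.)
        next_chunk = policy(predict_future_state(state, queue))
    state = dynamics(state, queue.pop(0))
    if not queue:
        queue = next_chunk
print("final state after 1 s of control:", state)
```

Reusing the already-queued actions for the forward prediction is what hides the latency: the robot never idles waiting for the model, and the new chunk is consistent with where the robot actually is when it starts.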
Pietro Vitiello reposted
Xiao Ma @yusufma555
I've been working on deformable object manipulation since my PhD. It was totally a nightmare years ago and my PhD advisor was telling me not to work on it for my own good. Today, at ByteDance Seed, we are dropping GR-RL, a new VLA+RL system that manages long-horizon precise dexterous manipulation of deformable objects.

This is probably the first real-world RL system to make a robot:
✅ Lace up your shoes end to end
✅ Hit millimeter tolerance repeatedly
✅ Recover from mistakes (See video!)
✅ And complete continuous shoelace threading on a real bimanual platform

📈 Success rate: ↑ from 45.7% → 83.3%

Yes, robots can now actually do this.

Project page: seed.bytedance.com/en/gr_rl
ArXiv: arxiv.org/abs/2512.01801
37 replies · 143 reposts · 940 likes · 108.8K views
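The tweet doesn't describe the algorithm, so the sketch below only illustrates the broad recipe its name implies: take a policy whose parameters imitation learning got roughly right, then post-train it with RL on a sparse task-success reward. Toy 1-D problem, plain REINFORCE, every detail hypothetical rather than GR-RL's actual method.

```python
# RL post-training of an imitation-initialized Gaussian policy with a
# sparse success reward, via the REINFORCE gradient estimator.
import numpy as np

rng = np.random.default_rng(1)

# Gaussian policy: action ~ N(w * obs, sigma^2). Imitation learning is
# assumed to have produced w = 0.2; the task is only solved at w = -1.
w, sigma, lr = 0.2, 0.3, 0.01

def rollout(w):
    obs = rng.uniform(-1.0, 1.0)
    action = w * obs + sigma * rng.normal()
    # Sparse success reward: correct behavior is action ~= -obs
    # (think: bring the lace tip back onto the eyelet, within tolerance).
    reward = 1.0 if abs(action + obs) < 0.2 else 0.0
    return obs, action, reward

for _ in range(5000):
    obs, action, reward = rollout(w)
    # REINFORCE: d/dw log N(action; w*obs, sigma^2)
    #          = (action - w*obs) * obs / sigma^2
    w += lr * reward * (action - w * obs) * obs / sigma**2

# Residual failures come from the fixed exploration noise sigma.
success = np.mean([rollout(w)[2] for _ in range(2000)])
print(f"w after post-training: {w:+.2f}  success rate: {success:.2f}")
```

Only successful rollouts contribute gradient here, which is one plausible way a system in this family also picks up the recovery behaviors the thread highlights: corrections that end in success get reinforced.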
Pietro Vitiello reposted
Dwarkesh Patel @dwarkesh_sp
“The thing that happened with AGI and pretraining is that in some sense they overshot the target. You will realize that a human being is not an AGI. Because a human being lacks a huge amount of knowledge. Instead, we rely on continual learning. If I produce a super intelligent 15-year-old, they don't know very much at all. A great student, very eager. [You can say,] ‘You go and be a programmer. You go and be a doctor. Go and learn.’ So you could imagine that the deployment itself will involve some kind of a learning trial and error period. It's a process as opposed to, you drop the finished thing.” @ilyasut
Dwarkesh Patel@dwarkesh_sp

The @ilyasut episode

0:00:00 – Explaining model jaggedness
0:09:39 – Emotions and value functions
0:18:49 – What are we scaling?
0:25:13 – Why humans generalize better than models
0:35:45 – Straight-shotting superintelligence
0:46:47 – SSI’s model will learn from deployment
0:55:07 – Alignment
1:18:13 – “We are squarely an age of research company”
1:29:23 – Self-play and multi-agent
1:32:42 – Research taste

Look up Dwarkesh Podcast on YouTube, Apple Podcasts, or Spotify. Enjoy!

95 replies · 163 reposts · 1.9K likes · 592.8K views
Pietro Vitiello reposted
Ilir Aliu @IlirAliu_
The usual deep tech mistakes:
• Talk to 5 customers, not 50
• Build the “perfect” product before selling
• Stay in one country and accept 18-month cycles
• Treat deployment as an afterthought

The @in_bolt founders built by rejecting all of this.
1 reply · 7 reposts · 57 likes · 8.9K views
Pietro Vitiello reposted
Sebastien Bubeck @SebastienBubeck
3 years ago we could showcase AI's frontier w. a unicorn drawing. Today we do so w. AI outputs touching the scientific frontier: cdn.openai.com/pdf/4a25f921-e… Use the doc to judge for yourself the status of AI-aided science acceleration, and hopefully be inspired by a couple examples!
74 replies · 206 reposts · 1.3K likes · 1.5M views
Pietro Vitiello reposted
Jack 🤖 @JacklouisP
Selling 100 robots is hard. Maintaining them across 10 sites is harder. Who fixes it at 2 AM? Where are spare parts? Remote updates? Different firmware versions? Service infrastructure costs more than R&D.
5 replies · 5 reposts · 26 likes · 1.1K views
Pietro Vitiello @pitvit_
I believe that in terms of navigation the policy predicts something similar to a velocity or a waypoint. If you change hardware (hw) you should be able to use the same policy output as long as you have a controller for that hw. A controller is by no means easy to design, but it's much less of a problem than having to retrain the policy with new data.

In terms of manipulation, a humanoid would mean different torso kinematics, but you should be able to simply adjust the kinematic chain/controller and reuse the policy outputs here too. The only problem would be if the body of the robot is often in the camera view. In that case the policy would be seeing something different with different hw. This being said, they were able to map human to robot already (maybe with some kind of masking or inpainting), so that shouldn't be a problem.

These are only my opinions though. I could be wrong.
0 replies · 0 reposts · 1 like · 22 views
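A minimal sketch of the separation Pietro describes, with all names illustrative: the learned policy emits an embodiment-agnostic command (here, a base-frame waypoint), and a small hardware-specific controller turns it into actuator commands, so swapping hardware means swapping the controller rather than retraining the policy.

```python
# One policy output (a 2-D waypoint), two hardware-specific controllers.
import numpy as np

def policy(observation: np.ndarray) -> np.ndarray:
    # Stand-in for the learned navigation policy: outputs a waypoint,
    # here simply "move 0.5 m forward" from the current position.
    return observation[:2] + np.array([0.5, 0.0])

class WheeledBase:
    def track(self, pose: np.ndarray, waypoint: np.ndarray) -> np.ndarray:
        # Differential-drive style command: speed and heading to the goal.
        delta = waypoint - pose[:2]
        return np.array([np.linalg.norm(delta), np.arctan2(delta[1], delta[0])])

class HumanoidBase:
    def track(self, pose: np.ndarray, waypoint: np.ndarray) -> np.ndarray:
        # Stepping-style command: footstep displacement toward the
        # waypoint, clipped to a feasible stride length.
        delta = waypoint - pose[:2]
        return np.clip(delta, -0.3, 0.3)

pose = np.zeros(3)                 # (x, y, yaw)
waypoint = policy(pose)
for base in (WheeledBase(), HumanoidBase()):
    print(type(base).__name__, "command:", base.track(pose, waypoint))
```

The policy never sees which `track` implementation runs underneath it, which is exactly why retraining isn't needed when the hardware changes (camera-visible body parts aside, as noted above).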
Koala @BrainyMarsupial
Good points, so you think the data they acquire would be readily generalizable to any humanoid they might end up developing in the future? If that's the case, that might be very advantageous then. They are using a cheaper design that's more family-friendly and home-ready relative to humanoids. Nice potential flywheel for data accumulation.
1 reply · 0 reposts · 1 like · 26 views
Pietro Vitiello @pitvit_
I'm incredibly impressed. I'd say this is the best home robotics demo we've seen ... by a mile.

I was a bit sceptical of the gripper design, but actually, after seeing it together with the human hand, it makes sense. It actually seems to provide the necessary dexterity without the over-complexity of the 3-million-degrees-of-freedom humanoid hands that some people are going for.

You could argue that the objects could always have been placed in basically the same spot. But that would be unfounded: with that much navigation, the robot is bound to be in a different relative pose to the objects regardless.

Great demo! Keep it up
Tony Zhao@tonyzzhao

Today, we present a step-change in robotic AI @sundayrobotics. Introducing ACT-1: A frontier robot foundation model trained on zero robot data. - Ultra long-horizon tasks - Zero-shot generalization - Advanced dexterity 🧵->

2 replies · 6 reposts · 100 likes · 14.5K views
Pietro Vitiello reposted
Jorge Bravo Abad @bravo_abad
Learning a thousand tasks in a day

Imagine teaching a robot 1,000 different tricks in a single day—each from just one human demo. That’s the promise behind new work by Kamil Dreczkowski and coauthors: moving robot learning a bit closer to how humans pick up new skills quickly, instead of needing thousands of repetitions per task.

The authors take a very pragmatic route. Instead of one giant end-to-end model, they break manipulation into two simple phases. First, the robot figures out how to line up its gripper with the right object and pose (using language and geometry to find the closest past demo, then classic pose estimation and motion planning). Second, once it’s in the right place, it replays the fine-grained motion of the original demonstration in the gripper’s own coordinate system.

With that recipe, they get a real robot to perform around 1,000 distinct tasks on more than 400 objects, with just one demonstration per task, collected in under a day.

What I like here is the message: you don’t always need a bigger model, you need the right structure. By separating “where to go” from “how to move once you’re there,” MT3 becomes data-efficient, interpretable, and easier to debug—and in the low-data regime, it outperforms standard imitation learning baselines.

For those of us dreaming about robots that set up experiments, handle samples, or reconfigure lab equipment almost on demand, this kind of approach—smart inductive bias plus minimal supervision—looks like a very interesting step forward.

Paper: science.org/doi/10.1126/sc…
Edward Johns@Ed__Johns

I'm very excited to finally announce one of the most ambitious projects we've worked on — which makes the front cover of Science Robotics today: ☀️ Learning a Thousand Tasks in a Day ⭐️ Everyday tasks — like those below — can now be learned from a single demonstration each...

1 reply · 3 reposts · 8 likes · 4.2K views
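The two-phase structure described in the post compresses to something like the sketch below: retrieve the closest past demo, align the gripper with the object as it sat at demo time, then replay the demonstrated end-effector motion. The real MT3 system does retrieval with language and geometry, estimates full SE(3) object pose, and plans collision-free motions; every function here is a toy stand-in for those components.

```python
# Phase 1: retrieve + align. Phase 2: replay the demo trajectory
# in the gripper's own frame. Planar toy version of the MT3 recipe.
import numpy as np

demos = {
    # task -> (object pose at demo time (x, y, yaw), gripper-frame trajectory)
    "open drawer": (np.array([0.4, 0.1, 0.0]),
                    [np.array([0.0, 0.0]), np.array([-0.1, 0.0])]),
    "press button": (np.array([0.2, -0.3, 0.0]),
                     [np.array([0.0, 0.0]), np.array([0.0, -0.05])]),
}

def retrieve(task: str):
    # Phase 1a: find the closest past demo. Here, an exact name match;
    # the paper uses language plus geometric similarity.
    return demos[task]

def align(current_object_pose, demo_object_pose):
    # Phase 1b: reach the same gripper pose *relative to the object* as
    # in the demo. Here that reduces to a planar translation offset;
    # the real system uses pose estimation and motion planning.
    return current_object_pose[:2] - demo_object_pose[:2]

def replay(offset, trajectory):
    # Phase 2: replay the demonstrated fine-grained motion, shifted by
    # the alignment offset.
    return [offset + wp for wp in trajectory]

current_pose = np.array([0.55, 0.15, 0.0])   # object moved since the demo
demo_pose, traj = retrieve("open drawer")
print(replay(align(current_pose, demo_pose), traj))
```

The data efficiency falls out of this split: only the alignment step has to generalize across scenes, while the contact-rich part is a verbatim replay of a single demonstration.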
Pietro Vitiello @pitvit_
This is a fair doubt. Personally I think this is a good idea. I agree that one of the few places where I see the humanoid form factor being a good choice is the home. This being said, I believe that in an industry such as robotics, the most pressing matter is the usefulness of the robot.

Aside from staircases, this form factor is very effective and is much easier to control compared to a legged humanoid. Not having to worry about legs means more time spent on solving actual tasks, potentially shortening the time-to-market. If I am right, this means realizing revenues faster and deploying robots earlier than others, ultimately getting even more data.

Once you have proven your worth and potentially secured a few customers, you can design a humanoid alternative if you really deem it necessary. Bear in mind that the policy would probably transfer no matter what the lower half of the robot is; the main difference would be changing the controller and kinematics. The biggest problem would surely be designing/manufacturing the humanoid itself though.
1 reply · 0 reposts · 1 like · 61 views
Koala @BrainyMarsupial
It’s fantastic but I worry about what the ceiling for a design like this is. Certainly these demos are remarkable, but I feel like humanoids have way more room for exponentiality. But yeah if robot progress magically went to a halt today Sunday would be near the top in terms of practical form factor. Very promising company.
1 reply · 0 reposts · 1 like · 90 views
Pietro Vitiello @pitvit_
After taking a look at the blog post I should add something: they condition their policy on a map of the environment the robot is being deployed in. I still don’t think they are using dense tactile sensors, but they mention the importance of force when manipulating some objects, so they could be using force sensors.
0 replies · 0 reposts · 2 likes · 107 views
Pietro Vitiello @pitvit_
I think it mainly is, yes. The hand seems a bit too stiff to have any tactile sensor, and honestly I have the impression the team is being very mindful of all the complexity they add to the pipeline. Having tactile at this point would provide little benefit while making the learning problem potentially harder.

I also don't think they are using depth. 4 out of 5 cameras are on the wrists, where depth is notoriously bad. The head-cam could be, but I seriously doubt it. I do however think they have some proprioception, probably just where the hands are relative to each other or to the head cam.

I think language is a potential input too. Seeing such a long-horizon task makes me think there is an internal VLM that comes up with a plan and passes each stage of that plan to a lower-level policy to complete. Similar to @Figure_robot and @physical_int
2 replies · 1 repost · 6 likes · 530 views
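Pietro is speculating above, and the sketch below only makes the shape of that speculation concrete: a high-level VLM decomposes the long-horizon instruction into stages and hands each one to a low-level policy. Toy stand-ins throughout; this is not Sunday Robotics' (or Figure's, or Physical Intelligence's) published stack.

```python
# Hierarchical control: a planner proposes subgoals, a low-level policy
# executes each one and reports success so the planner can advance.
def vlm_plan(instruction: str) -> list[str]:
    # Stand-in for the VLM planner: a canned decomposition of the task.
    return ["locate the mug", "grasp the mug", "place it in the dishwasher"]

def low_level_policy(subgoal: str, steps: int = 3) -> bool:
    # Stand-in for the learned visuomotor policy executing one stage.
    for t in range(steps):
        print(f"  [{subgoal}] control step {t}")
    return True  # report success so the planner can move on

for stage in vlm_plan("put the mug in the dishwasher"):
    print("stage:", stage)
    if not low_level_policy(stage):
        break  # on failure, a real system would replan with the VLM
```

The appeal of this split for long-horizon tasks is that the low-level policy only ever has to be reliable over one short stage at a time, while the planner carries the horizon.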
Pietro Vitiello reposted
Benedict Quartey @Benedict_Q
Maps will always be important for deploying robots in the world! Very interesting to see @sundayrobotics condition their policies on maps. One of the cool works I spoke of during my thesis talk was our way of doing this kind of navigation in service of long-horizon manipulation without needing to learn, while still inheriting the open-world priors of foundation models. Excited to see more of this in the space!
Tony Zhao@tonyzzhao

It is even more fun to see how Memo reacts to unseen environments. We deploy it to 6 unseen Airbnbs and task the robot with fine-grained tasks such as picking up utensils from the plate. Because we train on data from over 500 homes, the new home is instantly familiar to Memo.

4 replies · 19 reposts · 126 likes · 30.2K views