Jack Vial

1.2K posts

@jackvial89

Building robots and neural nets

Ohio, USA · Joined February 2011
2K Following · 1.5K Followers
Pinned Tweet
Jack Vial@jackvial89·
Distributed Real-Time Chunking! I've written a technical blog post on deploying Real-Time Chunking via in-painting on a remote cloud GPU server with a local client (e.g. a Raspberry Pi), demonstrated in the video below: jackvial.com/posts/distribu… A LeRobot-based implementation is available at github.com/jackvial/drtc. It includes scripts to provision a GPU instance on Prime Intellect and connect via Tailscale, and should have everything (except a model trained for your environment) needed to reproduce the experiments outlined in the blog post.
Jack Vial@jackvial89

much smoother after applying a low-pass Butterworth filter with a 3 Hz cutoff. this filters out high-frequency small movements (aka noise) in the trajectory. it also naturally attenuates the signal, which makes the robot move slower, so I've added a bit of gain back after filtering to compensate and speed it up again. still a bit jiggly, mainly in the shoulder pan, but it seems to be mostly mechanical at this point

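A minimal sketch of the filtering described above, assuming the trajectory is an array of joint positions at a fixed control rate; the 30 Hz sample rate and the gain value are illustrative assumptions, not taken from the tweet:

import numpy as np
from scipy.signal import butter, filtfilt

def smooth_trajectory(traj, fs=30.0, cutoff_hz=3.0, gain=1.15):
    """Low-pass a (T, n_joints) trajectory with a 2nd-order Butterworth filter."""
    b, a = butter(N=2, Wn=cutoff_hz, btype="low", fs=fs)
    smoothed = filtfilt(b, a, traj, axis=0)  # zero-phase, offline only
    # Scale deviations around the mean back up so the motion isn't slowed down.
    mean = smoothed.mean(axis=0)
    return mean + gain * (smoothed - mean)

filtfilt is zero-phase but needs the whole trajectory up front; a live control loop would run scipy.signal.lfilter (or sosfilt) with carried filter state instead.
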
Nick Frosst@nickfrosst·
@cohere transcribe: SOTA open-source transcription model running in the browser :) Weights on @huggingface, link below
Jack Vial@jackvial89·
The π*0.6 RECAP value network training is starting to look good! It's only about 15 minutes into training. I spent most of last week studying the paper and making notes on the value network and advantage conditioning. I'm training on a 40-episode pick-and-place dataset for so101: 20 episodes are successful, and 20 I purposely failed while trying to imitate some of the model failure modes I have seen.
[tweet media]
Jack Vial@jackvial89

i'm working on an implementation of π*0.6 RECAP. going to start with a simplified version of the full pipeline

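A hedged sketch of how per-step value targets could be derived from episode-level success/failure labels like the 20/20 split above; the terminal-reward shaping and the discount factor are assumptions, not necessarily what the RECAP paper or this implementation uses:

import numpy as np

def return_to_go(episode_len, success, gamma=0.98):
    """Monte Carlo value targets for an episode with a terminal binary reward."""
    rewards = np.zeros(episode_len)
    rewards[-1] = 1.0 if success else 0.0  # assumed shaping: +1 only on success
    targets = np.zeros(episode_len)
    running = 0.0
    for t in reversed(range(episode_len)):
        running = rewards[t] + gamma * running
        targets[t] = running
    return targets
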
Kyle Vedder@KyleVedder·
@jackvial89 you should try a small bin count (like 5 or 10) and see if advantages are still meaningful at the data scale you're at. it feels like you want the lowest advantage resolution you can get away with, to avoid overfitting
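
A sketch of the low-resolution discretization suggested here: bucket continuous advantages into a handful of bins that then condition the policy. Quantile bin edges are an assumption:

import numpy as np

def discretize_advantages(advantages, n_bins=5):
    """Map continuous advantages to integer bin ids via empirical quantiles."""
    edges = np.quantile(advantages, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.digitize(advantages, edges)  # ids in {0, ..., n_bins - 1}
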
Jack Vial@jackvial89·
@VilleKuosmanen @shumochu @pravsels Thanks! I have a remote/distributed version of RTC that can run at 60 Hz or more (bottlenecked by my so101 servos). Once I get the advantage conditioning working, I plan to work on rollout recording and human intervention.
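
A hedged sketch of the shape of such a remote client loop; the endpoint, payload, and chunk format are hypothetical, not the actual drtc API, and a real implementation would fetch the next chunk asynchronously and blend overlapping chunks (the in-painting step) rather than block:

import time
import requests

SERVER_URL = "http://remote-gpu:8000/act"  # hypothetical endpoint, e.g. reached over Tailscale
CONTROL_HZ = 60

def control_loop(robot, get_observation):
    chunk, idx = [], 0
    while True:
        start = time.monotonic()
        if idx >= len(chunk) // 2:  # re-plan once the current chunk is half consumed
            resp = requests.post(SERVER_URL, json={"obs": get_observation()})
            chunk, idx = resp.json()["actions"], 0
        robot.send_action(chunk[idx])  # hypothetical robot interface
        idx += 1
        # Sleep off the remainder of the control period to hold the target rate.
        time.sleep(max(0.0, 1.0 / CONTROL_HZ - (time.monotonic() - start)))
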
Ville🤖@VilleKuosmanen·
"RL Token" looks like a great and surprisingly simple post-training methodology for optimising robot models for dexterous tasks in the real world! Over the next few weeks, me and @pravsels will be attempting to reproduce the results (& open source the code) Stay tuned 👀
Physical Intelligence@physical_int

We developed an RL method for fine-tuning our models for precise tasks in just a few hours or even minutes. Instead of training the whole model, we add an “RL token” output to π-0.6, our latest model, which is used by a tiny actor and critic to learn quickly with RL.

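A speculative sketch of the idea as described in the quoted tweet: the VLA emits one extra token embedding, and only a tiny actor and critic reading that embedding are trained with RL. The dimensions and head shapes are assumptions:

import torch
import torch.nn as nn

class TinyActorCritic(nn.Module):
    """Small trainable heads on top of a frozen VLA's 'RL token' embedding."""

    def __init__(self, token_dim=1024, action_dim=7, hidden=256):
        super().__init__()
        self.actor = nn.Sequential(
            nn.Linear(token_dim, hidden), nn.ReLU(), nn.Linear(hidden, action_dim)
        )
        self.critic = nn.Sequential(
            nn.Linear(token_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, rl_token):  # rl_token: (batch, token_dim) from the frozen VLA
        return self.actor(rl_token), self.critic(rl_token)
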
Jack Vial@jackvial89·
@VilleKuosmanen @shumochu @pravsels Ahh I see! Sounds like you have a good foundation to build on. I enjoyed your article and am looking forward to following your progress with RTL.
Jack Vial@jackvial89·
@shumochu @VilleKuosmanen @pravsels I think you can go straight to RTL on top of pi0.5 or a similar VLA base. But I think a lot of the infra for RECAP and RTL is the same: async inference with RTC + human intervention. RTL will additionally have the actor and critic.
YangXiuyu@realXiuyuYang·
@jackvial89 Wonderful! Will you release the code on GitHub?
Jack Vial@jackvial89·
@chrisdotai Good question, I haven't looked at TOPReward or reward models much. I like RECAP because it's based on RL and it's a good incremental step toward methods like RTL.
Chris@chrisdotai·
@jackvial89 I wonder how it compares to TOPReward
Jack Vial@jackvial89·
I'm using smolvla as the backbone with the HuggingFaceTB/SmolVLM2-500M-Video-Instruct weights, then I fine-tune it on my 40-episode dataset (frozen vision encoder, everything else trainable). I have a pi0.5 version too, and that's my main focus, but I started with smolvla as it fits on my local RTX 4070 and is a bit faster for me to iterate on atm, to sanity-check that I have the value and loss functions correct. Since the episodes are quite short (12 seconds), I'm using 56 bins for the distributional value function target, but a lower bin count would probably be fine.
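
A sketch of one common way to build a distributional value target over fixed bins (a "two-hot" projection of the scalar return onto the bin centers); whether this matches the RECAP target construction exactly is an assumption:

import torch

def two_hot_target(returns, n_bins=56, v_min=0.0, v_max=1.0):
    """Project scalar returns of shape (B,) onto a categorical over n_bins."""
    centers = torch.linspace(v_min, v_max, n_bins)
    returns = returns.clamp(v_min, v_max)
    idx = torch.clamp(torch.searchsorted(centers, returns) - 1, 0, n_bins - 2)
    lo, hi = centers[idx], centers[idx + 1]
    w_hi = (returns - lo) / (hi - lo)  # linear interpolation weight
    target = torch.zeros(returns.shape[0], n_bins)
    target.scatter_(1, idx.unsqueeze(1), (1.0 - w_hi).unsqueeze(1))
    target.scatter_(1, (idx + 1).unsqueeze(1), w_hi.unsqueeze(1))
    return target  # train the value head with cross-entropy against this
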
Kyle Vedder@KyleVedder·
@jackvial89 what’s your setup? what are you using to init the value function weights?
Jack Vial@jackvial89·
@DibblaX this is early in training, so it's still learning and gradually getting closer to the red line. Also, I'm using 56 bins.
Yinggan Xu@DibblaX·
@jackvial89 Wondering how you got the stair-like value. I think the return-to-go is more like a straight line -> the value model learns to generate something similar, so the green curve is pretty unexpected to me 👀
Jack Vial@jackvial89·
i updated my lerobot data studio app to support adding episode-level binary reward labels
[tweet media]
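
A hedged sketch of what storing such labels could look like; the file name and layout are hypothetical, not the actual lerobot data studio format:

import json
from pathlib import Path

def save_reward_labels(labels, dataset_root):
    """labels: dict mapping episode_index -> bool success flag."""
    path = Path(dataset_root) / "episode_rewards.json"  # hypothetical location
    path.write_text(json.dumps({str(k): int(v) for k, v in labels.items()}, indent=2))
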
Jack Vial@jackvial89·
pretty happy that so far it seems to be able to tell the difference between a failure and a success, although the training set is small and the validation set even smaller, so it will probably overfit to begin with until I gather more data and make the model more robust, but that's ok
Jack Vial@jackvial89·
here's a failed episode
[tweet media]
Jack Vial@jackvial89·
wife made chihuahuas
[tweet media]
Jack Vial@jackvial89·
i should clarify what I mean by generalization. I mean it in a pretty narrow sense: with a pick-and-place task with say 50 samples, does the reward reinforcement on 20 additional samples (10 success, 10 failure) help the model generalize to more than a single position? thank you! if you can share the value network and advantage code that would be very useful to compare against
PRB@builds_robots·
@jackvial89 We used pi0.5, so we didn't pretrain with RECAP. It definitely makes it more robust than pi0.5 - we ran one full iteration. I am not sure you should expect generalization. I can DM the value network and advantage calculation code to you.
Jack Vial@jackvial89·
i'm working on an implementation of π*0.6 RECAP. going to start with a simplified version of the full pipeline
[tweet media]