Jack Vial

1.2K posts

@jackvial89

Building robots and neural nets

Ohio, USA · Joined February 2011
2K Following · 1.5K Followers
Pinned Tweet
Jack Vial@jackvial89·
Distributed Real-Time Chunking! I've written a technical blog post on deploying Real-Time Chunking via in-painting on a remote cloud GPU server with a local client (e.g. a Raspberry Pi), demonstrated in the video below: jackvial.com/posts/distribu… A LeRobot-based implementation is available at github.com/jackvial/drtc. It includes scripts to provision a GPU instance on Prime Intellect and connect via Tailscale, and should have everything (except a model trained for your environment) needed to reproduce the experiments outlined in the blog post.
Jack Vial@jackvial89

much smoother after applying a low-pass Butterworth filter with a 3 Hz cutoff. this filters out high-frequency small movements (aka noise) in the trajectory. it also naturally attenuates the signal, which makes the robot move slower, so I've added a bit of gain back after filtering to compensate and speed it up again. still a bit jiggly, mainly in the shoulder pan, but it seems to be mostly mechanical at this point

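A minimal sketch of the filtering described above, assuming the trajectory is an array of joint positions at a fixed control rate; the 30 Hz sample rate and the gain value are illustrative assumptions, not taken from the tweet:

import numpy as np
from scipy.signal import butter, filtfilt

def smooth_trajectory(traj, fs=30.0, cutoff_hz=3.0, gain=1.15):
    """Low-pass a (T, n_joints) trajectory with a 2nd-order Butterworth filter."""
    b, a = butter(N=2, Wn=cutoff_hz, btype="low", fs=fs)
    smoothed = filtfilt(b, a, traj, axis=0)  # zero-phase, offline only
    # Scale deviations around the mean back up so the motion isn't slowed down.
    mean = smoothed.mean(axis=0)
    return mean + gain * (smoothed - mean)

filtfilt is zero-phase but needs the whole trajectory up front; a live control loop would run scipy.signal.lfilter (or sosfilt) with carried filter state instead.
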
Nick Frosst@nickfrosst·
@cohere transcribe: SOTA open-source transcription model running in the browser :) Weights on @huggingface, link below
Jack Vial@jackvial89·
The π*0.6 RECAP value network training is starting to look good! It's only about 15 minutes into training. I spent most of last week studying the paper and making notes on the value network and advantage conditioning. I'm training on a 40-episode pick-and-place dataset for so101: 20 episodes are successful, and 20 I purposely failed while trying to imitate some of the model failure modes I have seen.
[tweet media]
Jack Vial@jackvial89

i'm working on an implementation of π*0.6 RECAP. going to start with a simplified version of the full pipeline

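A hedged sketch of how per-step value targets could be derived from episode-level success/failure labels like the 20/20 split above; the terminal-reward shaping and the discount factor are assumptions, not necessarily what the RECAP paper or this implementation uses:

import numpy as np

def return_to_go(episode_len, success, gamma=0.98):
    """Monte Carlo value targets for an episode with a terminal binary reward."""
    rewards = np.zeros(episode_len)
    rewards[-1] = 1.0 if success else 0.0  # assumed shaping: +1 only on success
    targets = np.zeros(episode_len)
    running = 0.0
    for t in reversed(range(episode_len)):
        running = rewards[t] + gamma * running
        targets[t] = running
    return targets
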
Kyle Vedder@KyleVedder·
@jackvial89 you should try a small bin count (like 5 or 10) and see if advantages are still meaningful at the data scale you're at. it feels like you want the lowest advantage resolution you can get away with, to avoid overfitting
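
A sketch of the low-resolution discretization suggested here: bucket continuous advantages into a handful of bins that then condition the policy. Quantile bin edges are an assumption:

import numpy as np

def discretize_advantages(advantages, n_bins=5):
    """Map continuous advantages to integer bin ids via empirical quantiles."""
    edges = np.quantile(advantages, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.digitize(advantages, edges)  # ids in {0, ..., n_bins - 1}
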
Jack Vial@jackvial89·
@VilleKuosmanen @shumochu @pravsels Thanks! I have a remote/distributed version of RTC that can run at 60 Hz or more (bottlenecked by my so101 servos). Once I get the advantage conditioning working, I plan to work on rollout recording and human intervention.
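
A hedged sketch of the shape of such a remote client loop; the endpoint, payload, and chunk format are hypothetical, not the actual drtc API, and a real implementation would fetch the next chunk asynchronously and blend overlapping chunks (the in-painting step) rather than block:

import time
import requests

SERVER_URL = "http://remote-gpu:8000/act"  # hypothetical endpoint, e.g. reached over Tailscale
CONTROL_HZ = 60

def control_loop(robot, get_observation):
    chunk, idx = [], 0
    while True:
        start = time.monotonic()
        if idx >= len(chunk) // 2:  # re-plan once the current chunk is half consumed
            resp = requests.post(SERVER_URL, json={"obs": get_observation()})
            chunk, idx = resp.json()["actions"], 0
        robot.send_action(chunk[idx])  # hypothetical robot interface
        idx += 1
        # Sleep off the remainder of the control period to hold the target rate.
        time.sleep(max(0.0, 1.0 / CONTROL_HZ - (time.monotonic() - start)))
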
Ville🤖@VilleKuosmanen·
"RL Token" looks like a great and surprisingly simple post-training methodology for optimising robot models for dexterous tasks in the real world! Over the next few weeks, me and @pravsels will be attempting to reproduce the results (& open source the code) Stay tuned 👀
Physical Intelligence@physical_int

We developed an RL method for fine-tuning our models for precise tasks in just a few hours or even minutes. Instead of training the whole model, we add an “RL token” output to π-0.6, our latest model, which is used by a tiny actor and critic to learn quickly with RL.

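A speculative sketch of the idea as described in the quoted tweet: the VLA emits one extra token embedding, and only a tiny actor and critic reading that embedding are trained with RL. The dimensions and head shapes are assumptions:

import torch
import torch.nn as nn

class TinyActorCritic(nn.Module):
    """Small trainable heads on top of a frozen VLA's 'RL token' embedding."""

    def __init__(self, token_dim=1024, action_dim=7, hidden=256):
        super().__init__()
        self.actor = nn.Sequential(
            nn.Linear(token_dim, hidden), nn.ReLU(), nn.Linear(hidden, action_dim)
        )
        self.critic = nn.Sequential(
            nn.Linear(token_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, rl_token):  # rl_token: (batch, token_dim) from the frozen VLA
        return self.actor(rl_token), self.critic(rl_token)
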
Jack Vial@jackvial89·
@VilleKuosmanen @shumochu @pravsels Ahh I see! Sounds like you have a good foundation to build on. I enjoyed your article and am looking forward to following your progress with RTL.
Jack Vial@jackvial89·
@shumochu @VilleKuosmanen @pravsels I think you can go straight to RTL on top of pi0.5 or a similar VLA base. But I think a lot of the infra for RECAP and RTL is the same: async inference with RTC + human intervention. RTL will additionally have the actor and critic.
YangXiuyu@realXiuyuYang·
@jackvial89 Wonderful! Will you release the code on GitHub?
Jack Vial@jackvial89·
@chrisdotai Good question, I haven't looked at TOPReward or reward models much. I like RECAP because it's based on RL and it's a good incremental step toward methods like RTL.
Chris@chrisdotai·
@jackvial89 I wonder how it compares to TOPReward
Jack Vial@jackvial89·
I'm using smolvla as the backbone with the HuggingFaceTB/SmolVLM2-500M-Video-Instruct weights, then I fine-tune it on my 40-episode dataset (frozen vision encoder, everything else trainable). I have a pi0.5 version too, and that's my main focus, but I started with smolvla as it fits on my local RTX 4070 and is a bit faster for me to iterate on atm, to sanity-check that I have the value and loss functions correct. Since the episodes are quite short (12 seconds), I'm using 56 bins for the distributional value function target, but a lower bin count would probably be fine.
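
A sketch of one common way to build a distributional value target over fixed bins (a "two-hot" projection of the scalar return onto the bin centers); whether this matches the RECAP target construction exactly is an assumption:

import torch

def two_hot_target(returns, n_bins=56, v_min=0.0, v_max=1.0):
    """Project scalar returns of shape (B,) onto a categorical over n_bins."""
    centers = torch.linspace(v_min, v_max, n_bins)
    returns = returns.clamp(v_min, v_max)
    idx = torch.clamp(torch.searchsorted(centers, returns) - 1, 0, n_bins - 2)
    lo, hi = centers[idx], centers[idx + 1]
    w_hi = (returns - lo) / (hi - lo)  # linear interpolation weight
    target = torch.zeros(returns.shape[0], n_bins)
    target.scatter_(1, idx.unsqueeze(1), (1.0 - w_hi).unsqueeze(1))
    target.scatter_(1, (idx + 1).unsqueeze(1), w_hi.unsqueeze(1))
    return target  # train the value head with cross-entropy against this
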
Kyle Vedder@KyleVedder·
@jackvial89 what’s your setup? what are you using to init the value function weights?
Jack Vial@jackvial89·
@DibblaX this is early in training, so it's still learning and gradually getting closer to the red line. Also, I'm using 56 bins.
Yinggan Xu@DibblaX·
@jackvial89 Wondering how you got the stair-like value. I think the return-to-go is more like a straight line -> the value model learns to generate something similar, so the green curve is pretty unexpected to me 👀
Jack Vial@jackvial89·
i updated my lerobot data studio app to support adding episode-level binary reward labels
[tweet media]
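
A hedged sketch of what storing such labels could look like; the file name and layout are hypothetical, not the actual lerobot data studio format:

import json
from pathlib import Path

def save_reward_labels(labels, dataset_root):
    """labels: dict mapping episode_index -> bool success flag."""
    path = Path(dataset_root) / "episode_rewards.json"  # hypothetical location
    path.write_text(json.dumps({str(k): int(v) for k, v in labels.items()}, indent=2))
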
Jack Vial@jackvial89·
pretty happy that so far it seems to be able to tell the difference between a failure and a success, although the training set is small and the validation set even smaller, so it will probably overfit to begin with until I gather more data and make the model more robust, but that's ok
Jack Vial@jackvial89·
here's a failed episode
[tweet media]
Jack Vial@jackvial89·
wife made chihuahuas
[tweet media]
Jack Vial@jackvial89·
i should clarify what I mean by generalization. I mean it in a pretty narrow sense: with a pick-and-place task with say 50 samples, does the reward reinforcement on 20 additional samples (10 success, 10 failure) help the model generalize to more than a single position? thank you! if you can share the value network and advantage code that would be very useful to compare against
PRB@builds_robots·
@jackvial89 We used pi0.5, so we didn't pretrain with RECAP. It definitely makes it more robust than pi0.5 - we ran one full iteration. I am not sure you should expect generalization. I can DM the value network and advantage calculation code to you.
Jack Vial@jackvial89·
i'm working on an implementation of π*0.6 RECAP. going to start with a simplified version of the full pipeline
[tweet media]