mattt

45 posts

mattt

@maedmatt

simulation & RL for humanoids

Italy · Joined September 2018

360 Following · 45 Followers

mattt@maedmatt·3d

@tom_doerr look at how we used it in HACK26, i think results are quite nice! github.com/maedmatt/Dream…

Tom Dörr@tom_doerr·4d

Kinematic motion diffusion for humans and robots github.com/nv-tlabs/kimodo

mattt@maedmatt·9 Mar

@observie isn't the IMU one of the biggest bottlenecks between sim and real? Have you also deployed on the real robot?

David Bar@observie·9 Mar

Most RL locomotion examples let the actor (the policy network that runs on the real robot) observe two ground truths that are not directly measured by hardware:
- linear velocity of the robot
- projected gravity (i.e. orientation of the robot)

The former can be inferred using a state estimator built from a small neural network trained to predict velocity, while the latter can be computed using a Madgwick AHRS or Kalman filter. Alternatively, it kind of makes sense to let the actor network learn to extract whatever internal representation it needs directly from raw sensor data, instead of using hand-designed estimators.

I removed base_lin_vel, similarly to @asimovinc's approach, as well as projected_gravity. Instead, I added the accelerometer data (which most RL examples do not seem to provide). I continue to give those ground-truth variables to the critic as privileged info the actor can't see, which is known as an asymmetric actor-critic architecture.

Advantages:
1. Should minimize the sim2real gap, as there are fewer external components whose results may differ between the sim and the hardware
2. The actor can learn the intermediate representation that works best for the task, not necessarily the ones we decided to infer for it
3. Fewer hand-tuned parameters

At least in simulation this seems to work great. It might be luck, trivial or still plain wrong, but after 1500 iterations, the simulation reached the best run yet in terms of reward, linear/angular velocity tracking, action std and more.

Asimov@asimovinc

x.com/i/article/2018…
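A minimal sketch of the asymmetric actor-critic observation split described in the post above, in plain Python. Assumptions: the term names (`base_ang_vel`, `accelerometer`, `joint_pos`, `joint_vel`, `last_action`, `command`, `base_quat`), their ordering and the 12-joint sizes in the test are hypothetical rather than taken from any framework; the base quaternion is a unit body-to-world rotation in (w, x, y, z) order; and projected gravity uses the usual definition, the world gravity direction expressed in the base frame. The actor only gets signals a gyroscope, accelerometer and joint encoders can provide, while the critic also receives `base_lin_vel` and projected gravity as privileged sim ground truth.

```python
from typing import Dict, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]

GRAVITY_DIR_W: Vec3 = (0.0, 0.0, -1.0)  # unit gravity direction in the world frame


def cross(a: Sequence[float], b: Sequence[float]) -> Vec3:
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def quat_rotate_inverse(q_wxyz: Sequence[float], v: Sequence[float]) -> Vec3:
    """Express world-frame vector v in the body frame (q: unit, body->world, w first)."""
    w, x, y, z = q_wxyz
    u = (-x, -y, -z)  # conjugate of a unit quaternion = inverse rotation
    t = tuple(2.0 * c for c in cross(u, v))
    u_cross_t = cross(u, t)
    return tuple(v[i] + w * t[i] + u_cross_t[i] for i in range(3))


# Hypothetical observation terms.
ACTOR_TERMS = ("base_ang_vel", "accelerometer", "joint_pos", "joint_vel", "last_action", "command")
PRIVILEGED_TERMS = ("base_lin_vel",)  # sim-only ground truth, critic only


def build_observations(state: Dict[str, Sequence[float]]) -> Tuple[List[float], List[float]]:
    # Actor: only signals the real robot measures directly.
    actor_obs = [v for name in ACTOR_TERMS for v in state[name]]
    # Critic: everything the actor sees plus privileged ground truth.
    projected_gravity = quat_rotate_inverse(state["base_quat"], GRAVITY_DIR_W)
    privileged = [v for name in PRIVILEGED_TERMS for v in state[name]]
    critic_obs = actor_obs + privileged + list(projected_gravity)
    return actor_obs, critic_obs
```

Because the critic is only used during training, handing it privileged terms costs nothing at deployment: only the actor's inputs have to exist on the hardware.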

mattt@maedmatt·3 Mar

@ChongZitaZhang what's that spike?

C Zhang@ChongZitaZhang·3 Mar

my scaling law not gonna outrun cluster maintenance :(

C Zhang@ChongZitaZhang

Feeling the achievement 😃These rewards are tuned for anymal (quadruped) and tron1 (biped lower body), but directly transfer to G1 (full humanoid) without tuning.

mattt@maedmatt·2 Mar

@ChongZitaZhang no penalty for close feet?

C Zhang@ChongZitaZhang·2 Mar

Feeling the achievement 😃These rewards are tuned for anymal (quadruped) and tron1 (biped lower body), but directly transfer to G1 (full humanoid) without tuning.

C Zhang@ChongZitaZhang

cannot make reward weights more beautiful

mattt@maedmatt·1 Mar

@kazeoto_code good luck on the sim2real path, share videos!

風音コード@kazeoto_code·1 Mar

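Using a policy generated by reinforcement learning with history information, Go2 can now climb up and down stairs in Sim2Sim that have the same dimensions as the stairs I want to try on the real robot!! I want to move on to Sim2Real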
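The Go2 stairs post mentions a policy trained with history information. One common way to provide history is to stack the last few observations into the policy input. Below is a minimal sketch, not the poster's code, assuming a fixed history length and flat float-list observations; `ObservationHistory` is a hypothetical helper. Filling the window with the first observation at reset is one common choice that keeps the input the same size from the very first step.

```python
from collections import deque
from typing import Deque, List, Sequence


class ObservationHistory:
    """Fixed-length buffer that concatenates the last H observations, oldest first."""

    def __init__(self, history_len: int) -> None:
        if history_len < 1:
            raise ValueError("history_len must be >= 1")
        self.history_len = history_len
        self._frames: Deque[List[float]] = deque(maxlen=history_len)

    def reset(self, first_obs: Sequence[float]) -> List[float]:
        # Fill the whole window with the episode's first observation.
        self._frames.clear()
        for _ in range(self.history_len):
            self._frames.append(list(first_obs))
        return self.stacked()

    def push(self, obs: Sequence[float]) -> List[float]:
        # deque(maxlen=H) drops the oldest frame automatically.
        self._frames.append(list(obs))
        return self.stacked()

    def stacked(self) -> List[float]:
        return [v for frame in self._frames for v in frame]
```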

mattt@maedmatt·1 Mar

@ChongZitaZhang have you thought about crafting custom rules or skills so the agent doesn't have to guess on important implementation details?

C Zhang@ChongZitaZhang·1 Mar

day 3: set up terrains and multi-GPU training -- smooth. then insane debugging. The agent writes many bugs that are hard to spot, e.g., when randomizing the armature of joints, it resamples a value (0.8~1.2x) based on the current armature, not the default one. that soon escalates.

C Zhang@ChongZitaZhang

the vibe coding challenge: day 1, MDP setup. day 2, policy coded; needed multiple rounds of human-agent interaction to fix things. Today won't be day 3, I need to do other things, but I also just found many bad kp/kd and initialization designs in the common G1 setup that hurt exploration.
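The armature bug from the day-3 post is easy to reproduce. The sketch below is hypothetical: joint names and default values are made up, and it does not use any simulator API. Sampling the scale factor relative to the current armature turns per-reset randomization into a multiplicative random walk, while sampling relative to the default keeps every value inside 0.8 to 1.2 times nominal.

```python
import random
from typing import Dict, List

# Hypothetical nominal armature values (kg*m^2) per joint; not from any real robot config.
DEFAULT_ARMATURE: Dict[str, float] = {"hip_pitch": 0.010, "knee": 0.020}


def randomize_armature(base: Dict[str, float], rng: random.Random,
                       low: float = 0.8, high: float = 1.2) -> Dict[str, float]:
    """Scale each joint's armature by an independent factor drawn from U(low, high)."""
    return {joint: value * rng.uniform(low, high) for joint, value in base.items()}


def run_resets(n: int, rng: random.Random, buggy: bool) -> List[Dict[str, float]]:
    armature = dict(DEFAULT_ARMATURE)
    history = []
    for _ in range(n):
        # Buggy: rescale whatever the previous reset produced, so the factors compound.
        # Fixed: always rescale the nominal default, so each reset is independent.
        base = armature if buggy else DEFAULT_ARMATURE
        armature = randomize_armature(base, rng)
        history.append(armature)
    return history
```

Over a few hundred resets the buggy variant drifts well outside the intended band, which is one concrete way "that soon escalates" can show up during training.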

mattt@maedmatt·1 Mar

@NepYope hahahah, that’s a nice theory : )

Martino Russi@NepYope·28 Feb

if the department of war has access to pre-rlhf gpt weights then they can finetune it to do terrorist attacks and stuff. if every country does that, then each AI gets smarter as a result for better defense/offense, and maybe that's how we get to AGI if we don't wipe ourselves out

David Bar@observie·27 Feb

Lots of great stuff in here by @KyleMorgenstein
> Low Kp = feedforward torque, not position tracking. Enables full exploration
> Kp = max_torque / joint_RoM, Kd ≈ Kp/20
> High Kp pushes policy to torque limits, kills exploration
> Train 2-5x past apparent convergence for smooth deployable policies
> noise_std super important, must decrease and stabilize
> Start with perfect sim, add randomization one factor at a time
thehumanoid.ai/deployment-rea…
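The Kp rule of thumb in the list above has a simple reading: with Kp = max_torque / joint_RoM, a position error spanning the joint's whole range of motion demands exactly the maximum torque, so ordinary errors stay well below saturation. A minimal sketch of that heuristic, with the derivative gain at roughly Kp/20, plus a standard clipped PD law; the example numbers (100 N·m, 2.5 rad) are hypothetical, not from any specific robot.

```python
from typing import Tuple


def pd_gains(max_torque: float, joint_range: float, kd_ratio: float = 20.0) -> Tuple[float, float]:
    """Heuristic gains: Kp = max_torque / range of motion, Kd ~= Kp / kd_ratio."""
    if joint_range <= 0:
        raise ValueError("joint_range must be positive")
    kp = max_torque / joint_range
    return kp, kp / kd_ratio


def pd_torque(q_target: float, q: float, qd: float,
              kp: float, kd: float, max_torque: float) -> float:
    """PD torque toward q_target (zero target velocity), clipped to the actuator limit."""
    tau = kp * (q_target - q) - kd * qd
    return max(-max_torque, min(max_torque, tau))


kp, kd = pd_gains(max_torque=100.0, joint_range=2.5)  # -> (40.0, 2.0)
```

A much higher Kp would saturate at small errors, which matches the "pushes policy to torque limits" point above.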

mattt@maedmatt·27 Feb

@observie @KyleMorgenstein what’s the point of training 2-5x past convergence? If reward and policy are not changing (converged), then what’s changing?

mattt reposted

dar@radbackwards·15 Feb

First day at Harvard!! My little NEO is so grown up now

mattt@maedmatt·14 Feb

@alxfazio cool status line, mind sharing the gists?

alex fazio@alxfazio·14 Feb

anthropic is actively tryna destroy all your relationships. first free tokens at christmas now a 50% discount on fast mode around valentine’s day. i guess single men have a higher per user token consumption rate

mattt@maedmatt·13 Feb

Meet my new friend Laro! I will be teaching him a lot of new things in the upcoming months :)

mattt@maedmatt·29 Jan

@ChongZitaZhang What’s indirect then?

C Zhang@ChongZitaZhang·29 Jan

*direct use of vlms

Chris Paxton@chris_j_paxton

@redstone_hong VLMs can't handle any of the problems that really make robotics hard

mattt@maedmatt·26 Jan

@antirez @steipete This might be what you’re looking for! x.com/davidgelberg/s…

David Gelberg@davidgelberg

last week @steipete crashed our first @unicorn_mafia demo night to show us @openclaw

antirez@antirez·26 Jan

@steipete That's nuts. I'm going to try it, but would like to understand better what it does. Is there any video available? I know I'm nearly 49, but the YouTube thing got me.

Peter Steinberger 🦞@steipete·26 Jan

mattt@maedmatt·24 Jan

@ChongZitaZhang Where can I apply?

mattt@maedmatt·24 Jan

@Jitesh_117 are your dotfiles available?

jitesh💙@Jitesh_117·24 Jan

fzf is so GOATed man

mattt@maedmatt·22 Jan

@alxfazio But do you need a hook for that? Why not a custom rule? You can toggle it with something like this: paths: "*.py, pyproject.toml"

alex fazio@alxfazio·21 Jan

cc hooks are seriously underrated. one thing i haven’t had time to fully explore yet, but i’m genuinely excited about, is pairing them with skills to gate edits behind the exact context claude needs to work effectively in a complex codebase, at the right moment.

for example, i’ve got a folder packed with pydantic data models that absolutely require deep, highly specific domain knowledge to update correctly. with a pre-tool-use hook, i can intercept any attempt to use the edit/write tool on that folder and require a skill invocation first. that way, either the main agent or a sub-agent can pull in the right domain rules and constraints before touching the code.

this is super useful in large, complex codebases where different areas need very narrow, specialized knowledge for claude to be effective, and where dumping a pile of docs at planning time and hoping claude catches every nuance doesn’t always work
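A minimal sketch of such a gate as a Claude Code PreToolUse hook script, assuming the documented hook contract: the hook receives a JSON payload on stdin with `tool_name` and `tool_input` (whose `file_path` names the target file for Edit and Write), and exit code 2 blocks the call and feeds stderr back to Claude. The gated folder `models` and the marker file `.claude/.models-skill-loaded`, which the skill itself would have to create, are hypothetical; this is one way to approximate "require a skill invocation first", not the poster's setup.

```python
import json
import sys
from pathlib import Path
from typing import Tuple

GATED_DIR = "models"  # hypothetical folder of pydantic models that needs domain context
SKILL_MARKER = Path(".claude/.models-skill-loaded")  # hypothetical marker the skill writes
GATED_TOOLS = {"Edit", "Write"}


def check(payload: dict, skill_loaded: bool) -> Tuple[int, str]:
    """Return (exit_code, stderr_message) for one PreToolUse payload."""
    if payload.get("tool_name") not in GATED_TOOLS:
        return 0, ""
    file_path = payload.get("tool_input", {}).get("file_path", "")
    if GATED_DIR not in Path(file_path).parts:
        return 0, ""  # edit outside the gated folder: allow
    if skill_loaded:
        return 0, ""  # domain skill already pulled in: allow
    # Exit code 2 blocks the tool call; stderr is shown to Claude as the reason.
    return 2, (f"Edits under {GATED_DIR}/ need the domain rules first: "
               "invoke the models skill, then retry the edit.")


def main() -> int:
    payload = json.load(sys.stdin)
    code, message = check(payload, SKILL_MARKER.exists())
    if message:
        print(message, file=sys.stderr)
    return code
```

As a standalone hook script, the file would end with `sys.exit(main())` and be registered under `PreToolUse` with a matcher such as `Edit|Write` in the Claude Code settings.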

mattt@maedmatt·18 Jan

@jackvial89 What's controlling the robot? Are you filtering the actions predicted by the policy?

Jack Vial@jackvial89·18 Jan

trajectories before and after applying the low-pass filter: you can very clearly see one is smoother than the other

Jack Vial@jackvial89·18 Jan

much smoother after applying a low-pass Butterworth filter with a 3 Hz cutoff. this filters out small high-frequency movements (aka noise) in the trajectory. it also naturally attenuates the signal, which makes the robot move slower, so I’ve added a bit of gain back after the filtering to compensate and speed it up again. still a bit jiggly, mainly in the shoulder pan, but it seems to be mostly mechanical at this point

Jack Vial@jackvial89

60 Hz! real-time chunking on an so101 with LeRobot. Not looking too bad. Bit of jitter throughout but no mode switching across chunks
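The filtering step in the LeRobot thread can be sketched as a streaming low-pass Butterworth filter, one instance per joint, applied to each action before it is sent to the motors. Assumptions: second order (the posts do not state the order), the 60 Hz control rate and 3 Hz cutoff from the posts, and coefficients from the bilinear-transform biquad in the RBJ Audio EQ Cookbook form with Q = 1/√2. It is plain Python so it can run sample by sample inside a control loop; `ButterworthLowPass` is a hypothetical helper, not a LeRobot API.

```python
import math
from typing import Optional, Tuple


class ButterworthLowPass:
    """Streaming 2nd-order Butterworth low-pass filter (one scalar channel)."""

    def __init__(self, cutoff_hz: float, sample_hz: float) -> None:
        if not 0.0 < cutoff_hz < sample_hz / 2.0:
            raise ValueError("cutoff must lie between 0 and the Nyquist frequency")
        w0 = 2.0 * math.pi * cutoff_hz / sample_hz
        q = 1.0 / math.sqrt(2.0)  # Q = 1/sqrt(2) gives the maximally flat Butterworth response
        alpha = math.sin(w0) / (2.0 * q)
        cos_w0 = math.cos(w0)
        a0 = 1.0 + alpha
        # Bilinear-transform biquad coefficients, normalized so that a0 == 1.
        self.b0 = (1.0 - cos_w0) / 2.0 / a0
        self.b1 = (1.0 - cos_w0) / a0
        self.b2 = self.b0
        self.a1 = -2.0 * cos_w0 / a0
        self.a2 = (1.0 - alpha) / a0
        # (x[n-1], x[n-2], y[n-1], y[n-2]); None until the first sample arrives.
        self._state: Optional[Tuple[float, float, float, float]] = None

    def step(self, x: float) -> float:
        """Filter one new sample and return the smoothed value."""
        if self._state is None:
            # Seed the history with the first sample so the output starts at x, not 0.
            self._state = (x, x, x, x)
        x1, x2, y1, y2 = self._state
        y = self.b0 * x + self.b1 * x1 + self.b2 * x2 - self.a1 * y1 - self.a2 * y2
        self._state = (x, x1, y, y1)
        return y


# One filter per joint (6 joints here, hypothetical), stepped once per 60 Hz control tick.
filters = [ButterworthLowPass(cutoff_hz=3.0, sample_hz=60.0) for _ in range(6)]
raw_action = [0.0, 0.1, -0.2, 0.3, 0.0, 0.5]
smoothed = [f.step(a) for f, a in zip(filters, raw_action)]
```

The DC gain is exactly 1, so a held target is reached without offset; the slowdown mentioned in the post comes from the filter's phase lag and from attenuating motion content near and above the cutoff.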

mattt@maedmatt·16 Jan

@elliotarledge But you don’t have access to the source code, do you? How can you optimise it?

Elliot Arledge@elliotarledge·16 Jan

@maedmatt yeah, got it running smoothly at 220 fps

Elliot Arledge@elliotarledge·16 Jan

timelapse #136 (16 hrs)
- getting more into polymarket/manifold
- finalizing the cuda book with manning
- optimizing battery materials discovery kernels (to learn about materials discovery)
- got voice typing basically instant on my 3090
- burned $1200 in openrouter creds across my evals and opus 4.5 api spend lol
- got opus to optimize arc raiders performance on my linux rig
