Maks
@itsmaksX

243 posts

reinforcement learning feels good | tinkering with robot learning @rai_inst (Boston Dynamics AI Institute) | previously at Google X's https://t.co/1FJAP2frcr

Atlanta, GA · Joined August 2012
1.7K Following · 633 Followers

Pinned Tweet
Maks @itsmaksX
We taught Spot to stack 15 kg car tires autonomously. It uses its whole body and shows some dynamic manipulation!
37 replies · 76 reposts · 601 likes · 49.4K views
Maks @itsmaksX
@TheZvi Let's wrap up for today and come back to it tomorrow.
0 replies · 0 reposts · 1 like · 364 views
Zvi Mowshowitz @TheZvi
Claude Opus 4.7 reaction thread, it's that time again.
71 replies · 1 repost · 118 likes · 43.5K views
Maks @itsmaksX
@karpathy Obvious autoresearch ecosystem extensions:
- arXiv-sanity, for the latest bells and whistles to try
- GitHub, for clean task/algorithm selection
- leaderboard, to bench program.md optimization capabilities

How long before libraries have: sweep(method="agent", file="program.md")?
Maks tweet media
0 replies · 2 reposts · 5 likes · 1K views
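The sweep() call imagined above doesn't exist in any library; as a hypothetical sketch of what such an API could look like, an agent-driven sweep might simply be another dispatch target alongside grid or random search (every name here is a placeholder taken from the tweet, not a real package):

```python
# Hypothetical sketch of the imagined API: sweep(method="agent", file="program.md").
# Nothing here is a real library; names are placeholders from the tweet.

def sweep(method: str, file: str, run_agent=None):
    """Dispatch a hyperparameter sweep.

    method="grid" / "random" would run classic sweeps; method="agent"
    hands the whole optimization loop to an LLM agent that edits the
    recipe file (e.g. program.md) between experiment runs.
    """
    if method == "agent":
        if run_agent is None:
            raise ValueError("method='agent' needs a run_agent callable")
        # The agent reads the recipe file, proposes an edit, runs the
        # experiment, observes the metric, and iterates on its own.
        return run_agent(file)
    raise NotImplementedError(f"unimplemented sweep method: {method}")

# Usage with a stub agent that just reports which file it would tune:
result = sweep(method="agent", file="program.md",
               run_agent=lambda f: {"file": f, "best_metric": None})
print(result["file"])
```

The point of the hypothetical method="agent" branch is that the search strategy itself becomes a callable agent rather than a grid or random sampler.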
Andrej Karpathy @karpathy
oh yeah i should have linked autoresearch probably github.com/karpathy/autor… (you don't "use it" directly, it's just a recipe/idea - give it to your agent and apply to what you care about.) and the tweet about it that went mini-viral over the weekend with more context x.com/karpathy/statu…
96 replies · 216 reposts · 2.5K likes · 336.2K views
Andrej Karpathy @karpathy
Three days ago I left autoresearch tuning nanochat for ~2 days on a depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement); this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference.

I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly well manually tuned project. This is a first for me because I am very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc. This has been the bread and butter of what I do daily for two decades. Seeing the agent do this entire workflow end-to-end, all by itself, as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of experiment results and used that to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real": I didn't find them manually previously, and they stack up and actually improved nanochat. Among the bigger findings:

- It noticed an oversight that my parameterless QK-norm didn't have a scale multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the value embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (I forgot to tune it).
- It found that the AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.

This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism. github.com/karpathy/nanoc…

All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course - you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges. And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has a more efficient proxy metric, such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.
Andrej Karpathy tweet media
962 replies · 2.1K reposts · 19.5K likes · 3.6M views
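The QK-norm oversight described above is easy to see numerically: with queries and keys normalized to unit length, the logits are cosine similarities bounded in [-1, 1], so without a scale multiplier the softmax stays nearly uniform ("too diffuse"). This is an illustrative NumPy sketch, not nanochat's code; the shapes and the scale value are made up:

```python
import numpy as np

def qknorm_attention_weights(q, k, scale=1.0):
    """Attention weights with QK-norm: cosine-similarity logits times a scale."""
    qn = q / np.linalg.norm(q, axis=-1, keepdims=True)
    kn = k / np.linalg.norm(k, axis=-1, keepdims=True)
    logits = scale * (qn @ kn.T)                    # bounded by [-scale, scale]
    logits -= logits.max(axis=-1, keepdims=True)    # numerically stable softmax
    w = np.exp(logits)
    return w / w.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
q = rng.normal(size=(1, 64))    # one query
k = rng.normal(size=(16, 64))   # sixteen keys

diffuse = qknorm_attention_weights(q, k, scale=1.0)   # no multiplier: near-uniform
sharp = qknorm_attention_weights(q, k, scale=12.0)    # with multiplier: peaked
print(diffuse.max(), sharp.max())
```

Attaching a learned scalar (or per-head scalars) to the normalized logits is what lets the model recover sharp attention; the agent's fix amounts to tuning that multiplier.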
Maks @itsmaksX
@ErenChenAI Double-press e-stop? Soft stop on the first press, hard stop on the second.
0 replies · 0 reposts · 0 likes · 29 views
Eren Chen @ErenChenAI
A two-button e-stop is actually safer. Imagine the G1 walking and someone misclicks the e-stop by accident, and it suddenly collapses and falls onto a human or objects.
You Jiacheng @YouJiacheng
@carlosdponx @ErenChenAI Just checked the manual and found that Unitree doesn't provide a separate e-stop controller for the G1. The only way to e-stop is L2+B (not a single button! I think this is not a good design.)
4 replies · 0 reposts · 8 likes · 4.1K views
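The two-stage behavior Maks describes (soft stop on the first press, hard stop on the second) is a tiny state machine. A toy sketch, with state names invented here rather than taken from any vendor's API:

```python
from enum import Enum

class EStop(Enum):
    RUNNING = "running"
    SOFT_STOP = "soft_stop"   # controlled slow-down; actuators stay powered
    HARD_STOP = "hard_stop"   # power cut to actuators

def press_estop(state: EStop) -> EStop:
    """First press soft-stops; any further press hard-stops."""
    if state is EStop.RUNNING:
        return EStop.SOFT_STOP
    return EStop.HARD_STOP

s = EStop.RUNNING
s = press_estop(s)   # first press: soft stop
s = press_estop(s)   # second press: hard stop
print(s)
```

The design rationale in the thread maps onto this directly: an accidental single press only triggers the recoverable soft stop, while a deliberate second press (or a chord like L2+B) commits to cutting power.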
Maks @itsmaksX
Sending (MCP):
claude mcp add-json imessage '{"command":"node","args":["$HOME/Library/Application Support/Claude/Claude Extensions/ant.dir.ant.anthropic.imessage/server/index.js"],"env":{"HOME":"$HOME"}}'
Receiving (via watcher): `~/Library/Messages/chat.db`
Whitelist the contact!
0 replies · 0 reposts · 0 likes · 97 views
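The receiving side above watches chat.db, which is an SQLite database. A minimal sketch of the polling query with a contact whitelist, assuming a heavily simplified version of the Messages schema (the real `message` and `handle` tables have many more columns) and a made-up phone number; the demo runs against an in-memory stand-in so the sketch is self-contained:

```python
import sqlite3

WHITELIST = {"+15551234567"}   # hypothetical contact; whitelist yours!

def new_messages(conn, last_rowid):
    """Return (rowid, sender, text) for whitelisted messages newer than last_rowid."""
    rows = conn.execute(
        """SELECT m.ROWID, h.id, m.text
           FROM message m JOIN handle h ON m.handle_id = h.ROWID
           WHERE m.ROWID > ? ORDER BY m.ROWID""",
        (last_rowid,),
    ).fetchall()
    return [(rowid, sender, text) for rowid, sender, text in rows
            if sender in WHITELIST]

# Demo against an in-memory stand-in for ~/Library/Messages/chat.db:
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE handle (ROWID INTEGER PRIMARY KEY, id TEXT);
    CREATE TABLE message (ROWID INTEGER PRIMARY KEY, handle_id INTEGER, text TEXT);
    INSERT INTO handle VALUES (1, '+15551234567'), (2, '+15550000000');
    INSERT INTO message VALUES (1, 1, 'hi claude'), (2, 2, 'spam');
""")
msgs = new_messages(conn, last_rowid=0)
print(msgs)   # only the whitelisted contact's message survives
```

A real watcher would remember the highest ROWID it has seen and poll on a timer (or watch the file for changes), feeding each new whitelisted message to the agent.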
Maks @itsmaksX
Claude Code + iMessages is as easy as:
- a separate Mac (e.g. a Mac mini)
- a dedicated Apple Account
- Claude Desktop with the "Read and Send iMessages" extension
- you can expose Obsidian/projects via iCloud

Very useful for quick notes or queries. Works with screenshots/pictures too!
Maks tweet media
1 reply · 0 reposts · 0 likes · 277 views
Maks reposted
Boris Cherny @bcherny
@YashGouravKar1 Correct. In the last thirty days, 100% of my contributions to Claude Code were written by Claude Code.
124 replies · 315 reposts · 3.1K likes · 1.4M views
Maks reposted
Nick Dobos @NickADobos
I’ve never felt this much excitement as a programmer. The profession is going through a renaissance & reimagining. The core role of a human vs computer is shifting and revealing new arenas, new capabilities & new tools. Which are evolving & mutating faster than anyone can keep up with. A new alien machine intelligence is here and it wants to play.
Andrej Karpathy @karpathy
I've never felt this much behind as a programmer. The profession is being dramatically refactored as the bits contributed by the programmer are increasingly sparse and in between. I have a sense that I could be 10X more powerful if I just properly string together what has become available over the last ~year, and a failure to claim the boost feels decidedly like a skill issue. There's a new programmable layer of abstraction to master (in addition to the usual layers below) involving agents, subagents, their prompts, contexts, memory, modes, permissions, tools, plugins, skills, hooks, MCP, LSP, slash commands, workflows, IDE integrations, and a need to build an all-encompassing mental model for the strengths and pitfalls of fundamentally stochastic, fallible, unintelligible and changing entities suddenly intermingled with what used to be good old-fashioned engineering. Clearly some powerful alien tool was handed around, except it comes with no manual and everyone has to figure out how to hold it and operate it, while the resulting magnitude-9 earthquake is rocking the profession. Roll up your sleeves to not fall behind.
14 replies · 17 reposts · 382 likes · 20.4K views
Maks reposted
Boris Cherny @bcherny
I feel this way most weeks, tbh. Sometimes I start approaching a problem manually, and have to remind myself "Claude can probably do this."

Recently we were debugging a memory leak in Claude Code, and I started approaching it the old-fashioned way: connecting a profiler, using the app, pausing the profiler, manually looking through heap allocations. My coworker was looking at the same issue, and just asked Claude to make a heap dump, then read the dump to look for retained objects that probably shouldn't be there; Claude one-shotted it and put up a PR. The same thing happens most weeks.

In a way, newer coworkers and even new grads who don't make all sorts of assumptions about what the model can and can't do — legacy memories formed when using old models — are able to use the model most effectively. It takes significant mental work to re-adjust to what the model can do every month or two, as models continue to become better and better at coding and engineering.

The last month was my first month as an engineer that I didn't open an IDE at all. Opus 4.5 wrote around 200 PRs, every single line. Software engineering is radically changing, and the hardest part, even for early adopters and practitioners like us, is to continue to re-adjust our expectations. And this is *still* just the beginning.
170 replies · 547 reposts · 8.3K likes · 1.8M views
Maks reposted
Chris Paxton @chris_j_paxton
Dramatically proving that the XPeng Iron is, in fact, a robot.
144 replies · 75 reposts · 1K likes · 206.7K views
Maks reposted
dennis hegstad @dennishegstad
“Good morning. Your payment has been declined”
dennis hegstad tweet media
919 replies · 7.2K reposts · 114.7K likes · 5.6M views
Maks @itsmaksX
@jloganolson I want to see how this was teleop'd.
0 replies · 0 reposts · 1 like · 360 views
Logan Olson @jloganolson
Unitree G1 crawl policy deployed to hardware! Plenty of room for improvement, but it's a start.
152 replies · 170 reposts · 1.7K likes · 293.1K views
Maks @itsmaksX
@chris_j_paxton Thanks, Chris! The real unlock will be moving beyond motion capture for perception. Once we can do this vision-based at scale, contact-rich manipulation becomes practical for real applications.
1 reply · 3 reposts · 8 likes · 1.5K views
Chris Paxton @chris_j_paxton
Contact-rich manipulation is the correct next frontier for robotics, and this is one of the cooler examples around. It's so clear that in the next couple of years, robots will be dramatically more capable of the kind of challenging, contact-rich tasks that they're currently bad at.
RAI Institute @rai_inst
See Spot perform dynamic whole-body manipulation. Using a combination of reinforcement learning (RL) and sampling-based control, the robot is able to autonomously drag, roll, and stack tires weighing 15 kg (33 lb), well above its maximum arm lift capacity. Learn more about coordinating locomotion and manipulation processes: rai-inst.com/resources/blog…
4 replies · 15 reposts · 144 likes · 13K views
Maks @itsmaksX
We also show RL-based tire uprighting where the tire literally flies in the air - showcasing just how dynamic whole-body manipulation can be.
1 reply · 2 reposts · 15 likes · 836 views