Maks
@itsmaksX

243 posts

reinforcement learning feels good | tinkering with robot learning @rai_inst (Boston Dynamics AI Institute) | previously at Google X's https://t.co/1FJAP2frcr

Atlanta, GA · Joined August 2012
1.7K Following · 633 Followers

Pinned Tweet
Maks @itsmaksX
We taught Spot to stack 15 kg car tires autonomously. It uses its whole body and shows some dynamic manipulation!
37 replies · 76 reposts · 601 likes · 49.4K views
Maks @itsmaksX
@TheZvi Let's wrap up for today and come back to it tomorrow.
0 replies · 0 reposts · 1 like · 364 views
Zvi Mowshowitz @TheZvi
Claude Opus 4.7 reaction thread, it's that time again.
71 replies · 1 repost · 118 likes · 43.5K views
Maks @itsmaksX
@karpathy Obvious autoresearch ecosystem extensions:
- arXiv-sanity, for the latest bells and whistles to try
- GitHub, for clean task/algorithm selection
- leaderboard, to bench program.md optimization capabilities

How long before libraries have: sweep(method="agent", file="program.md")?
Maks tweet media
0 replies · 2 reposts · 5 likes · 1K views
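The sweep() call imagined above doesn't exist in any library; as a hypothetical sketch of what such an API could look like, an agent-driven sweep might simply be another dispatch target alongside grid or random search (every name here is a placeholder taken from the tweet, not a real package):

```python
# Hypothetical sketch of the imagined API: sweep(method="agent", file="program.md").
# Nothing here is a real library; names are placeholders from the tweet.

def sweep(method: str, file: str, run_agent=None):
    """Dispatch a hyperparameter sweep.

    method="grid" / "random" would run classic sweeps; method="agent"
    hands the whole optimization loop to an LLM agent that edits the
    recipe file (e.g. program.md) between experiment runs.
    """
    if method == "agent":
        if run_agent is None:
            raise ValueError("method='agent' needs a run_agent callable")
        # The agent reads the recipe file, proposes an edit, runs the
        # experiment, observes the metric, and iterates on its own.
        return run_agent(file)
    raise NotImplementedError(f"unimplemented sweep method: {method}")

# Usage with a stub agent that just reports which file it would tune:
result = sweep(method="agent", file="program.md",
               run_agent=lambda f: {"file": f, "best_metric": None})
print(result["file"])
```

The point of the hypothetical method="agent" branch is that the search strategy itself becomes a callable agent rather than a grid or random sampler.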
Andrej Karpathy @karpathy
oh yeah i should have linked autoresearch probably github.com/karpathy/autor… (you don't "use it" directly, it's just a recipe/idea - give it to your agent and apply to what you care about.) and the tweet about it that went mini-viral over the weekend with more context x.com/karpathy/statu…
96 replies · 216 reposts · 2.5K likes · 336.2K views
Andrej Karpathy @karpathy
Three days ago I left autoresearch tuning nanochat for ~2 days on a depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement); this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference.

I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly well manually tuned project. This is a first for me because I am very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc. This has been the bread and butter of what I do daily for two decades. Seeing the agent do this entire workflow end-to-end, all by itself, as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of experiment results and used that to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real": I didn't find them manually previously, and they stack up and actually improved nanochat. Among the bigger findings:

- It noticed an oversight that my parameterless QK-norm didn't have a scale multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the value embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (I forgot to tune it).
- It found that the AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.

This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism. github.com/karpathy/nanoc…

All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course - you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges. And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has a more efficient proxy metric, such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.
Andrej Karpathy tweet media
962 replies · 2.1K reposts · 19.5K likes · 3.6M views
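The QK-norm oversight described above is easy to see numerically: with queries and keys normalized to unit length, the logits are cosine similarities bounded in [-1, 1], so without a scale multiplier the softmax stays nearly uniform ("too diffuse"). This is an illustrative NumPy sketch, not nanochat's code; the shapes and the scale value are made up:

```python
import numpy as np

def qknorm_attention_weights(q, k, scale=1.0):
    """Attention weights with QK-norm: cosine-similarity logits times a scale."""
    qn = q / np.linalg.norm(q, axis=-1, keepdims=True)
    kn = k / np.linalg.norm(k, axis=-1, keepdims=True)
    logits = scale * (qn @ kn.T)                    # bounded by [-scale, scale]
    logits -= logits.max(axis=-1, keepdims=True)    # numerically stable softmax
    w = np.exp(logits)
    return w / w.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
q = rng.normal(size=(1, 64))    # one query
k = rng.normal(size=(16, 64))   # sixteen keys

diffuse = qknorm_attention_weights(q, k, scale=1.0)   # no multiplier: near-uniform
sharp = qknorm_attention_weights(q, k, scale=12.0)    # with multiplier: peaked
print(diffuse.max(), sharp.max())
```

Attaching a learned scalar (or per-head scalars) to the normalized logits is what lets the model recover sharp attention; the agent's fix amounts to tuning that multiplier.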
Maks @itsmaksX
@ErenChenAI Double-press e-stop? Soft stop on the first press, hard stop on the second.
0 replies · 0 reposts · 0 likes · 29 views
Eren Chen @ErenChenAI
A two-button e-stop is actually safer. Imagine the G1 walking and someone misclicks the e-stop by accident, and it suddenly collapses and falls onto a human or objects.
You Jiacheng @YouJiacheng
@carlosdponx @ErenChenAI Just checked the manual and found that Unitree doesn't provide a separate e-stop controller for the G1. The only way to e-stop is L2+B (not a single button! I think this is not a good design.)
4 replies · 0 reposts · 8 likes · 4.1K views
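The two-stage behavior Maks describes (soft stop on the first press, hard stop on the second) is a tiny state machine. A toy sketch, with state names invented here rather than taken from any vendor's API:

```python
from enum import Enum

class EStop(Enum):
    RUNNING = "running"
    SOFT_STOP = "soft_stop"   # controlled slow-down; actuators stay powered
    HARD_STOP = "hard_stop"   # power cut to actuators

def press_estop(state: EStop) -> EStop:
    """First press soft-stops; any further press hard-stops."""
    if state is EStop.RUNNING:
        return EStop.SOFT_STOP
    return EStop.HARD_STOP

s = EStop.RUNNING
s = press_estop(s)   # first press: soft stop
s = press_estop(s)   # second press: hard stop
print(s)
```

The design rationale in the thread maps onto this directly: an accidental single press only triggers the recoverable soft stop, while a deliberate second press (or a chord like L2+B) commits to cutting power.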
Maks @itsmaksX
Sending (MCP):
claude mcp add-json imessage '{"command":"node","args":["$HOME/Library/Application Support/Claude/Claude Extensions/ant.dir.ant.anthropic.imessage/server/index.js"],"env":{"HOME":"$HOME"}}'
Receiving (via watcher): `~/Library/Messages/chat.db`
Whitelist the contact!
0 replies · 0 reposts · 0 likes · 97 views
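The receiving side above watches chat.db, which is an SQLite database. A minimal sketch of the polling query with a contact whitelist, assuming a heavily simplified version of the Messages schema (the real `message` and `handle` tables have many more columns) and a made-up phone number; the demo runs against an in-memory stand-in so the sketch is self-contained:

```python
import sqlite3

WHITELIST = {"+15551234567"}   # hypothetical contact; whitelist yours!

def new_messages(conn, last_rowid):
    """Return (rowid, sender, text) for whitelisted messages newer than last_rowid."""
    rows = conn.execute(
        """SELECT m.ROWID, h.id, m.text
           FROM message m JOIN handle h ON m.handle_id = h.ROWID
           WHERE m.ROWID > ? ORDER BY m.ROWID""",
        (last_rowid,),
    ).fetchall()
    return [(rowid, sender, text) for rowid, sender, text in rows
            if sender in WHITELIST]

# Demo against an in-memory stand-in for ~/Library/Messages/chat.db:
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE handle (ROWID INTEGER PRIMARY KEY, id TEXT);
    CREATE TABLE message (ROWID INTEGER PRIMARY KEY, handle_id INTEGER, text TEXT);
    INSERT INTO handle VALUES (1, '+15551234567'), (2, '+15550000000');
    INSERT INTO message VALUES (1, 1, 'hi claude'), (2, 2, 'spam');
""")
msgs = new_messages(conn, last_rowid=0)
print(msgs)   # only the whitelisted contact's message survives
```

A real watcher would remember the highest ROWID it has seen and poll on a timer (or watch the file for changes), feeding each new whitelisted message to the agent.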
Maks @itsmaksX
Claude Code + iMessages is as easy as:
- a separate Mac (e.g. a Mac mini)
- a dedicated Apple Account
- Claude Desktop with the "Read and Send iMessages" extension
- you can expose Obsidian/projects via iCloud

Very useful for quick notes or queries. Works with screenshots/pictures too!
Maks tweet media
1 reply · 0 reposts · 0 likes · 277 views
Maks reposted
Boris Cherny @bcherny
@YashGouravKar1 Correct. In the last thirty days, 100% of my contributions to Claude Code were written by Claude Code.
124 replies · 315 reposts · 3.1K likes · 1.4M views
Maks reposted
Nick Dobos @NickADobos
I’ve never felt this much excitement as a programmer. The profession is going through a renaissance & reimagining. The core role of a human vs computer is shifting and revealing new arenas, new capabilities & new tools. Which are evolving & mutating faster than anyone can keep up with. A new alien machine intelligence is here and it wants to play.
Andrej Karpathy @karpathy
I've never felt this much behind as a programmer. The profession is being dramatically refactored as the bits contributed by the programmer are increasingly sparse and in between. I have a sense that I could be 10X more powerful if I just properly string together what has become available over the last ~year, and a failure to claim the boost feels decidedly like a skill issue. There's a new programmable layer of abstraction to master (in addition to the usual layers below) involving agents, subagents, their prompts, contexts, memory, modes, permissions, tools, plugins, skills, hooks, MCP, LSP, slash commands, workflows, IDE integrations, and a need to build an all-encompassing mental model for the strengths and pitfalls of fundamentally stochastic, fallible, unintelligible and changing entities suddenly intermingled with what used to be good old-fashioned engineering. Clearly some powerful alien tool was handed around, except it comes with no manual and everyone has to figure out how to hold it and operate it, while the resulting magnitude-9 earthquake is rocking the profession. Roll up your sleeves to not fall behind.
14 replies · 17 reposts · 382 likes · 20.4K views
Maks reposted
Boris Cherny @bcherny
I feel this way most weeks, tbh. Sometimes I start approaching a problem manually, and have to remind myself "Claude can probably do this."

Recently we were debugging a memory leak in Claude Code, and I started approaching it the old-fashioned way: connecting a profiler, using the app, pausing the profiler, manually looking through heap allocations. My coworker was looking at the same issue, and just asked Claude to make a heap dump, then read the dump to look for retained objects that probably shouldn't be there; Claude one-shotted it and put up a PR. The same thing happens most weeks.

In a way, newer coworkers and even new grads who don't make all sorts of assumptions about what the model can and can't do — legacy memories formed when using old models — are able to use the model most effectively. It takes significant mental work to re-adjust to what the model can do every month or two, as models continue to become better and better at coding and engineering.

The last month was my first month as an engineer that I didn't open an IDE at all. Opus 4.5 wrote around 200 PRs, every single line. Software engineering is radically changing, and the hardest part, even for early adopters and practitioners like us, is to continue to re-adjust our expectations. And this is *still* just the beginning.
170 replies · 547 reposts · 8.3K likes · 1.8M views
Maks reposted
Chris Paxton @chris_j_paxton
Dramatically proving that the XPeng Iron is, in fact, a robot.
144 replies · 75 reposts · 1K likes · 206.7K views
Maks reposted
dennis hegstad @dennishegstad
“Good morning. Your payment has been declined”
dennis hegstad tweet media
919 replies · 7.2K reposts · 114.7K likes · 5.6M views
Maks @itsmaksX
@jloganolson I want to see how this was teleop'd.
0 replies · 0 reposts · 1 like · 360 views
Logan Olson @jloganolson
Unitree G1 crawl policy deployed to hardware! Plenty of room for improvement, but it's a start.
152 replies · 170 reposts · 1.7K likes · 293.1K views
Maks @itsmaksX
@chris_j_paxton Thanks, Chris! The real unlock will be moving beyond motion capture for perception. Once we can do this vision-based at scale, contact-rich manipulation becomes practical for real applications.
1 reply · 3 reposts · 8 likes · 1.5K views
Chris Paxton @chris_j_paxton
Contact-rich manipulation is the correct next frontier for robotics, and this is one of the cooler examples around. It's so clear that in the next couple of years, robots will be dramatically more capable of the kind of challenging, contact-rich tasks that they're currently bad at.
RAI Institute @rai_inst
See Spot perform dynamic whole-body manipulation. Using a combination of reinforcement learning (RL) and sampling-based control, the robot is able to autonomously drag, roll, and stack tires weighing 15 kg (33 lb), well above its maximum arm lift capacity. Learn more about coordinating locomotion and manipulation processes: rai-inst.com/resources/blog…
4 replies · 15 reposts · 144 likes · 13K views
Maks @itsmaksX
We also show RL-based tire uprighting where the tire literally flies in the air - showcasing just how dynamic whole-body manipulation can be.
1 reply · 2 reposts · 15 likes · 836 views