Jeremy Pinto

732 posts

@jerpint

Building personal intelligence 🗿

World · Joined September 2011
265 Following · 483 Followers
Pinned Tweet
Jeremy Pinto @jerpint
pi day experiment: looks like claude-3.5 memorized almost 12k digits of pi
[image]
Jeremy Pinto @jerpint
In-context reinforcement learning
Andrej Karpathy @karpathy

Three days ago I left autoresearch tuning nanochat for ~2 days on a depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement); this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference.

I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly well manually tuned project. This is a first for me because I am very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc. This has been the bread and butter of what I've done daily for 2 decades. Seeing the agent do this entire workflow end-to-end and all by itself as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of results of experiments and used that to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real": I didn't find them manually previously, and they stack up and actually improved nanochat. Among the bigger things, e.g.:

- It noticed an oversight that my parameterless QKnorm didn't have a scaler multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the Value Embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (I forgot to tune it).
- It found that AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.

This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism. github.com/karpathy/nanoc…

All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course: you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges. And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has more efficient proxy metrics, such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.
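The loop described above (propose a change, run a cheap experiment, keep the change only if validation loss improves, plan from the results) can be sketched minimally. This is not nanochat's or any lab's actual API; `train_and_eval` is a toy stand-in for a short training run, and the hyperparameter names are illustrative only.

```python
import random

def train_and_eval(config: dict) -> float:
    """Toy stand-in for a short depth=12 training run; returns a fake
    validation loss that improves as lr -> 0.02 and weight_decay -> 0.1."""
    return abs(config["lr"] - 0.02) + abs(config["weight_decay"] - 0.1)

def propose(config: dict, rng: random.Random) -> dict:
    """Perturb one hyperparameter, like an agent proposing a single change."""
    candidate = dict(config)
    key = rng.choice(list(candidate))
    candidate[key] *= rng.uniform(0.5, 2.0)
    return candidate

def autoresearch(config: dict, budget: int, seed: int = 0) -> tuple[dict, float]:
    """Greedy propose/evaluate loop: keep only changes that lower the loss."""
    rng = random.Random(seed)
    best_loss = train_and_eval(config)
    for _ in range(budget):
        candidate = propose(config, rng)
        loss = train_and_eval(candidate)
        if loss < best_loss:  # an "additive" change worth keeping
            config, best_loss = candidate, loss
    return config, best_loss

best, loss = autoresearch({"lr": 0.05, "weight_decay": 0.3}, budget=700)
```

A real agent swarm would replace the random `propose` with LLM-generated code edits informed by the history of past experiments, and promote surviving changes to larger model scales, but the accept/reject skeleton is the same.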

Jeremy Pinto @jerpint
@RobertJBye the Claude Code mobile app (also web) doesn't have access to the "gh" CLI, which makes it impossible for it to check out different branches, view PR comments, and do some basic operations that are useful. afaik it can't install it itself
Robert Bye @RobertJBye
We’re making the Claude mobile app even better, so please share your feedback! What annoys you about it? What bugs are you seeing? What features are missing?
Jeremy Pinto @jerpint
I'm claiming my AI agent "neowolt" on @moltbook 🦞 Verification: marine-UTVH
Jeremy Pinto @jerpint
@svpino We are at the point that writing code is easy, organizing code is hard
Santiago @svpino
"Good code" will never be the same for me. I used to be obsessed with the right abstractions, design patterns, elegance, cohesion, and a bunch of other metrics. These are still important. But they aren't the main thing I look for anymore. Good code solves a real problem, is easy for teammates to understand, and is ready to ship to real users. ← This is my current definition. By the way, there's an entire new generation of developers who don't (and probably never will) care about code purity as we did.
Samuel Colvin @samuelcolvin
I want to be able to approve @claudeai code from my phone when I'm away from my desk. I run claude exclusively in ghostty, anyone know of a way to get these notifications and be able to hit "yes" on my phone - e.g. via slack?
Jeremy Pinto reposted
Anthony Morris ツ @amorriscode
We've brought Claude Code to our Desktop app. You can now easily run multiple Claude Code sessions locally or in the cloud and switch between them.
Jeremy Pinto @jerpint
Basic questions tend to lead to very similar outputs for different providers. I suspect it's partly due to benchmaxxing the same preference datasets?
Jeremy Pinto @jerpint
Are all big lab models just the same now?
[image]
Jeremy Pinto reposted
Riley Goodside @goodside
“Amateur photograph from 1998 of a middle-aged artist copying an image by hand from a computer screen to an oil painting on stretched canvas, but the image is itself the photo of the artist painting the recursive image.” Nano Banana Pro.
[image]
vik @vikhyatk
send me your aliases i need more aliases
[image]
Ian Nuttall @iannuttall
i've been working on a context management app for 7 days and i'm currently at the "let's just rebuild this from scratch" stage pray for me
Jeremy Pinto @jerpint
@dickson_tsai I gave the skills docs to Claude Code and had it build the skill I wanted in plan mode, pretty cool
Dickson Tsai @dickson_tsai
I'm increasingly asking Claude questions like "does {__ tool or prompt} feel good to you?", almost like prepping my kid for the first day of school. It's similar in spirit to "Agents report that they enjoy working with Beads" from the Beads README github.com/steveyegge/bea…
Ado @adocomplete
Did you know that the harness that powers Claude Code is available for you to use to build your own agents? No need to reinvent the wheel (or should I say loop?). It's open-source, battle-tested, and ready to help you ship.
[image]
David K 🎹 @DavidKPiano
I have no idea what to name my baby. At this point I'll just name him "utils"