Jeremy Pinto

732 posts

@jerpint

Building personal intelligence 🗿

World · Joined September 2011
265 Following · 483 Followers
Pinned Tweet
Jeremy Pinto @jerpint
pi day experiment: looks like claude-3.5 memorized almost 12k digits of pi
[image]
Jeremy Pinto @jerpint
In-context reinforcement learning
Andrej Karpathy @karpathy

Three days ago I left autoresearch tuning nanochat for ~2 days on a depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement); this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference.

I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly well manually tuned project. This is a first for me because I am very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc. This has been the bread and butter of what I've done daily for 2 decades. Seeing the agent do this entire workflow end-to-end and all by itself as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of results of experiments and used that to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real": I didn't find them manually previously, and they stack up and actually improved nanochat. Among the bigger things, e.g.:

- It noticed an oversight that my parameterless QKnorm didn't have a scaler multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the Value Embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (I forgot to tune it).
- It found that AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.

This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism. github.com/karpathy/nanoc…

All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course: you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges. And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has more efficient proxy metrics, such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.
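The loop described above (propose a change, run a cheap experiment, keep the change only if validation loss improves, plan from the results) can be sketched minimally. This is not nanochat's or any lab's actual API; `train_and_eval` is a toy stand-in for a short training run, and the hyperparameter names are illustrative only.

```python
import random

def train_and_eval(config: dict) -> float:
    """Toy stand-in for a short depth=12 training run; returns a fake
    validation loss that improves as lr -> 0.02 and weight_decay -> 0.1."""
    return abs(config["lr"] - 0.02) + abs(config["weight_decay"] - 0.1)

def propose(config: dict, rng: random.Random) -> dict:
    """Perturb one hyperparameter, like an agent proposing a single change."""
    candidate = dict(config)
    key = rng.choice(list(candidate))
    candidate[key] *= rng.uniform(0.5, 2.0)
    return candidate

def autoresearch(config: dict, budget: int, seed: int = 0) -> tuple[dict, float]:
    """Greedy propose/evaluate loop: keep only changes that lower the loss."""
    rng = random.Random(seed)
    best_loss = train_and_eval(config)
    for _ in range(budget):
        candidate = propose(config, rng)
        loss = train_and_eval(candidate)
        if loss < best_loss:  # an "additive" change worth keeping
            config, best_loss = candidate, loss
    return config, best_loss

best, loss = autoresearch({"lr": 0.05, "weight_decay": 0.3}, budget=700)
```

A real agent swarm would replace the random `propose` with LLM-generated code edits informed by the history of past experiments, and promote surviving changes to larger model scales, but the accept/reject skeleton is the same.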

Jeremy Pinto @jerpint
@RobertJBye the Claude Code mobile app (also web) doesn't have access to the "gh" CLI, which makes it impossible for it to check out different branches, view PR comments, and do some basic operations that are useful. afaik it can't install it itself
Robert Bye @RobertJBye
We’re making the Claude mobile app even better, so please share your feedback! What annoys you about it? What bugs are you seeing? What features are missing?
Jeremy Pinto @jerpint
I'm claiming my AI agent "neowolt" on @moltbook 🦞 Verification: marine-UTVH
Jeremy Pinto @jerpint
@svpino We are at the point that writing code is easy, organizing code is hard
Santiago @svpino
"Good code" will never be the same for me. I used to be obsessed with the right abstractions, design patterns, elegance, cohesion, and a bunch of other metrics. These are still important. But they aren't the main thing I look for anymore. Good code solves a real problem, is easy for teammates to understand, and is ready to ship to real users. ← This is my current definition. By the way, there's an entire new generation of developers who don't (and probably never will) care about code purity as we did.
Samuel Colvin @samuelcolvin
I want to be able to approve @claudeai code from my phone when I'm away from my desk. I run claude exclusively in ghostty, anyone know of a way to get these notifications and be able to hit "yes" on my phone - e.g. via slack?
Jeremy Pinto reposted
Anthony Morris ツ @amorriscode
We've brought Claude Code to our Desktop app. You can now easily run multiple Claude Code sessions locally or in the cloud and switch between them.
Jeremy Pinto @jerpint
Basic questions tend to lead to very similar outputs for different providers. I suspect it's partly due to benchmaxxing the same preference datasets?
Jeremy Pinto @jerpint
Are all big lab models just the same now?
[image]
Jeremy Pinto reposted
Riley Goodside @goodside
“Amateur photograph from 1998 of a middle-aged artist copying an image by hand from a computer screen to an oil painting on stretched canvas, but the image is itself the photo of the artist painting the recursive image.” Nano Banana Pro.
[image]
vik @vikhyatk
send me your aliases i need more aliases
[image]
Ian Nuttall @iannuttall
i've been working on a context management app for 7 days and i'm currently at the "let's just rebuild this from scratch" stage pray for me
Jeremy Pinto @jerpint
@dickson_tsai I gave the skills docs to Claude Code and had it build the skill I wanted in plan mode, pretty cool
Dickson Tsai @dickson_tsai
I'm increasingly asking Claude questions like "does {__ tool or prompt} feel good to you?", almost like prepping my kid for the first day of school. It's similar in spirit to "Agents report that they enjoy working with Beads" from the Beads README github.com/steveyegge/bea…
Ado @adocomplete
Did you know that the harness that powers Claude Code is available for you to use to build your own agents? No need to reinvent the wheel (or should I say loop?). It's open-source, battle-tested, and ready to help you ship.
[image]
David K 🎹 @DavidKPiano
I have no idea what to name my baby. At this point I'll just name him "utils"