Andrej Karpathy

10K posts

@karpathy

I like to train large deep neural nets. Previously Director of AI @ Tesla, founding team @ OpenAI, PhD @ Stanford.

Stanford · Joined April 2009

1.1K Following · 2M Followers

Pinned Tweet

Andrej Karpathy@karpathy·24 Jan

The hottest new programming language is English

1.8K

7.7K

60.4K

10.7M

Andrej Karpathy@karpathy·1d

@shikhr_ Yeah I have 4 blog posts that I didn’t finish yet, this is one of them. Dobby runs my entire house over WhatsApp. Lights, shades, pool/spa, sonos, security, HVAC etc

219

17.2K

Shikhar@shikhr_·1d

@karpathy Dobby the House Elf claw??? What did I miss!

15.7K

Andrej Karpathy@karpathy·1d

Thank you Jensen and NVIDIA! She’s a real beauty! I was told I’d be getting a secret gift, with a hint that it requires 20 amps. (So I knew it had to be good). She’ll make for a beautiful, spacious home for my Dobby the House Elf claw, among lots of other tinkering, thank you!!

NVIDIA AI Developer@NVIDIAAIDev

🙌 Andrej Karpathy’s lab has received the first DGX Station GB300 -- a Dell Pro Max with GB300. 💚 We can't wait to see what you’ll create @karpathy! 🔗 blogs.nvidia.com/blog/gtc-2026-… @DellTech

483

768

17.5K

856.2K

Andrej Karpathy@karpathy·1d

Ugh X breaks time links, it’s at 26:17

286

57.2K

Andrej Karpathy@karpathy·1d

(link to blast from the past) youtu.be/xQhb3C2hQoE?si…

YouTube

569

134.4K

Andrej Karpathy@karpathy·3d

@Yulun_Du @ilyasut SGD is a ResNet too (the blocks of it are fwd+bwd), the residual stream is the weights so... 🤔 We're not taking the Attention is All You Need part literally enough? :D

577

90.6K
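The "SGD is a ResNet" remark above can be made concrete with a small sketch. This is only an illustration under assumed names (`sgd_block`, `run_sgd`, and the toy 1-D regression data are all hypothetical, not from the thread): each "block" does a forward and a backward pass and returns an update, and the weights play the role of the residual stream that every block adds to.

```python
# Sketch: SGD written as a stack of residual "blocks" acting on the weights.
# All names and the toy data are illustrative assumptions.

def sgd_block(w, batch, lr=0.1):
    """One block: forward + backward pass, returning the update to add to w."""
    grad = 0.0
    for x, y in batch:
        pred = w * x                  # forward pass of a 1-D linear model
        grad += 2 * (pred - y) * x    # backward pass for squared error
    grad /= len(batch)
    return -lr * grad


def run_sgd(w0, batches, lr=0.1):
    """Apply the blocks in sequence; w is the 'residual stream'."""
    w = w0
    for batch in batches:
        w = w + sgd_block(w, batch, lr)   # same shape as x + f(x) in a ResNet
    return w


# Toy data generated by y = 3 * x, so the weight should approach 3.
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]
batches = [data] * 50
w_final = run_sgd(0.0, batches)
print(w_final)
```

The line `w = w + sgd_block(...)` has the same form as a residual connection, which is the analogy the tweet is pointing at.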

Yulun Du@Yulun_Du·3d

@ilyasut once said that an LSTM is a ResNet rotated 90 degrees. :) It turns out attention can be rotated 90 degrees too — yielding a natural generalization of residual connections. 🥳

Kimi.ai@Kimi_Moonshot

Introducing 𝑨𝒕𝒕𝒆𝒏𝒕𝒊𝒐𝒏 𝑹𝒆𝒔𝒊𝒅𝒖𝒂𝒍𝒔: Rethinking depth-wise aggregation.
Residual connections have long relied on fixed, uniform accumulation. Inspired by the duality of time and depth, we introduce Attention Residuals, replacing standard depth-wise recurrence with learned, input-dependent attention over preceding layers.
🔹 Enables networks to selectively retrieve past representations, naturally mitigating dilution and hidden-state growth.
🔹 Introduces Block AttnRes, partitioning layers into compressed blocks to make cross-layer attention practical at scale.
🔹 Serves as an efficient drop-in replacement, demonstrating a 1.25x compute advantage with negligible (<2%) inference latency overhead.
🔹 Validated on the Kimi Linear architecture (48B total, 3B activated parameters), delivering consistent downstream performance gains.
🔗Full report: github.com/MoonshotAI/Att…

516

56.8K
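To make the contrast in the announcement above easier to picture, here is a rough sketch of "fixed, uniform accumulation" versus "input-dependent attention over preceding layers". It is not the formulation from the report: the function names, the plain dot-product scoring, and the way the query is supplied are all assumptions made for illustration.

```python
import math

# Sketch only: names and the scoring rule are illustrative assumptions,
# not the method described in the linked report.

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def standard_residual(history):
    """Standard residual stream: a fixed, unweighted sum of earlier outputs."""
    dim = len(history[0])
    return [sum(h[i] for h in history) for i in range(dim)]


def attention_residual(history, query):
    """Mix earlier layer outputs with weights that depend on the query."""
    weights = softmax([dot(query, h) for h in history])
    dim = len(history[0])
    mixed = [sum(w * h[i] for w, h in zip(weights, history)) for i in range(dim)]
    return mixed, weights


# Three earlier layer outputs (toy 2-D vectors).
history = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
uniform_mix, uniform_w = attention_residual(history, [0.0, 0.0])
focused_mix, focused_w = attention_residual(history, [10.0, 0.0])
print(standard_residual(history), uniform_w, focused_w)
```

With a zero query every earlier layer gets the same weight; with a query aligned to the first coordinate, the layer whose output has no component there is almost ignored, which is the "selectively retrieve past representations" idea in miniature.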

Andrej Karpathy@karpathy·3d

@ChristosTzamos Wait this is so awesome!! Both 1) the C compiler to LLM weights and 2) the logarithmic complexity hard-max attention and its potential generalizations. Inspiring!

1.3K

33.9K

Christos Tzamos@ChristosTzamos·12 Mar

1/4 LLMs solve research-grade math problems but struggle with basic calculations. We bridge this gap by turning them into computers. We built a computer INSIDE a transformer that can run programs for millions of steps in seconds, solving even the hardest Sudokus with 100% accuracy.

239

786

5.9K

1.6M

Andrej Karpathy@karpathy·3d

@rasbt @teortaxesTex Ty, I used your blog post, exported with obsidian ext into markdown, used it to enqueue ideas into my autoresearch loop

109

6.1K

Sebastian Raschka@rasbt·3d

@karpathy @teortaxesTex I can offer you the metadata in YAML format: github.com/rasbt/llm-arch… Happy tinkering!

469

21.2K

Sebastian Raschka@rasbt·4d

I (finally) put together a new LLM Architecture Gallery that collects the architecture figures all in one place! sebastianraschka.com/llm-architectu…

197

1.4K

8.1K

689K

Andrej Karpathy@karpathy·4d

@_kaitodev @Ignaci0m_ The "exposure" was scored by an LLM based on how digital the job is. This has no bearing on what actually happens to these occupations, which has to do with demand elasticity and a lot more. People are sensationalizing the visualization tool and putting words in my mouth.

144

13K

Kaito | 海斗@_kaitodev·4d

thank you 🙏
the people’s reaction was overblown and completely justified at the same time
honestly the data itself is still the most interesting part to me. even if the scoring methodology is somehow rough, the pattern it reveals ( that exposure tracks almost perfectly with “can you do this from a laptop” ) is a real insight worth checking out
ppl weren’t panicking about the data but more about what the data actually confirmed.

4.7K

Kaito | 海斗@_kaitodev·5d

5 minutes ago, @karpathy just dropped karpathy/jobs! he scraped every job in the US economy (342 occupations from BLS), scored each one's AI exposure 0-10 using an LLM, and visualized it as a treemap. if your whole job happens on a screen you're cooked. average score across all jobs is 5.3/10. software devs: 8-9. roofers: 0-1. medical transcriptionists: 10/10 💀 karpathy.ai/jobs

967

1.8K

12.1K

3.5M

Andrej Karpathy@karpathy·4d

This was a saturday morning 2 hour vibe coded project inspired by a book I’m reading. I thought the code/data might be helpful to others to explore the BLS dataset visually, or color it in different ways or with different prompts or add their own visualizations. It’s been wildly misinterpreted (which I should have anticipated even despite the readme docs) so I took it down.

131

11.9K

Ignacio Montenegro@Ignaci0m_·4d

@_kaitodev @karpathy I can't find it

Andrej Karpathy@karpathy·4d

@Zhikai273 Wow. I was sure this was AI. (I mean generative AI.)

866

75.1K

Zhikai Zhang@Zhikai273·4d

🎾 Introducing LATENT: Learning Athletic Humanoid Tennis Skills from Imperfect Human Motion Data
Dynamic movements, agile whole-body coordination, and rapid reactions. A step toward athletic humanoid sports skills.
Project: zzk273.github.io/LATENT/
Code: github.com/GalaxyGeneralR…

162

640

4.1K

1.3M

Andrej Karpathy@karpathy·5d

@vivek_2332 Yep, exactly and agree! Any process with a lot of knobs and objective criteria benefits a lot.

433

38.7K

Vivek@vivek_2332·5d

introducing autoresearch-rl, autonomous research for rl post-training.
inspired by @karpathy autoresearch, and i think rl post-training is honestly one of the places where this idea fits perfectly. there are at least 50+ hyper parameters to tweak, learning rate, batch size, rollouts, clipping ratios, kl penalties, schedulers, the list goes on. instead of sitting there for hours turning knobs one at a time, just let the model figure out the right starting config on its own.
some things worth mentioning:
-> built on @PrimeIntellect prime-rl (my favourite rl post-training framework) and @willccbb verifiers for reward verification.
-> ran qwen2.5-0.5b-instruct on gsm8k across 60+ autonomous experiments. eval score went from 0.475 to 0.550 and the agent actually found a way to do it in fewer steps (20 instead of 30). less compute, better results
-> the whole thing was surprisingly smooth to set up and run. point the agent at the config, go to sleep, wake up to a full experiment log. i really wish i could try this on a bigger model but gpu poor for now lol
-> the agent discovers things you wouldn't think to try. like how rollouts = 4 beats rollouts = 8, or how a constant lr schedule outperforms cosine. it just methodically tests everything
i think the real value here is that rl training is so fragile and noisy that having an agent patiently run experiment after experiment is genuinely more effective than a human doing it manually.
check it out: github.com/vivekvkashyap/…

749

78.3K
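The "turn one knob at a time, keep what helps, log everything" loop described above can be sketched roughly as follows. This is not the autoresearch-rl code: `run_experiment` is a made-up stand-in for a real training run (its scores are invented, only loosely echoing the rollouts and schedule observations in the post), and the search is a plain greedy sweep rather than an LLM agent proposing experiments.

```python
# Sketch only: a greedy one-knob-at-a-time search with an experiment log.
# run_experiment is a hypothetical toy objective, not a real RL training run.

def run_experiment(config):
    """Stand-in for 'train with this config and return the eval score'."""
    score = 0.5
    if config["rollouts"] == 4:
        score += 0.05
    if config["lr_schedule"] == "constant":
        score += 0.025
    return score


def autoresearch(base_config, search_space, evaluate):
    """Try each candidate value per knob, keeping any change that improves the score."""
    best_config = dict(base_config)
    best_score = evaluate(best_config)
    log = [(dict(best_config), best_score)]      # full experiment log
    for name, candidates in search_space.items():
        for value in candidates:
            if value == best_config[name]:
                continue                         # already evaluated this setting
            trial = dict(best_config, **{name: value})
            score = evaluate(trial)
            log.append((trial, score))
            if score > best_score:
                best_config, best_score = trial, score
    return best_config, best_score, log


base = {"rollouts": 8, "lr_schedule": "cosine"}
space = {"rollouts": [4, 8, 16], "lr_schedule": ["cosine", "constant"]}
best, best_score, log = autoresearch(base, space, run_experiment)
print(best, len(log))
```

A real setup would replace `run_experiment` with an actual (noisy) training job, which is where repeated runs and patient logging start to matter.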

Andrej Karpathy@karpathy·11 Mar

My autoresearch labs got wiped out in the oauth outage. Have to think through failovers. Intelligence brownouts will be interesting - the planet losing IQ points when frontier AI stutters.

532

300

566K

Andrej Karpathy@karpathy·11 Mar

Human orgs are not legible, the CEO can’t see/feel/zoom in on any activity in their company, with real time stats etc. I have no doubt that it will be possible to control orgs on mobile, with voice etc., but with this level of legibility will that be optimal? Not in principle and asymptotically but in practice and for at least the next round of play.

1.2K

204.2K

Andrej Karpathy@karpathy·11 Mar

All of these patterns as an example are just matters of “org code”. The IDE helps you build, run, manage them. You can’t fork classical orgs (eg Microsoft) but you’ll be able to fork agentic orgs.

166

241

3.5K

400.9K

Andrej Karpathy@karpathy·11 Mar

Expectation: the age of the IDE is over
Reality: we’re going to need a bigger IDE (imo). It just looks very different because humans now move upwards and program at a higher level - the basic unit of interest is not one file but one agent. It’s still programming.

Andrej Karpathy@karpathy

@nummanali tmux grids are awesome, but i feel a need to have a proper "agent command center" IDE for teams of them, which I could maximize per monitor. E.g. I want to see/hide toggle them, see if any are idle, pop open related tools (e.g. terminal), stats (usage), etc.

791

832

10.5K

2.3M

Andrej Karpathy@karpathy·11 Mar

@amit05prakash I wanted to buy a bigger monitor and discovered that others had the same idea

180

36.1K

Amit Prakash@amit05prakash·11 Mar

@karpathy I can finally justify my reasons for buying more monitors now

37.9K

Andrej Karpathy@karpathy·11 Mar

@trongthangpham @maxbittker ralph loop runs headless. i dislike headless sessions. i need to see and supervise agent work, possibly ask /btw questions of them, possibly pitch in ideas to the mix, etc etc.

200

10.3K

Trong-Thang Pham@trongthangpham·11 Mar

@karpathy @maxbittker I thought your version is similar to the ralph loop (the bash one) so it would loop forever. Is that not the case here?

max@maxbittker·10 Mar

From @karpathy's autoresearch .md

124

3.1K

219.5K

Andrej Karpathy@karpathy·11 Mar

@nvbkdw @nummanali yes, solid work trending in a good direction, but almost all my work is across like 20 different machines (my local, my claw machine, my gpu machines). possibly they could add ssh mode, a bit like VS Code does (for the same reasons).

110

15.3K

Ryan Huang@nvbkdw·11 Mar

@karpathy @nummanali Codex desktop is pretty good

14.2K

Numman Ali@nummanali·11 Mar

Claude Code teams with tmux is really cool
When you run with team mode enabled in tmux, it automatically opens the additional terminal in pane
I don't really get my main agent to orchestrate, I chat to them myself
CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=true claude

1.4K

186.7K
