Erik Vandeputte

1.3K posts

@erikvdp

Machine Learning, Software Engineering & Biology. Occasional piano player.

Ghent · Joined May 2009

1.7K Following · 329 Followers

Pinned Tweet

Erik Vandeputte@erikvdp·19 Jan

It is a capital mistake to theorise before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts. -- Sherlock Holmes, "A Scandal in Bohemia"

Erik Vandeputte@erikvdp·20 May

In my opinion, the compounding effect of mastering one tool (like Claude Code) vastly outperforms chasing marginal model upgrades. A 5% "smarter" model cannot compete with a tool that understands your workflow.

Erik Vandeputte retweeted

staysaasy@staysaasy·9 Apr

The degree to which you are awed by AI is perfectly correlated with how much you use AI to code.

181

1.7K

2.2M

Erik Vandeputte retweeted

Andrej Karpathy@karpathy·11 Mar

@nummanali tmux grids are awesome, but i feel a need to have a proper "agent command center" IDE for teams of them, which I could maximize per monitor. E.g. I want to see/hide toggle them, see if any are idle, pop open related tools (e.g. terminal), stats (usage), etc.

300

117

3.1K

1.4M

Erik Vandeputte@erikvdp·3 Mar

"Specs become the new source of truth"

Latent.Space@latentspacepod

🆕 How to Kill The Code Review

latent.space/p/reviews-dead

the volume and size of PRs is skyrocketing. @simonw called out StrongDM’s “Dark Factory” last month: no human code, but *also* no human review (!?)

in this week’s guest post, @ankitxg makes a 5 step layered playbook for how this can come true.

Erik Vandeputte@erikvdp·15 Feb

@vincent_spruyt Obviously not a fair comparison between the $20 Cursor sub and the $150 Claude sub, but once you reach a certain threshold of token usage, Claude provides much better ROI.

Erik Vandeputte@erikvdp·15 Feb

@vincent_spruyt My team switched to Claude Code because of this. It's been 3 weeks, and I now open Cursor only occasionally for small edits.

Vincent Spruyt@vincent_spruyt·14 Feb

How do you continuously use Cursor without hitting 500 usd on demand per month on top of the 200/month ultra fee? Genuinely looking for a solution, this is too expensive

146

Erik Vandeputte@erikvdp·27 Jan

100%

Andrej Karpathy@karpathy

A few random notes from claude coding quite a bit last few weeks.

Coding workflow. Given the latest lift in LLM coding capability, like many others I rapidly went from about 80% manual+autocomplete coding and 20% agents in November to 80% agent coding and 20% edits+touchups in December. i.e. I really am mostly programming in English now, a bit sheepishly telling the LLM what code to write... in words. It hurts the ego a bit but the power to operate over software in large "code actions" is just too net useful, especially once you adapt to it, configure it, learn to use it, and wrap your head around what it can and cannot do. This is easily the biggest change to my basic coding workflow in ~2 decades of programming and it happened over the course of a few weeks. I'd expect something similar to be happening to well into double digit percent of engineers out there, while the awareness of it in the general population feels well into low single digit percent.

IDEs/agent swarms/fallability. Both the "no need for IDE anymore" hype and the "agent swarm" hype is imo too much for right now. The models definitely still make mistakes and if you have any code you actually care about I would watch them like a hawk, in a nice large IDE on the side. The mistakes have changed a lot - they are not simple syntax errors anymore, they are subtle conceptual errors that a slightly sloppy, hasty junior dev might do. The most common category is that the models make wrong assumptions on your behalf and just run along with them without checking. They also don't manage their confusion, they don't seek clarifications, they don't surface inconsistencies, they don't present tradeoffs, they don't push back when they should, and they are still a little too sycophantic. Things get better in plan mode, but there is some need for a lightweight inline plan mode. They also really like to overcomplicate code and APIs, they bloat abstractions, they don't clean up dead code after themselves, etc. They will implement an inefficient, bloated, brittle construction over 1000 lines of code and it's up to you to be like "umm couldn't you just do this instead?" and they will be like "of course!" and immediately cut it down to 100 lines. They still sometimes change/remove comments and code they don't like or don't sufficiently understand as side effects, even if it is orthogonal to the task at hand. All of this happens despite a few simple attempts to fix it via instructions in CLAUDE.md. Despite all these issues, it is still a net huge improvement and it's very difficult to imagine going back to manual coding. TLDR everyone has their developing flow, my current is a small few CC sessions on the left in ghostty windows/tabs and an IDE on the right for viewing the code + manual edits.

Tenacity. It's so interesting to watch an agent relentlessly work at something. They never get tired, they never get demoralized, they just keep going and trying things where a person would have given up long ago to fight another day. It's a "feel the AGI" moment to watch it struggle with something for a long time just to come out victorious 30 minutes later. You realize that stamina is a core bottleneck to work and that with LLMs in hand it has been dramatically increased.

Speedups. It's not clear how to measure the "speedup" of LLM assistance. Certainly I feel net way faster at what I was going to do, but the main effect is that I do a lot more than I was going to do because 1) I can code up all kinds of things that just wouldn't have been worth coding before and 2) I can approach code that I couldn't work on before because of knowledge/skill issue. So certainly it's speedup, but it's possibly a lot more an expansion.

Leverage. LLMs are exceptionally good at looping until they meet specific goals and this is where most of the "feel the AGI" magic is to be found. Don't tell it what to do, give it success criteria and watch it go. Get it to write tests first and then pass them. Put it in the loop with a browser MCP. Write the naive algorithm that is very likely correct first, then ask it to optimize it while preserving correctness. Change your approach from imperative to declarative to get the agents looping longer and gain leverage.

Fun. I didn't anticipate that with agents programming feels *more* fun because a lot of the fill in the blanks drudgery is removed and what remains is the creative part. I also feel less blocked/stuck (which is not fun) and I experience a lot more courage because there's almost always a way to work hand in hand with it to make some positive progress. I have seen the opposite sentiment from other people too; LLM coding will split up engineers based on those who primarily liked coding and those who primarily liked building.

Atrophy. I've already noticed that I am slowly starting to atrophy my ability to write code manually. Generation (writing code) and discrimination (reading code) are different capabilities in the brain. Largely due to all the little mostly syntactic details involved in programming, you can review code just fine even if you struggle to write it.

Slopacolypse. I am bracing for 2026 as the year of the slopacolypse across all of github, substack, arxiv, X/instagram, and generally all digital media. We're also going to see a lot more AI hype productivity theater (is that even possible?), on the side of actual, real improvements.

Questions. A few of the questions on my mind:
- What happens to the "10X engineer" - the ratio of productivity between the mean and the max engineer? It's quite possible that this grows *a lot*.
- Armed with LLMs, do generalists increasingly outperform specialists? LLMs are a lot better at fill in the blanks (the micro) than grand strategy (the macro).
- What does LLM coding feel like in the future? Is it like playing StarCraft? Playing Factorio? Playing music?
- How much of society is bottlenecked by digital knowledge work?

TLDR Where does this leave us? LLM agent capabilities (Claude & Codex especially) have crossed some kind of threshold of coherence around December 2025 and caused a phase shift in software engineering and closely related. The intelligence part suddenly feels quite a bit ahead of all the rest of it - integrations (tools, knowledge), the necessity for new organizational workflows, processes, diffusion more generally. 2026 is going to be a high energy year as the industry metabolizes the new capability.
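One way to read the "write the naive algorithm that is very likely correct first, then ask it to optimize it while preserving correctness" advice is as a differential test: keep the slow version as a reference and check any faster candidate against it on random inputs. The sketch below is only an illustration of that idea. The maximum-subarray problem, the function names, and the trial counts are all assumptions chosen for the example, not anything taken from the thread.

```python
import random


def max_subarray_naive(xs):
    """Reference version: O(n^2), checks every non-empty contiguous slice."""
    best = xs[0]
    for i in range(len(xs)):
        total = 0
        for j in range(i, len(xs)):
            total += xs[j]
            best = max(best, total)
    return best


def max_subarray_fast(xs):
    """Optimized candidate: O(n), Kadane's algorithm."""
    best = current = xs[0]
    for x in xs[1:]:
        # Either extend the running slice or start a new one at x.
        current = max(x, current + x)
        best = max(best, current)
    return best


def agree_on_random_inputs(trials=500, seed=0):
    """Success criterion: the candidate matches the reference on random lists."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-10, 10) for _ in range(rng.randint(1, 12))]
        if max_subarray_naive(xs) != max_subarray_fast(xs):
            return False
    return True


if __name__ == "__main__":
    print("candidate agrees with reference:", agree_on_random_inputs())
```

A check like `agree_on_random_inputs` is the kind of declarative success criterion an agent can loop against: the reference stays fixed, and only the candidate is allowed to change.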

Erik Vandeputte retweeted

antirez@antirez·11 Jan

New blog post: Don't fall into the anti-AI hype. antirez.com/news/158

216

1.4K

313.3K

Erik Vandeputte@erikvdp·10 Aug

@_prateekyadav Interested!

Ghent, Belgium 🇧🇪

Prateek@_prateekyadav·10 Aug

Finally got the invite to try Comet. Have a few invites to give away now. DM/reply if you are looking for one.

266

Erik Vandeputte@erikvdp·10 Aug

@iruletheworldmo Interested!

Ghent, Belgium 🇧🇪

🍓🍓🍓@iruletheworldmo·10 Aug

i’ve still got a ton of codes for comet early access. if you comment below i’ll dm you with an access code.

280

122

22.1K

Erik Vandeputte retweeted

alex duffy@alxai_·5 Jun

Finally launched AI Diplomacy

TL;DR - Watch 18 different models from @AnthropicAI @OpenAI @GoogleAI @GoogleDeepMind @NousResearch @xai @deepseek_ai @AIatMeta etc. compete to try taking over Europe

Their personalities on full display
o3 is a schemer
Claude a pacifist
Gemini a strategist
Deepseek a bit dramatic

The repo is open source for anyone to try, stream's live (working out a few last kinks), & you can find our writeup 👇

Dan Shipper 📧@danshipper

🚨 NEW: We made Claude, Gemini, o3 battle each other for world domination.

We taught them Diplomacy—the strategy game where winning requires alliances, negotiation, and betrayal.

Here's what happened:
DeepSeek turned warmongering tyrant.
Claude couldn't lie—everyone exploited it ruthlessly.
Gemini 2.5 Pro nearly conquered Europe with brilliant tactics.
Then o3 orchestrated a secret coalition, backstabbed every ally, and won.

Why did we do this? The most popular AI benchmarks don't test deception. But as these models get deployed everywhere—from your email to your workplace—we need to know: Will they lie to get what they want?

So @every we built the ultimate test: AI Diplomacy, a dynamic benchmark that measures AI's ability to form alliances, negotiate, and betray each other.

Watch them live below! Created from the ground up by @alxai_ and @Tyler_Marques.

144

34K

Erik Vandeputte@erikvdp·4 Jun

Went from wondering "how sensitive am I really to caffeine?" to genetic insights in under 10 mins. LLMs are a game-changer for personalized scientific discovery and for crafting those sometimes tricky bioinformatics commands ;) . Turns out I'm a slow metabolizer!

Erik Vandeputte retweeted

Andrej Karpathy@karpathy·4 Jun

Good post from @balajis on the "verification gap".

You could see it as there being two modes in creation. Borrowing GAN terminology: 1) generation and 2) discrimination. e.g. painting - you make a brush stroke (1) and then you look for a while to see if you improved the painting (2). these two stages are interspersed in pretty much all creative work.

Second point. Discrimination can be computationally very hard.
- images are by far the easiest. e.g. image generator teams can create giant grids of results to decide if one image is better than the other. thank you to the giant GPU in your brain built for processing images very fast.
- text is much harder. it is skimmable, but you have to read, it is semantic, discrete and precise so you also have to reason (esp in e.g. code).
- audio is maybe even harder still imo, because it force a time axis so it's not even skimmable. you're forced to spend serial compute and can't parallelize it at all.

You could say that in coding LLMs have collapsed (1) to ~instant, but have done very little to address (2). A person still has to stare at the results and discriminate if they are good. This is my major criticism of LLM coding in that they casually spit out *way* too much code per query at arbitrary complexity, pretending there is no stage 2. Getting that much code is bad and scary. Instead, the LLM has to actively work with you to break down problems into little incremental steps, each more easily verifiable. It has to anticipate the computational work of (2) and reduce it as much as possible. It has to really care.

This leads me to probably the biggest misunderstanding non-coders have about coding. They think that coding is about writing the code (1). It's not. It's about staring at the code (2). Loading it all into your working memory. Pacing back and forth. Thinking through all the edge cases. If you catch me at a random point while I'm "programming", I'm probably just staring at the screen and, if interrupted, really mad because it is so computationally strenuous. If we only get much faster 1, but we don't also reduce 2 (which is most of the time!), then clearly the overall speed of coding won't improve (see Amdahl's law).
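The Amdahl's law reference can be made concrete with a few lines of arithmetic. If a fraction p of the work is made s times faster and the rest is untouched, the overall speedup is 1 / ((1 - p) + p / s). The sketch below assumes, purely for illustration, that generation (1) is 20% of coding time and verification (2) is the other 80%; that split is a made-up number, not a figure from the tweet.

```python
def amdahl_speedup(fraction_accelerated: float, factor: float) -> float:
    """Overall speedup when only part of the work gets `factor` times faster.

    fraction_accelerated: share of total time spent in the accelerated part (0..1).
    factor: how many times faster that part becomes (> 0).
    """
    if not 0.0 <= fraction_accelerated <= 1.0:
        raise ValueError("fraction_accelerated must be between 0 and 1")
    if factor <= 0:
        raise ValueError("factor must be positive")
    remaining = 1.0 - fraction_accelerated
    return 1.0 / (remaining + fraction_accelerated / factor)


if __name__ == "__main__":
    # Hypothetical split: 20% generation (accelerated), 80% verification (unchanged).
    generation_share = 0.2
    for factor in (2, 10, 1000):
        overall = amdahl_speedup(generation_share, factor)
        print(f"{factor:>5}x faster generation -> {overall:.2f}x overall")
```

Under that assumed split, the overall speedup can never reach 1 / 0.8 = 1.25x no matter how fast generation gets, which is the point of the tweet: the untouched verification stage sets the ceiling.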

Balaji@balajis

AI PROMPTING → AI VERIFYING AI prompting scales, because prompting is just typing. But AI verifying doesn’t scale, because verifying AI output involves much more than just typing. Sometimes you can verify by eye, which is why AI is great for frontend, images, and video. But for anything subtle, you need to read the code or text deeply — and that means knowing the topic well enough to correct the AI. Researchers are well aware of this, which is why there’s so much work on evals and hallucination. However, the concept of verification as the bottleneck for AI users is under-discussed. Yes, you can try formal verification, or critic models where one AI checks another, or other techniques. But to even be aware of the issue as a first class problem is half the battle. For users: AI verifying is as important as AI prompting.

134

537

4.4K

844.7K

Erik Vandeputte@erikvdp·29 Apr

A watch with a sweeping second hand. A subtle reminder that time doesn't tick. Live in the continuous present, not just the segmented moments. #VacationPhilosophy #Stoicism

De Panne, Belgium 🇧🇪

Erik Vandeputte retweeted

Andrej Karpathy@karpathy·7 Apr

x.com/i/article/1909…

205

791

5.9K

Erik Vandeputte retweeted

Ben South@bnj·4 Mar

Vibe coding is all fun and games until you have to vibe debug

198

460

262.3K

Erik Vandeputte retweeted

Andrej Karpathy@karpathy·28 Jan

"Move 37" is the word-of-day - it's when an AI, trained via the trial-and-error process of reinforcement learning, discovers actions that are new, surprising, and secretly brilliant even to expert humans. It is a magical, just slightly unnerving, emergent phenomenon only achievable by large-scale reinforcement learning. You can't get there by expert imitation. It's when AlphaGo played move 37 in Game 2 against Lee Sedol, a weird move that was estimated to only have 1 in 10,000 chance to be played by a human, but one that was creative and brilliant in retrospect, leading to a win in that game. We've seen Move 37 in a closed, game-like environment like Go, but with the latest crop of "thinking" LLM models (e.g. OpenAI-o1, DeepSeek-R1, Gemini 2.0 Flash Thinking), we are seeing the first very early glimmers of things like it in open world domains. The models discover, in the process of trying to solve many diverse math/code/etc. problems, strategies that resemble the internal monologue of humans, which are very hard (/impossible) to directly program into the models. I call these "cognitive strategies" - things like approaching a problem from different angles, trying out different ideas, finding analogies, backtracking, re-examining, etc. Weird as it sounds, it's plausible that LLMs can discover better ways of thinking, of solving problems, of connecting ideas across disciplines, and do so in a way we will find surprising, puzzling, but creative and brilliant in retrospect. It could get plenty weirder too - it's plausible (even likely, if it's done well) that the optimization invents its own language that is inscrutable to us, but that is more efficient or effective at problem solving. The weirdness of reinforcement learning is in principle unbounded. I don't think we've seen equivalents of Move 37 yet. I don't know what it will look like. I think we're still quite early and that there is a lot of work ahead, both engineering and research. 
But the technology feels on track to find them. youtube.com/watch?v=HT-UZk…

432

1.4K

9.5K

Erik Vandeputte retweeted

François Chollet@fchollet·27 Jan

The key is really this: AI usefulness scales logarithmically with inference time compute. Right now for many use cases the amount of compute you need to operate at human-level is such that AI isn't economically viable for that use case. The more compute efficient AI gets, the more use cases start becoming economically viable, the more we'll deploy AI, and the more compute we'll need.

Satya Nadella@satyanadella

Jevons paradox strikes again! As AI gets more efficient and accessible, we will see its use skyrocket, turning it into a commodity we just can't get enough of. en.m.wikipedia.org/wiki/Jevons_pa…

278

2.5K

331.2K

Erik Vandeputte@erikvdp·27 Jan

"The Short Case for Nvidia Stock" youtubetranscriptoptimizer.com/blog/05_the_sh…

105

Erik Vandeputte retweeted

Ethan Mollick@emollick·17 Dec

Given that Google has assembled all the pieces for a working AI assistant in the coming months with Gemini 2 Flash multimodal plus Mariner, I really wonder if Apple catches up or if AI is finally the Nokia moment for iPhones.

9to5Mac@9to5mac

Most iPhone owners see little to no value in Apple Intelligence so far 9to5mac.com/2024/12/16/mos… by @benlovejoy

118

1.8K

334.1K
