Jonathan Matus
@matusjon
Curious | Company builder | AI Tinkerer


The token cost to build a production feature is now lower than the meeting cost to discuss building that feature. Let me rephrase: it is literally cheaper to build the thing and see if it works than to have a 30-minute planning meeting about whether you should build it.

It’s wild when you think about it. This completely inverts how you should run a software organization. The planning layer becomes the bottleneck because the building layer is essentially free; the cost of code has dropped to essentially zero.

The rational response is to eliminate planning for anything that can be tested empirically. Don’t debate whether a feature will work. Just build it in 2 hours, measure it with a group of customers, and then decide to kill or keep it.

I saw a startup operating this way and their build velocity is up 20x. Decision quality is up because every decision is informed by a real prototype, not a slide deck and an expensive meeting.

We went from “move fast and break things” to “move fast and build everything.” The planning industrial complex is dead. Thank god.




JUST IN: Elon Musk proclaims "we are in the Singularity"




Three days ago I left autoresearch tuning nanochat for ~2 days on a depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement); this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference.

I am mildly surprised that my very first naive attempt already worked this well on top of a project I had already manually tuned fairly well. This is a first for me because I am very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc. This has been the bread and butter of what I do daily for two decades. Seeing the agent do this entire workflow end-to-end, all by itself, as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of results of experiments and used that to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real": I didn't find them manually previously, and they stack up and actually improved nanochat.

Among the bigger findings:
- It noticed an oversight that my parameterless QK norm didn't have a scale multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the value embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (I forgot to tune it).
- It found that AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.
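The first finding above (a parameterless QK norm with no scale multiplier, leaving attention too diffuse) can be sketched in PyTorch. This is a hypothetical reconstruction for illustration, not nanochat's actual code; the class name and defaults are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaledQKNorm(nn.Module):
    """Hypothetical sketch: QK norm with a learnable scale multiplier.

    A parameterless norm pins queries/keys to unit norm, which can make
    attention logits too small ("diffuse"). Attaching a learned per-dim
    multiplier lets the model sharpen attention again.
    """
    def __init__(self, head_dim: int, init_scale: float = 1.0):
        super().__init__()
        self.scale = nn.Parameter(torch.full((head_dim,), init_scale))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # normalize to unit L2 norm along the head dimension, then rescale
        return F.normalize(x, dim=-1) * self.scale
```

With `init_scale` above 1, the resulting attention logits grow and the softmax sharpens; the point of making it a parameter is that the optimizer (or an agent) can tune it instead of leaving it implicitly fixed at 1.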
This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism. github.com/karpathy/nanoc…

All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale, of course: you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges.

And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has more efficient proxy metrics, such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.

It’s extremely good that Anthropic has not backed down, and it’s significant that OpenAI has taken a similar stance. In the future, there will be much more challenging situations of this nature, and it will be critical for the relevant leaders to rise to the occasion, and for fierce competitors to put their differences aside. Good to see that happen today.







Something extraordinary may be about to happen in the realm of energy storage, thanks to a company called “Donut Lab” that’s pushing back hard against critics who claim the battery’s specifications are impossible. The company is about to release independent testing documentation (next week), and if the numbers support the claims of performance, this will be a new “Wright Brothers” moment for technological innovation, leaping far ahead of any other battery technology known to exist.

Read my full analysis: The Donut Lab Battery: A Wright Brothers Moment for Energy Independence?

Letimäki's claims are staggeringly specific. The cited energy density of 400 Wh/kg would double the performance of the best commercial lithium-ion batteries and surpass even many experimental solid-state designs. For perspective, achieving this in an electric vehicle could mean ranges exceeding 1,000 miles on a single charge, rendering 'range anxiety' a relic of a bygone era.

Even more revolutionary is the purported lifespan: 100,000 full charge-discharge cycles. Given that a typical EV might be cycled once per day, this translates to a potential operational life of 274 years, a durability so extreme it redefines 'durable goods' and could make the battery a permanent fixture in a vehicle or home, outlasting every other component.

The implications of the materials claim are equally profound. Letimäki states the battery uses common, non-lithium, conflict-free materials. This directly challenges the fragile, geopolitically fraught supply chains built around lithium, cobalt, and nickel, which are often controlled by adversarial regimes or extracted under oppressive conditions. A shift to abundant, domestically sourceable materials would shatter the energy cartels and enable localized, resilient manufacturing.

Here’s the full article: naturalnews.com/2026-02-21-the…
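For what it's worth, the 274-year figure is just the article's own cycle count divided by its one-cycle-per-day assumption:

```python
claimed_cycles = 100_000   # the article's claimed cycle life
cycles_per_day = 1         # the article's "typical EV" assumption
years = claimed_cycles / (cycles_per_day * 365)
print(round(years))  # 274
```

The arithmetic checks out; whether the underlying cycle-life claim does is exactly what the promised independent testing would have to show.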

Before this, running parallel Claude Code agents required manual bash scripts, custom worktree-management functions, and a dozen Medium tutorials explaining the setup. incident.io wrote an entire blog post about their homegrown tooling just to get multiple agents running without clobbering each other’s files. Developers were spending 30 minutes configuring worktree workflows before writing a single line of product code. Now it’s one flag.

This tells you where the actual bottleneck in AI coding has been sitting. The models got smart enough to write production code months ago. The constraint was filesystem isolation. Two agents editing the same working directory creates race conditions, corrupted state, and merge nightmares that eat more time than the agents save. Faros AI found that teams with high AI adoption saw PR review time increase 91% because the overhead of managing parallel output overwhelmed the speed gains from generating it.

The --worktree flag attacks that exact problem at the infrastructure layer. Each agent gets its own branch, its own directory, its own universe. No coordination overhead. No “git stash, git checkout, restart AI” loops that destroy context.

What makes this interesting is what it does to the developer’s job description. The Pragmatic Engineer reported that senior engineers are becoming “naturals” at parallel agent workflows because the skillset maps directly to what they already do: managing multiple workstreams, reviewing code across branches, and delegating tasks. The role shifts from “person who writes code” to “person who orchestrates 5 agents writing code simultaneously and picks the best output.”

Cursor already ships 8-agent parallelism. Codex has background agents. The entire AI coding market is converging on the same realization: single-threaded development is dead, and the tools that reduce friction for multi-agent orchestration win. One CLI flag. That’s the whole moat.
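For context on what the flag abstracts away, here is a minimal sketch of the manual git-worktree setup the post describes. The repo, paths, and branch names are all hypothetical; this is not Claude Code's implementation, just the underlying git mechanism:

```shell
set -e
# start from a throwaway repo so the sketch is self-contained
tmp=$(mktemp -d)
cd "$tmp"
git init -q repo && cd repo
git -c user.email=dev@example.com -c user.name=dev \
    commit -q --allow-empty -m init

# one worktree (own directory + own branch) per agent: filesystem isolation
git worktree add -q ../agent-1 -b agent-1
git worktree add -q ../agent-2 -b agent-2

# each agent now edits its own universe; no shared-state races
echo "change from agent 1" > ../agent-1/feature.txt
echo "change from agent 2" > ../agent-2/feature.txt

# review the parallel output, then merge the winners and clean up
git worktree list
```

Because each worktree is a separate checkout backed by the same object store, the "git stash, git checkout, restart AI" loop disappears: agents never share a working directory, only history.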




