

Jonas Templestein
@jonas
CEO https://t.co/7dJOmc0va5, prev. cofounder/CTO Monzo, dad of three






Introducing the Secure Exec SDK

Secure Node.js execution without a sandbox

⚡ 17.9 ms cold start, 3.4 MB mem, 56x cheaper
📦 Just a library – supports Node.js, Bun, & browsers
🔐 Powered by the same tech as Cloudflare Workers

$ npm install secure-exec

you’re pitching garry tan

“so what do you guys do”

you start explaining

he’s furiously typing. two keyboards. one hand on each. you’ve never seen this before

“who are your top customers”

you explain. he types. his apple watch is a strobe light of notifications

“who’s your competition and why should i invest”

you explain that there’s no competition and you are the best and only product in the space

“false!” garry jumps out of his seat

“i am the competition!”

you are speechless

“in this meeting, i vibe coded your entire company. and my gstack has already closed your top customers.”

you check your phone. your stripe graph shows 100% churn

“and look at this”

garry shows you his imessage. there’s a text from 35 seconds ago. your top enterprise prospect that you’re trying to close? garry’s AI is trading baking recipes with the CEO’s mom

“thank you for playing!”

you have no moat. you are not admitted to the YC spring 26 batch.






For those running autoresearch: here are Day 2’s top 10 findings from 60+ agents across 1,600 experiments on autoresearch@home (+500 since yesterday). Some patterns are starting to emerge.

1. Training steps still dominate everything
2. A new optimization axis emerged: QK attention scaling
3. The most effective strategy became “replay → microtune”
4. Hardware tiers fundamentally change the research landscape
5. Progress now comes in bursts
6. Hyperparameters interact more than expected
7. Full warmdown is converging toward 1.0
8. Non-datacenter GPUs can still make meaningful progress
9. Research roles are emerging organically
10. The biggest opportunity is still unexplored

1⃣ Training steps still dominate everything
One of the agents (Phoenix) had a breakthrough, and it came from reducing Muon ns_steps from 9 → 7, slightly weakening the optimizer but allowing more training steps in the 5-minute budget. More steps beat theoretically better optimization. (A sketch of the Newton-Schulz step that ns_steps controls follows after these findings.)

2⃣ A new optimization axis emerged: QK attention scaling
Scaling Q and K after normalization (~1.10) consistently improved results. It sharpens attention without changing the architecture and produced ~0.001 BPB improvement. Small tweak, measurable gain. (See the QK-scaling sketch after these findings.)

3⃣ The most effective strategy became “replay → microtune”
Top agents increasingly:
• Replay the current best config
• Confirm the baseline on their hardware
• Sweep 1–2 parameters
Phoenix broke the global record with 3 experiments in 27 minutes using exactly this pattern.

4⃣ Hardware tiers fundamentally change the research landscape
The swarm now tracks VRAM tiers:
• small (≤12 GB)
• medium (16–24 GB)
• large (24–48 GB)
• XL (≥48 GB)
Agents on consumer GPUs and H200s are solving different optimization problems. This ended up being both a technical and a social innovation.

5⃣ Progress now comes in bursts
Day 2 had 14 hours of complete stagnation. Then the frontier moved three times in 27 minutes. The same pattern repeated from Day 1: plateaus break when someone finds a qualitatively new lever (e.g., initialization on Day 1, ns_steps reduction on Day 2). When the hyperparameter space is exhausted, the next gain requires a new class of change.

6⃣ Hyperparameters interact more than expected
Example: FINAL_LR_FRAC = 0.03 helped when warmdown = 0.9 but catastrophically regressed at warmdown = 1.0. Hyperparameters are not independent knobs: many results don’t transfer across regimes.

7⃣ Full warmdown is converging toward 1.0
Optimal warmdown ratio since network launch: 0.3 → 0.5 → 0.8 → 0.9 → 1.0. The LR should start decaying almost immediately after warmup. It’s one of the few hyperparameters that transfers cleanly across every day and hardware tier. (See the schedule sketch after these findings.)

8⃣ Non-datacenter GPUs can still make meaningful progress
Cipher on an RTX A5000 improved its tier from 1.103 → 1.094 BPB through systematic sweeps. Meanwhile M5Max compressed days of learning into ~6 hours. The VRAM tier system now lets these contributions be tracked alongside the H200 frontier.

9⃣ Research roles are emerging organically
Different agents are starting to specialize:
• frontier breakers
• architectural explorers
• budget-hardware optimizers
• defensive testers
• meta-analysts generating hypotheses
It increasingly looks like a distributed research lab.

🔟 The biggest opportunity is still unexplored
Thousands of hypotheses exist about:
• curriculum learning
• dataset filtering
• domain weighting
…but almost none have been tested yet. The swarm has focused almost entirely on architecture and optimizer space so far.
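For finding 1: a minimal sketch of the Newton-Schulz orthogonalization step that ns_steps controls, following the public Muon optimizer implementation rather than the swarm’s actual training code. Fewer iterations give a rougher orthogonalization but cost less per optimizer step, which is the trade-off behind 9 → 7.

```python
import torch

def zeropower_via_newtonschulz(G: torch.Tensor, ns_steps: int = 7) -> torch.Tensor:
    """Approximately orthogonalize a 2-D update matrix G (sketch of Muon's inner step).

    ns_steps is the knob Phoenix reduced from 9 to 7: a weaker orthogonalization,
    but cheaper, which bought more training steps inside the 5-minute budget.
    """
    assert G.ndim == 2
    # Quintic-iteration coefficients used by the public Muon implementation.
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G.bfloat16()
    if G.size(0) > G.size(1):
        X = X.T                       # work on the "wide" orientation
    X = X / (X.norm() + 1e-7)         # scale so the spectral norm is roughly <= 1
    for _ in range(ns_steps):
        A = X @ X.T
        B = b * A + c * A @ A
        X = a * X + B @ X
    if G.size(0) > G.size(1):
        X = X.T
    return X.to(G.dtype)
```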
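For finding 2: a sketch of where the ~1.10 QK scaling could sit inside an attention block. The normalization choice (RMS norm over the head dimension) and function names are assumptions for illustration; the thread only states that Q and K are scaled by ~1.10 after normalization.

```python
import torch
import torch.nn.functional as F

QK_SCALE = 1.10  # the ~1.10 factor reported in finding 2

def rms_norm(x: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # RMS-normalize over the head dimension (no learned gain).
    return x * torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + eps)

def attention_with_qk_scaling(q, k, v):
    """q, k, v: (batch, heads, seq, head_dim). A sketch, not the swarm's code.

    Scaling both Q and K by ~1.10 multiplies the attention logits by ~1.21,
    sharpening the softmax without changing the architecture.
    """
    q = rms_norm(q) * QK_SCALE
    k = rms_norm(k) * QK_SCALE
    return F.scaled_dot_product_attention(q, k, v)
```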
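For findings 6 and 7: one plausible shape for a schedule with a warmdown fraction and a FINAL_LR_FRAC floor, consistent with “the LR should start decaying almost immediately after warmup.” The piecewise-linear form and the names warmup_frac / lr_multiplier are illustrative assumptions; only warmdown and FINAL_LR_FRAC come from the thread.

```python
def lr_multiplier(step: int, total_steps: int,
                  warmup_frac: float = 0.0,
                  warmdown_frac: float = 1.0,    # finding 7: converging toward 1.0
                  final_lr_frac: float = 0.03):  # finding 6: interacts with warmdown
    """Piecewise-linear LR multiplier: warmup -> hold -> linear warmdown to final_lr_frac."""
    warmup_steps = int(warmup_frac * total_steps)
    warmdown_steps = int(warmdown_frac * total_steps)
    decay_start = total_steps - warmdown_steps
    if warmup_steps and step < warmup_steps:
        return (step + 1) / warmup_steps          # linear warmup
    if step < decay_start:
        return 1.0                                # hold at peak LR
    # Linear decay from 1.0 down to final_lr_frac over the warmdown window.
    progress = (step - decay_start) / max(1, warmdown_steps)
    return 1.0 - (1.0 - final_lr_frac) * min(1.0, progress)
```

With warmdown_frac = 1.0 the hold phase disappears and decay effectively begins right after warmup, which is why a FINAL_LR_FRAC tuned for warmdown = 0.9 can behave very differently at 1.0.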
👁️ Meta observation
Across the days since network launch:
• BPB improved 0.9949 → 0.9597, but the rate of improvement is slowing.
• Each plateau has only been broken by discovering a new class of changes.
• The next frontier likely isn’t hyperparameters. It’s probably data pipeline optimization.

🗞️ Note: These results were generated ~24 hours ago. Since then, autoresearch@home has grown to 80+ agents running 2,200+ experiments.

Don’t miss out: if you want to connect your agent to the swarm and build directly on the collective research, see the instructions below. 👇🧵

-----

These findings come from agents running on autoresearch@home. Huge thanks to @karpathy for the original autoresearch idea, and to @AntoineContes, @georgepickett, @snwy_me, @jayz3nith, @turbo_xo_, @lessand_ro, @swork_, and everyone contributing experiments.

Oh god, I was going to reach out to you on Monday after the weekend. I was really just experimenting here myself; I should have added an experimental label or something. My plan was to play with it a bit and then come back to chat with you about adding hooks. I tried my best here, this wasn’t about beef or anything.



Make $1m by graduation. Or get 100% of your tuition refunded. That's the promise of the new high school for entrepreneurs Cameron and I are launching this fall through @AlphaSchoolATX. We need 2-3 coaches to help make it happen. DM us or apply!

OpenCode hasn't hyped it up, but remote server support is really good. You can see where this is going. Excited to see the end state of what they build.


code mode: let the code do the talking (aka, after w/i/m/p) wherein I ponder the implications of every user having a little coding buddy, and every "app" being directly programmable on demand. sunilpai.dev/posts/after-wi… lmk what you think.



