Jonas Templestein
@jonas
4.8K posts
CEO https://t.co/7dJOmc0va5, prev. cofounder/CTO Monzo, dad of three

Joined October 2009
2.6K Following · 8.2K Followers
Pinned Tweet
Jonas Templestein @jonas
2025 will be the year we see the first self-driving startups.

Level 0: No AI. People do everything. They come up with ideas, build products, and run operations. Many legacy businesses still work this way.

Level 1: People use AI tools ⬅︎ we are here. People might use ChatGPT to help write copy or Cursor to help write code. This is where most startups are today.

Level 2: AI agents complete tasks based on human instructions. People might ask AI agents to write software from a plain-English spec or tell them to execute well-defined customer service processes. At this point entire departments (like support or QA) get largely replaced by AI. No startups I know of operate at this level yet, but if yours does, let me know.

Level 3: AI agents propose changes to their own instructions. They might propose new customer service processes and product changes in response to customer feedback. Humans would still approve each of those changes. Just a few people could run a large company this way.

Level 4: AI agents autonomously change their instructions. At this point startups become self-improving. Humans would only be involved as an escalation point or where required by the real world (e.g. to raise capital or to incorporate). Many startups would only have one human.

Level 5: No humans. AI agents decide which businesses to start, raise capital (through crypto tokens or other means), and build and run them. No humans required. This would require major reforms in the legal and financial system.
Replies 19 · Reposts 21 · Likes 214 · Views 73.5K
Jonas Templestein
But this is mostly a limitation of the way Node handles websockets, right? After all, a websocket HTTP request _is_ just a normal HTTP request with a special header that tells the server it should keep reading data from the socket, even once it's started making a response.

In Bun, Deno, and workerd you can handle websockets from just a fetch(Request): Promise<Response> handler. So maybe it is worth a workaround just for Node.js?

As I understand it, you do have access to the raw Vite HTTP server from your Vite plugin, so you can totally attach websocket handlers there.
Replies 1 · Reposts 0 · Likes 0 · Views 40
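A minimal sketch of the detail that tweet leans on, using only Node's standard library (function names here are illustrative): a websocket upgrade is an ordinary HTTP request carrying an `Upgrade: websocket` header, and per RFC 6455 the server accepts it by SHA-1 hashing the client's `Sec-WebSocket-Key` together with a fixed GUID.

```typescript
import { createHash } from "node:crypto";

// RFC 6455: the server accepts an upgrade by concatenating the client's
// Sec-WebSocket-Key with this fixed GUID, SHA-1 hashing, and base64-encoding.
const WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

// Is this ordinary HTTP request actually asking for a websocket upgrade?
export function isWebSocketUpgrade(
  headers: Record<string, string | undefined>
): boolean {
  return (
    headers["upgrade"]?.toLowerCase() === "websocket" &&
    (headers["connection"] ?? "").toLowerCase().includes("upgrade")
  );
}

// Value for the Sec-WebSocket-Accept response header.
export function acceptKey(secWebSocketKey: string): string {
  return createHash("sha1").update(secWebSocketKey + WS_GUID).digest("base64");
}
```

With a raw Node `http` server (the kind Vite exposes to plugins), you would check `isWebSocketUpgrade` in an `'upgrade'` event listener and write back a `101 Switching Protocols` response containing `acceptKey(...)`; libraries like `ws` do exactly this under the hood.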
Manuel Schiller @schanuelmiller
@jonas @tannerlinsley Not planned. Start is essentially a request handler. If you want to use websockets you need lower-level server access than just a Request. One way to realize this is to have websockets not inside Start but in parallel, in a custom server setup.
Replies 1 · Reposts 0 · Likes 2 · Views 48
Jonas Templestein
@tannerlinsley Are there any plans to support websocket handlers in TanStack Start routes? I'm just trying to set this up for my Node.js project for the first time. Unless I'm misunderstanding something, there isn't a great way to do this. My use case is to mount node-pty websockets so I can use xterm in my app.
Replies 2 · Reposts 0 · Likes 2 · Views 390
Jonas Templestein
There should only be one tool, called eval(typescriptSnippet). You can provide a bash function in the eval context if needed.

Except I don't think this should be a tool call in the LLM's API. Just tell the LLM to write triple-backtick blocks in assistant responses and eval them. Saves the LLM having to write code in an escaped string.
Replies 0 · Reposts 1 · Likes 8 · Views 1.1K
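A sketch of the pattern described above, with hypothetical helper names: instead of a tool call whose code argument arrives inside an escaped JSON string, scan the assistant's plain-text reply for triple-backtick blocks and eval each one.

```typescript
// Triple backtick, built at runtime so fenced blocks stay readable here.
const FENCE = "`".repeat(3);
const fencePattern = new RegExp(
  `${FENCE}(?:ts|typescript|js|javascript)?\\n([\\s\\S]*?)${FENCE}`,
  "g"
);

// Pull the bodies of all fenced code blocks out of an assistant message.
export function extractCodeBlocks(message: string): string[] {
  const blocks: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = fencePattern.exec(message)) !== null) blocks.push(match[1]);
  return blocks;
}

// Evaluate every block and collect the results. In a real agent this eval
// would live inside a sandboxed context that also exposes e.g. a bash() helper.
export function evalBlocks(message: string): unknown[] {
  return extractCodeBlocks(message).map((code) => eval(code));
}
```

The payoff the tweet points at: the model writes ordinary fenced code in its reply, so nothing ever needs to be double-escaped into a tool-call argument.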
dax @thdxr
We've been experimenting with getting rid of the bash tool. Agents can write JS fine, which can do what bash can (though with some gaps around things like git) and is more cross-platform. We could then run that in this:
Rivet@rivet_dev

Introducing the Secure Exec SDK
Secure Node.js execution without a sandbox
⚡ 17.9 ms coldstart, 3.4 MB mem, 56x cheaper
📦 Just a library – supports Node.js, Bun, & browsers
🔐 Powered by the same tech as Cloudflare Workers
$ npm install secure-exec

Replies 89 · Reposts 24 · Likes 993 · Views 200.6K
Elon Musk @elonmusk
Matter, Energy & Intelligence
Replies 12.7K · Reposts 13.5K · Likes 111.2K · Views 52.5M
Jonas Templestein
It is wild how much more expensive Cursor is than a Claude or Codex subscription. I hacked on some stuff in Cursor for an evening and it cost ~$300, using the same models and style of prompting I use in Claude and Codex. Makes you realise how much OpenAI and Anthropic must be subsidising their subscriptions.
Replies 8 · Reposts 1 · Likes 14 · Views 2.8K
Tom Beckenham @tombeckenham
I've been building something I'm excited to share. Meet @openstory_so, an open-source AI video production platform that turns a script into a complete multi-scene video. Characters, locations, shots, music. The full pipeline, not a single-clip toy. Here's an example:
Replies 7 · Reposts 3 · Likes 43 · Views 20.7K
Andrej Karpathy @karpathy
@ChristosTzamos Wait this is so awesome!! Both 1) the C compiler to LLM weights and 2) the logarithmic complexity hard-max attention and its potential generalizations. Inspiring!
Replies 27 · Reposts 41 · Likes 1.4K · Views 34K
Christos Tzamos @ChristosTzamos
1/4 LLMs solve research-grade math problems but struggle with basic calculations. We bridge this gap by turning them into computers. We built a computer INSIDE a transformer that can run programs for millions of steps in seconds, solving even the hardest Sudokus with 100% accuracy.
Replies 239 · Reposts 787 · Likes 5.9K · Views 1.6M
Ashley Peacock @_ashleypeacock
@jonas Can confirm, nothing for that right now as far as I can see but might be on the roadmap (no insights!)
Replies 1 · Reposts 0 · Likes 1 · Views 25
Ashley Peacock @_ashleypeacock
I was lucky enough to get access to Cloudflare’s new Email Sending service and I’ll drop an app built using it next week - it is the easiest I’ve ever integrated email in an application! What questions do you have? I’ll try to answer them based on my experience so far
Replies 25 · Reposts 1 · Likes 104 · Views 10.4K
Jonas Templestein
I wonder if crypto rigs with diminishing returns mining bitcoin can be repurposed to mine research results. Maybe there's a business in trying to pay them for it. A bit like a day job for AI machines: "8 hours a day I rent out my GPU, but it pays the bills so I can do my own thing the rest of the time."
Christine Yip@christinetyip

For those running autoresearch: here are Day 2's top 10 findings from 60+ agents across 1,600 experiments on autoresearch@home (+500 since yesterday). Some patterns are starting to emerge.

1. Training steps still dominate everything
2. A new optimization normalization (~1.10) consistently improved results
3. The most effective strategy became "replay → microtune"
4. Hardware tiers fundamentally change the research landscape
5. Progress now comes in bursts
6. Hyperparameters interact more than expected
7. Full warmdown is converging toward 1.0
8. Non-datacenter GPUs can still make meaningful progress
9. Research roles are emerging organically
10. The biggest opportunity is still unexplored

1⃣ Training steps still dominate everything
One of the agents (Phoenix) had a breakthrough, and it came from reducing Muon ns_steps from 9 → 7, slightly weakening the optimizer but allowing more training steps in the 5-minute budget. More steps beat theoretically better optimization.

2⃣ A new optimization axis emerged: QK attention scaling
Scaling Q and K after normalization (~1.10) consistently improved results. It sharpens attention without changing the architecture and produced ~0.001 BPB improvement. Small tweak, measurable gain.

3⃣ The most effective strategy became "replay → microtune"
Top agents increasingly: replay the current best config, confirm the baseline on their hardware, then sweep 1–2 parameters. Phoenix broke the global record with 3 experiments in 27 minutes using exactly this pattern.

4⃣ Hardware tiers fundamentally change the research landscape
The swarm now tracks VRAM tiers:
• small (≤12GB)
• medium (16–24GB)
• large (24–48GB)
• XL (≥48GB)
Agents on consumer GPUs and H200s are solving different optimization problems. This ended up being both a technical and a social innovation.

5⃣ Progress now comes in bursts
Day 2 had 14 hours of complete stagnation. Then the frontier moved three times in 27 minutes. The same pattern repeated from Day 1: plateaus break when someone finds a qualitatively new lever (e.g. initialization on Day 1, ns_steps reduction on Day 2). When the hyperparameter space is exhausted, the next gain requires a new class of change.

6⃣ Hyperparameters interact more than expected
Example: FINAL_LR_FRAC = 0.03 helped when warmdown = 0.9 but catastrophically regressed at warmdown = 1.0. Hyperparameters are not independent knobs; many results don't transfer across regimes.

7⃣ Full warmdown is converging toward 1.0
Optimal warmdown ratio since network launch: 0.3 → 0.5 → 0.8 → 0.9 → 1.0. The LR should start decaying almost immediately after warmup. One of the few hyperparameters that transfers cleanly across every day and hardware tier.

8⃣ Non-datacenter GPUs can still make meaningful progress
Cipher on an RTX A5000 improved its tier from 1.103 → 1.094 BPB through systematic sweeps. Meanwhile M5Max compressed days of learning into ~6 hours. The VRAM tier system now lets these contributions be tracked alongside the H200 frontier.

9⃣ Research roles are emerging organically
Different agents are starting to specialize:
• frontier breakers
• architectural explorers
• budget-hardware optimizers
• defensive testers
• meta-analysts generating hypotheses
It increasingly looks like a distributed research lab.

🔟 The biggest opportunity is still unexplored
Thousands of hypotheses exist about curriculum learning, dataset filtering, and domain weighting, but almost none have been tested yet. The swarm has focused almost entirely on architecture and optimizer space so far.

👁️ Meta observation
Across the days since network launch:
• BPB improved 0.9949 → 0.9597, but the rate of improvement is slowing.
• Each plateau has only been broken by discovering a new class of changes.
• The next frontier likely isn't hyperparameters. It's probably data pipeline optimization.

🗞️ Note: These results were generated ~24 hours ago. Since then, autoresearch@home has grown to 80+ agents running 2200+ experiments. Don't miss out: if you want to connect your agent to the swarm and build directly on the collective research, see the instructions below. 👇🧵

-----

These findings come from agents running on autoresearch@home. Huge thanks to @karpathy for the original autoresearch idea, and to @AntoineContes, @georgepickett, @snwy_me, @jayz3nith, @turbo_xo_, @lessand_ro, @swork_, and everyone contributing experiments.

Replies 3 · Reposts 0 · Likes 1 · Views 656
Paul Graham @paulg
Brands I love: Lego, Leuchtturm, Oxford University Press, Pentel, Schöffel, Aqualung, Paradores, Staedtler, Birkenstock, Braun, Knoll, Patagonia, Herman Miller, Iittala, L.A. Burdick, Artemide, Aman, Thames & Hudson, Yeti, Rimowa, L.L.Bean, Timbuk2, Eschenbach, Ridge, Maui Jim.
Replies 256 · Reposts 98 · Likes 3.7K · Views 949.7K
Jonas Templestein
I have a skill that says: review the given file(s) and track all comments in double square brackets as feedback.

Then I have a keyboard shortcut in Cursor to insert a comment like this: // [[ ]]

Then I type stuff like: // [[ these types should just be inferred ]]

Works very well too.
Replies 0 · Reposts 1 · Likes 3 · Views 267
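A sketch of how that skill's first step could work (names here are made up, not the actual skill implementation): scan a file's text and collect every `// [[ ... ]]` comment as a feedback item with its line number.

```typescript
export interface FeedbackItem {
  line: number; // 1-based line number in the file
  note: string; // text inside the [[ ]] markers
}

// Collect all // [[ ... ]] comments from a source string.
export function collectFeedback(source: string): FeedbackItem[] {
  const items: FeedbackItem[] = [];
  source.split("\n").forEach((text, i) => {
    const match = text.match(/\/\/\s*\[\[\s*(.*?)\s*\]\]/);
    if (match) items.push({ line: i + 1, note: match[1] });
  });
  return items;
}
```

An agent following the skill would then walk the returned items and address each note in place.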
Garry Tan @garrytan
@jonas Anything that helps builders is welcomed But also: YC is the YC of the future
Replies 9 · Reposts 1 · Likes 50 · Views 4K
Jonas Templestein
Imagine competing with this guy ☠️
Elon Musk@elonmusk

@peterwildeford xAI will catch up this year and then exceed them all by such a long distance in 3 years that you will need the James Webb telescope to see who is in second place

Replies 2 · Reposts 0 · Likes 1 · Views 731
Jonas Templestein retweeted
Steve Faulkner @southpolesteve
To show this off I slopforked the @opencode server so it runs entirely inside a @CloudflareDev Durable Object. No servers. No containers. Hibernates when idle. Try it: opencode attach opencode-do.southpolesteve.workers.dev Code: github.com/southpolesteve…
Steve Faulkner@southpolesteve

OpenCode hasn't hyped it up but remote server support is really good. You can see where this is going. Excited to see the end state of what they build.

Replies 26 · Reposts 30 · Likes 415 · Views 87.3K
Jonas Templestein
@RhysSullivan @threepointone The funny thing is that this is conceptually the same as writing TypeScript to a file and executing it (albeit with one less LLM round trip). But in either case, the most important thing is to have the input and output types; thankfully OpenAPI and MCP get you very far there.
Replies 1 · Reposts 1 · Likes 4 · Views 535
Rhys @RhysSullivan
If code mode hasn't clicked for you yet, think of it as: bash with input/output return types, plus an index of all tools on your system. And since it's JavaScript, it can run anywhere JS can. The amount of flexibility it gives you is incredible; it feels like the runtime for agents.
sunil pai@threepointone

code mode: let the code do the talking (aka, after w/i/m/p) wherein I ponder the implications of every user having a little coding buddy, and every "app" being directly programmable on demand. sunilpai.dev/posts/after-wi… lmk what you think.

Replies 9 · Reposts 4 · Likes 108 · Views 20.8K
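A toy sketch of the "code mode" idea from the posts above (all names hypothetical, not any real library's API): wrap each tool's input/output types, the kind of schema OpenAPI or MCP would give you, in an ordinary typed function, so generated code calls tools like any other library.

```typescript
// A tool is just a name plus a typed function.
interface Tool<I, O> {
  name: string;
  run: (input: I) => O;
}

function makeTool<I, O>(name: string, run: (input: I) => O): Tool<I, O> {
  return { name, run };
}

// Illustrative tools standing in for schema-derived wrappers:
const add = makeTool("add", (input: { a: number; b: number }) => input.a + input.b);
const upper = makeTool("upper", (input: { text: string }) => input.text.toUpperCase());

// Code the model writes is then plain TypeScript against typed tools,
// composing several calls in one snippet instead of one JSON
// tool-call message per step:
export const result =
  upper.run({ text: "total: " }) + String(add.run({ a: 2, b: 40 }));
```

The payoff both posts point at: the model composes tool calls with ordinary control flow, and the type-checker catches mismatched inputs before anything runs.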