Jonas Templestein

4.9K posts

Jonas Templestein
@jonas

CEO https://t.co/7dJOmc0va5, prev. cofounder/CTO Monzo, dad of three

Joined October 2009
2.7K Following · 8.3K Followers

Pinned Tweet
Jonas Templestein @jonas ·
2025 will be the year we see the first self-driving startups.

Level 0: No AI
People do everything. They come up with ideas, build products, and run operations. Many legacy businesses still work this way.

Level 1: People use AI tools ⬅︎ we are here
People might use ChatGPT to help write copy or Cursor to help write code. This is where most startups are today.

Level 2: AI agents complete tasks based on human instructions
People might ask AI agents to write software from a plain-English spec or tell them to execute well-defined customer service processes. At this point entire departments (like support or QA) get largely replaced by AI. No startups I know of operate at this level yet, but if yours does, let me know.

Level 3: AI agents propose changes to their own instructions
They might propose new customer service processes and product changes in response to customer feedback. Humans would still approve each of those changes. Just a few people could run a large company this way.

Level 4: AI agents autonomously change their instructions
At this point startups become self-improving. Humans would only be involved as an escalation point or where required by the real world (e.g. to raise capital or to incorporate). Many startups would have only one human.

Level 5: No humans
AI agents decide which businesses to start, raise capital (through crypto tokens or other means), and build and run them. No humans required. This would require major reforms in the legal and financial system.
19
21
216
75.4K
Mario Zechner @badlogicgames ·
people of pi.dev. i'm removing all tools from pi without replacement. get creative.
max@maxjendrall

@YoniBraslaver @badlogicgames oh god, please don't switch out all read, write, edit, bash tools for code mode. Would be in the spirit of extensibility tho hahaha "pi has 1 tool. deal with it"

58
5
459
140.2K
Jonas Templestein
I just realised you can use capnweb to build a lightning fast poor-man's version of cloudflare tunnels 🤯 You just need to write a tiny durable object class that hosts a capnweb session, and then write a tiny client side utility that connects to it. We use it to e2e test deployed workers: our vitest test runner tunnels into the deployed worker and can then receive normal fetch(Request) -> Promise<Response> requests from the worker.
5
2
38
3.7K
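The tunnel pattern described above can be sketched without any of the real moving parts. All names below are hypothetical: capnweb, Durable Objects and WebSockets are replaced by a plain in-process object so the example runs anywhere. The key inversion survives, though: the client registers a handler over a long-lived session, and the deployed side pushes fetch-style requests back through it.

```javascript
// Minimal sketch of a "reverse tunnel". In the real setup a Durable Object
// hosts a capnweb session over a WebSocket; here an in-process object stands
// in so the shape of the pattern is visible.
class TunnelHub {
  constructor() {
    this.handler = null;
  }
  // test-runner side: register a fetch(Request) -> Response style handler
  connect(handler) {
    this.handler = handler;
  }
  // deployed-worker side: forward a request out to the connected client
  forward(request) {
    if (!this.handler) throw new Error("no client connected");
    return this.handler(request); // async in the real, WebSocket-backed version
  }
}

const hub = new TunnelHub();
hub.connect((request) => ({ status: 200, body: `echo:${request.url}` }));

const response = hub.forward({ url: "/health" });
console.log(response.body); // "echo:/health"
```

The e2e-testing use falls out of this shape: the worker under test calls `forward`, and the assertion logic lives in the handler the test runner connected.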
Rhys @RhysSullivan ·
@viemccoy i'll bite, what would you name executor.sh? mission is to enable all of your agents to call all of your tools. a place where your agents go to get work done, collaborate with you, shareable with your team. along with one-off scripts, also generative ui & workflows
5
0
13
4.3K
𝚟𝚒𝚎 ⟢ @viemccoy ·
GDP would double if all of you decided to let me name your startups. If you can't find the True Name of your endeavor, it may succeed, but only partway to what could have been.
36
3
228
14.5K
Jonas Templestein
I guess it doesn't matter. If you have a training set of code tokens, it's easy to convert that to a training set of escaped code. But all else being equal I still prefer not to use tool calls at all, because you don't have to deal with the arcane rules of the tool-calling APIs (e.g. ordering of input/output items in the OpenAI Responses API)
1
0
0
13
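The conversion mentioned here is just JSON string escaping, which is mechanical and lossless; a quick sketch:

```javascript
// Turning "plain code" text into the escaped form a tool-call argument uses
// is just JSON string escaping, and it round-trips losslessly.
const code = 'const s = "hi";\nconsole.log(s);';

const escaped = JSON.stringify({ code }); // newline becomes the two characters \n
const roundTripped = JSON.parse(escaped).code;

console.log(escaped.includes("\\n"));   // true
console.log(roundTripped === code);     // true
```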
Jonas Templestein
Am having pretty good success asking agents to "Respond with a single triple-backtick block of javascript code".

No "tool calling" (in the LLM post-training sense) involved. This means the LLM doesn't have to produce javascript code escaped inside a json object, which seems like a good thing.

Is anyone else doing this? Has anyone done formal benchmarks to see whether the benefit of not having to write escaped code is outweighed by the post-training bias towards tool calling (which is really just outputting special marker tokens with json in between)?

Tool calling, just like the formalised assistant/user message framing, feels like it may be a bit of a local maximum. But we might never find out, because of the v large investment in the current post-training format.
[image]
1
1
1
501
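The client side of this prompt format is a small parser. A sketch, with names of my own choosing (the regex assumes the model was asked for exactly one block, so the first match wins):

```javascript
// Extract the single triple-backtick javascript block from a model response.
function extractCodeBlock(response) {
  const match = response.match(/```(?:javascript|js)?\s*\n([\s\S]*?)```/);
  if (!match) throw new Error("no triple-backtick block in response");
  return match[1].trimEnd();
}

const reply = "Here you go:\n```javascript\nconsole.log(1 + 1);\n```\nDone.";
console.log(extractCodeBlock(reply)); // "console.log(1 + 1);"
```

Note there is no JSON layer anywhere: the extracted string is the code, with no unescaping step.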
Jonas Templestein
@gingerhendrixai How does claude do tool calling without escaping the tool arguments? Are you saying the raw output tokens are no longer json shaped? Would love to read more about it
1
0
0
59
Gareth Andrew @gingerhendrixai ·
Yes! Pure code-act. I spent a while getting this to work on dynamic workers when they were new. You used to see more of this, e.g. oai's swelancer harness. I've concluded it's probably not worth the effort though. Function calling is already optimized for major providers (e.g. no double escaping in Claude; there's a hidden XML transformation in the middle), so no reason to believe backticks are better.
1
0
2
51
Jonas Templestein
@mmkalmmkal There's this joke about how the amount of leverage workers have in a tech company is inversely proportional to the size of their primary screen. Devs have 5 screens, their managers use the MacBook screen, and the execs use an iPhone.
1
0
2
94
Jonas Templestein reposted
sam @samgoodwin89 ·
Generating SDKs from APIs is better done by coding agents now than with tools like Stainless.

In the real world, every spec is wrong, incomplete and inconsistent. Someone has to go and patch the spec before you can get good results with a rigid code generator.

And Stainless APIs still don't give you the errors! They produce nice looking SDKs, but lack the most critical aspect of APIs: the "unhappy paths", which are usually far more numerous than happy paths, and are what makes the difference between a great and a terrible UX.

Stainless supports many targets, but ask anyone who's used Cloudflare's terraform provider and you'll quickly realize that it's not magic. If the spec sucks, the provider sucks. And most specs suck.

Distilled and Alchemy address this with AI. We use coding agents for 100%, so each new SDK we onboard is effectively "hand crafted". AI adapts "manually" to the nuances and weirdness of the spec and API. We share some code, but we don't try and squeeze specs into one code generator. Every time we make one, it becomes useful context for the next one and drives the flywheel.

Since we are targeting Effect, we value errors more than the happy path. None of the APIs we've worked with except for AWS have documented their errors in the spec. And AWS still hasn't documented 100% (maybe 80-90% at best). AI patches these missing errors (and categorizes them as retryable, etc.) by interacting with the service and observing its actual behavior.

This then feeds into Alchemy, which uses AI to generate hand-crafted IaC resources and our Effect abstraction on top. This generation process reverse engineers the API's actual behavior and produces: 1) Effect-native SDKs for every cloud, 2) IaC Resources for every cloud, and 3) Alchemy Bindings for every cloud API.
Techmeme@Techmeme

Source: Anthropic is in advanced talks to acquire New York-based Stainless, which helps developers generate SDKs from APIs, for at least $300M (The Information) (Visit Techmeme dot com for the link and full context!)

8
5
104
9.3K
Jonas Templestein
@threepointone Why not just execute(“search(…)”)? And why tool calling (i.e. asking the LLM to produce code in json) and not just “respond with code”?
1
0
1
283
sunil pai @threepointone ·
starting to think now that every agent should have just 2 tools: search and execute. we _want_ agents to have access to 100s, if not 1000s of capabilities, that can contextually change during their lifetimes, even per message. saying stuff like "just use bash" doesn't encompass 3rd party apis, and you don't want to keep switching up the base prompt all the time. you gotta generalise that. I also guess search has to be semantic, so probably something with a vector db type thing. does it run on every message? probably...
sunil pai@threepointone

maybe every mcp server can be 3 tools:
- describe(filter) => schema: get all capabilities
- search(input) => toolcalls[]: for a (maybe unstructured) language query (+ opt. metadata) get a sequence/tree of toolcalls
- execute(tools) => result: take that list above and run it

44
11
247
44.6K
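The two-tool surface above can be sketched with a tiny capability registry. Everything here is illustrative: the tool names are invented, and `search` uses naive keyword matching where the tweet suggests semantic/vector search.

```javascript
// A registry the agent can `search` for capabilities and `execute` to run
// what it found. Capabilities can be added or removed at any time without
// touching the base prompt.
const registry = new Map([
  ["getWeather", { description: "current weather for a city", fn: (city) => `sunny in ${city}` }],
  ["addNumbers", { description: "add two numbers", fn: (a, b) => a + b }],
]);

function search(query) {
  // real version: embed the query and rank capabilities by similarity
  return [...registry.entries()]
    .filter(([name, t]) => t.description.includes(query) || name.toLowerCase().includes(query))
    .map(([name, t]) => ({ name, description: t.description }));
}

function execute(name, args) {
  const tool = registry.get(name);
  if (!tool) throw new Error(`unknown tool: ${name}`);
  return tool.fn(...args);
}

console.log(search("weather")); // one hit: getWeather
console.log(execute("addNumbers", [2, 3])); // 5
```

The point of the split is that only `search` and `execute` ever appear in the model's tool schema; the hundreds of underlying capabilities live behind them.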
Jonas Templestein @jonas ·
@badlogicgames I built one with the kids. it’s v fun! the company is French I think so no problem to get one in Europe
0
0
1
182
Jonas Templestein @jonas ·
@badlogicgames But at least they put screws on every battery compartment now for safety! So when the time comes to throw away the toy, it gets thrown away with batteries inside, because parents can't be bothered to unscrew four tiny screws
0
0
1
182
Mario Zechner @badlogicgames ·
i swear they make these extra unrepairable. this would be a 2 minute soldering job if they didn't hide those damn screws in those narrow shafts. planned obsolescence in kids' toys is the worst.
[image] [image]
7
1
64
12.5K
Jonas Templestein reposted
Paul Graham @paulg ·
Sure you can earn a billion dollars. I've been teaching people how to do it for 20 years. The way you do it is to start a company that grows fast. You don't have to do anything bad to make a company grow fast. You just have to make something people want. paulgraham.com/ace.html
Marco Foster@MarcoFoster_

AOC: “There’s a certain level of wealth and accumulation that is unearned. You can’t earn a billion dollars. You just can’t earn that. You can get market power, you can break rules, you can abuse labor laws, you can pay people less than what they’re worth, but you can’t earn that”

555
762
11.3K
3.1M
Misha Kaletsky @mmkalmmkal ·
And sorry, to clarify, you don't need to generate json schema. It IS json schema!
[image]
1
0
3
745
Misha Kaletsky @mmkalmmkal ·
I think I've been sleeping on typebox. This is pretty magical. Plain typescript input, produces valid json-schema, and parses it. In one library that's smaller than zod, valibot or arktype. Feels very codemode-friendly...
[image]
12
4
110
11.6K
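The "it IS json schema" observation above can be shown with a hand-rolled sketch; typebox itself is not assumed installed here, and this mini `Type` object only mimics its core idea (real typebox also infers a static TypeScript type from the same value).

```javascript
// typebox's trick, hand-rolled: the schema you build with the fluent API
// *is already* a JSON Schema object, so no separate generation step exists.
const Type = {
  Object: (properties) => ({ type: "object", properties, required: Object.keys(properties) }),
  String: () => ({ type: "string" }),
  Number: () => ({ type: "number" }),
};

const User = Type.Object({ name: Type.String(), age: Type.Number() });

// the value is directly usable anywhere JSON Schema is expected,
// e.g. as a tool-call parameter schema
console.log(JSON.stringify(User));
console.log(User.properties.age.type); // "number"
```

That is what makes it "codemode-friendly": the same object drives validation and can be handed to a model as a tool schema without translation.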
sunil pai @threepointone ·
@jonas been using pen and paper to write down ideas like a fkin troglodyte. more yapping when it becomes concrete!
1
0
1
160