Jeremy 🏰
@SquidCorp_ink
🦊 Build quietly → 🪓 Sharpen the edges → 🦁 Ship loudly

we're on an Open Model mission to help builders create world-class agents >20x cheaper than what they have today.

a couple of things have become evident recently:
1. The age of the token subsidy is being pulled back.
2. Open Models have crossed an intelligence threshold, making them viable for real-world agents at a fraction of the cost.

As teams get exponentially larger monthly bills from the labs, it's worth exploring how many agents today perform just as well using Open Models.

Check out the numbers on external evals + try it yourself by dogfooding and running on internal evals:
- @OpenRouter and @ArtificialAnlys have great leaderboards and breakdowns of what people are using. The time investment is definitely worth the massive cost savings.
- Instead of Sonnet 4.6 (or even 5.5/Opus), try Kimi-2.6, GLM5.1, Deepseek v4 pro, etc.
- Instead of Haiku, try DeepSeekv4 Flash, Nemotron, etc.

Open models require some tuning to make sure they work well in your harness for your task (another reason why open harnesses are important).

The closed models are excellent; there's no need to rip them out wholesale. Often the first use of Open Models is as subagents, or using a closed frontier model as an Advisor to an open driver model.

At LangChain we want to make it as easy as possible to build the best agents in the world, as cheaply and quickly as possible. We're leaning into open models heavily across our products and libraries. Try out an open model in deepagents in just a couple of lines (a sketch follows below) and come ride the open model, open harness future.
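A minimal sketch of what "a couple of lines" could look like, assuming deepagents' `create_deep_agent` entry point with `model`/`tools`/`instructions` parameters (check the signature in your deepagents version) and an OpenAI-compatible open-model endpoint such as OpenRouter; the model ID, prompt, and env var name here are illustrative:

```python
# A minimal sketch of swapping an open model into deepagents.
# Assumes an OpenRouter key in OPENROUTER_API_KEY; the model ID
# and instructions are illustrative, not recommendations.
import os

from langchain_openai import ChatOpenAI
from deepagents import create_deep_agent

# OpenRouter exposes many open models behind an OpenAI-compatible API.
open_model = ChatOpenAI(
    model="deepseek/deepseek-chat",            # illustrative open-model ID
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

# Same agent harness, different (much cheaper) driver model.
agent = create_deep_agent(
    model=open_model,
    tools=[],                                  # plug in your own tools
    instructions="You are a careful research agent.",
)

result = agent.invoke(
    {"messages": [{"role": "user", "content": "Summarize today's tasks."}]}
)
print(result["messages"][-1].content)
```

The same `model` hook is also a natural place for the advisor pattern mentioned above: keep the open model as the driver and give a closed frontier model to a reviewer subagent (via deepagents' subagent support, if your version has it).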

the future of software engineering seems uncontroversially prompting + code review.

startups will skip the code review because they're racing against time. larger/serious orgs will take code review very seriously.

llms can do code review, but my guess is that because they have to search through a large space, it will be as expensive to have, say, mythos review your code as it would be to have a senior dev.

based on budget:
$: prompting only
$$: low-grade llm review
$$$: mid-grade llm + dev review
$$$$: high-grade llm + sr dev review

btw, software (past the bootstrapping phase) will get more expensive to make and take more time. quality will remain exactly the same as when humans were doing it: shit.

If you don't understand this, you will not understand why LLM-based agents are irreparably failing at general-purpose problem solving.

An agent (agents were, by the way, the topic of my PhD 20 years ago), to be useful, must be rational. Being rational means always preferring the outcome that results in the maximal expected utility for its master/user.

Let's say an agent has two actions it can execute in an environment: a_1 and a_2. If the agent can predict that a_1 gives its user an expected utility of 10, and a_2 gives an expected utility of -100, then a rational agent must choose a_1, even if choosing a_2 seems like a better option when explained in words. The numbers 10 and -100 are obtained, for each action, by summing the utilities of all possible outcomes weighted by their likelihoods (a toy calculation of these numbers follows at the end of this post).

Now here is the problem with LLM-based agents. The LLM is not optimizing expected utility in the environment. It is optimizing the next token, conditioned on a prompt, a context window, and a training distribution full of examples of what helpful answers are supposed to look like. Those are not the same objective.

So when we wrap an LLM in a loop and call it an "agent," we have not created a rational decision-maker. We have created a text generator that can imitate the surface form of deliberation. It may say things like: "I should compare the expected outcomes." "The best action is probably a_1." "I will now execute the optimal plan." But the internal mechanism is not selecting actions by maximizing the user's expected utility. It is generating a continuation that is statistically appropriate given the prompt and prior context.

This distinction matters enormously. For narrow tasks, the imitation can be good enough. If the environment is constrained, the actions are simple, and the success criteria are close to patterns seen in training, the system can appear agentic. But for general-purpose problem solving, the gap becomes fatal.

A rational agent needs stable preferences, calibrated beliefs, causal models of the world, the ability to evaluate consequences, and the discipline to choose the action with maximal expected utility even when that action is boring, non-linguistic, or unlike the examples in its training data. An LLM-based agent has none of that by default. It has fluency. It has pattern completion. It has a remarkable ability to compress and recombine human text. But fluency is not rationality, and a plausible plan is not an expected-utility calculation.

This is why these systems so often fail in strange, brittle, and irreparable ways when given open-ended responsibility. They are not failing because the prompts are insufficiently clever. They are failing because we are asking a simulator of rational agency to be a rational agent.
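A toy sketch of the expected-utility calculation described above. The outcome utilities and probabilities are invented for illustration; they are chosen only so that EU(a_1) = 10 and EU(a_2) = -100, as in the example:

```python
# Toy numbers for the expected-utility calculation in the post.
actions = {
    "a_1": [(0.5, 20.0), (0.5, 0.0)],     # (probability, utility) pairs
    "a_2": [(0.2, 80.0), (0.8, -145.0)],  # tempting upside, likely disaster
}

def expected_utility(outcomes):
    # EU(a) = sum over outcomes of P(outcome | a) * U(outcome)
    return sum(p * u for p, u in outcomes)

for action, outcomes in actions.items():
    print(action, expected_utility(outcomes))  # a_1 -> 10.0, a_2 -> -100.0

# A rational agent must pick the argmax, regardless of how persuasive
# a verbal justification for a_2 might sound.
best = max(actions, key=lambda a: expected_utility(actions[a]))
print("rational choice:", best)                # a_1
```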

spoke again about this yesterday at dinner. Most indie builders I know are Europeans. Almost none of them have their company/residency set up in the EU @euacc

TUIs are not good, sorry y'all. a CLI is a utility, and situational. this should not be confused with stuffing a full interactive GUI into a low-capability platform. "let's ignore all the great UI technology of the last 20 years and build some caveman shit"