Jeremy 🏰
@SquidCorp_ink
🦊 Build quietly → 🪓 Sharpen the edges → 🦁 Ship loudly

we're on an Open Model mission to help builders create world-class agents >20x cheaper than what they have today.

a couple of things have become evident recently:
1. The age of the token subsidy is being pulled back.
2. Open Models have crossed an intelligence threshold, making them viable for real-world agents at a fraction of the cost.

As teams get exponentially larger monthly bills from the labs, it's worth exploring how many agents today perform just as well using Open Models.

Check out the numbers on external evals + try it yourself by dogfooding and running on internal evals:
- @OpenRouter and @ArtificialAnlys have great leaderboards and breakdowns of what people are using. The time investment is definitely worth the massive cost savings.
- Instead of Sonnet 4.6 (or even 5.5/Opus), try Kimi-2.6, GLM5.1, Deepseek v4 pro, etc.
- Instead of Haiku, try DeepSeekv4 Flash, Nemotron, etc.

Open models require some tuning to make sure they work well in your harness for your task (another reason why open harnesses are important).

The closed models are excellent; there's no need to rip them out wholesale. Often the first use of Open Models is as subagents, or using a closed frontier model as an Advisor to an open driver model.

At LangChain we want to make it as easy as possible to build the best agents in the world, as cheaply and quickly as possible. We're leaning into open models heavily across our products and libraries. Try out an open model in deepagents in just a couple of lines (a sketch follows below) and come ride the open model, open harness future.
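A minimal sketch of what "a couple of lines" could look like, assuming deepagents' `create_deep_agent` entry point with `model`/`tools`/`instructions` parameters (check the signature in your deepagents version) and an OpenAI-compatible open-model endpoint such as OpenRouter; the model ID, prompt, and env var name here are illustrative:

```python
# A minimal sketch of swapping an open model into deepagents.
# Assumes an OpenRouter key in OPENROUTER_API_KEY; the model ID
# and instructions are illustrative, not recommendations.
import os

from langchain_openai import ChatOpenAI
from deepagents import create_deep_agent

# OpenRouter exposes many open models behind an OpenAI-compatible API.
open_model = ChatOpenAI(
    model="deepseek/deepseek-chat",            # illustrative open-model ID
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

# Same agent harness, different (much cheaper) driver model.
agent = create_deep_agent(
    model=open_model,
    tools=[],                                  # plug in your own tools
    instructions="You are a careful research agent.",
)

result = agent.invoke(
    {"messages": [{"role": "user", "content": "Summarize today's tasks."}]}
)
print(result["messages"][-1].content)
```

The same `model` hook is also a natural place for the advisor pattern mentioned above: keep the open model as the driver and give a closed frontier model to a reviewer subagent (via deepagents' subagent support, if your version has it).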

the future of software engineering seems uncontroversially prompting + code review.

startups will skip the code review because they're racing against time. larger/serious orgs will take code review very seriously.

llms can do code review, but my guess is that because they have to search through a large space, it will be as expensive to have, say, mythos review your code as it would be to have a senior dev.

based on budget:
$: prompting only
$$: low-grade llm review
$$$: mid-grade llm + dev review
$$$$: high-grade llm + sr dev review

btw, software (past the bootstrapping phase) will get more expensive to make and take more time. quality will remain exactly the same as when humans were doing it: shit.

If you don't understand this, you will not understand why LLM-based agents are irreparably failing at general-purpose problem solving.

An agent (agents were, by the way, the topic of my PhD 20 years ago), to be useful, must be rational. Being rational means always preferring the outcome that results in the maximal expected utility for its master/user.

Let's say an agent has two actions it can execute in an environment: a_1 and a_2. If the agent can predict that a_1 gives its user an expected utility of 10, and a_2 gives an expected utility of -100, then a rational agent must choose a_1, even if choosing a_2 seems like a better option when explained in words. The numbers 10 and -100 are obtained, for each action, by summing the utilities of all possible outcomes weighted by their likelihoods (a toy calculation of these numbers follows at the end of this post).

Now here is the problem with LLM-based agents. The LLM is not optimizing expected utility in the environment. It is optimizing the next token, conditioned on a prompt, a context window, and a training distribution full of examples of what helpful answers are supposed to look like. Those are not the same objective.

So when we wrap an LLM in a loop and call it an "agent," we have not created a rational decision-maker. We have created a text generator that can imitate the surface form of deliberation. It may say things like: "I should compare the expected outcomes." "The best action is probably a_1." "I will now execute the optimal plan." But the internal mechanism is not selecting actions by maximizing the user's expected utility. It is generating a continuation that is statistically appropriate given the prompt and prior context.

This distinction matters enormously. For narrow tasks, the imitation can be good enough. If the environment is constrained, the actions are simple, and the success criteria are close to patterns seen in training, the system can appear agentic. But for general-purpose problem solving, the gap becomes fatal.

A rational agent needs stable preferences, calibrated beliefs, causal models of the world, the ability to evaluate consequences, and the discipline to choose the action with maximal expected utility even when that action is boring, non-linguistic, or unlike the examples in its training data. An LLM-based agent has none of that by default. It has fluency. It has pattern completion. It has a remarkable ability to compress and recombine human text. But fluency is not rationality, and a plausible plan is not an expected-utility calculation.

This is why these systems so often fail in strange, brittle, and irreparable ways when given open-ended responsibility. They are not failing because the prompts are insufficiently clever. They are failing because we are asking a simulator of rational agency to be a rational agent.
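A toy sketch of the expected-utility calculation described above. The outcome utilities and probabilities are invented for illustration; they are chosen only so that EU(a_1) = 10 and EU(a_2) = -100, as in the example:

```python
# Toy numbers for the expected-utility calculation in the post.
actions = {
    "a_1": [(0.5, 20.0), (0.5, 0.0)],     # (probability, utility) pairs
    "a_2": [(0.2, 80.0), (0.8, -145.0)],  # tempting upside, likely disaster
}

def expected_utility(outcomes):
    # EU(a) = sum over outcomes of P(outcome | a) * U(outcome)
    return sum(p * u for p, u in outcomes)

for action, outcomes in actions.items():
    print(action, expected_utility(outcomes))  # a_1 -> 10.0, a_2 -> -100.0

# A rational agent must pick the argmax, regardless of how persuasive
# a verbal justification for a_2 might sound.
best = max(actions, key=lambda a: expected_utility(actions[a]))
print("rational choice:", best)                # a_1
```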

spoke again about this yesterday at dinner. Most indie builders I know are Europeans. Almost none of them have their company/residency set up in the EU @euacc

TUIs are not good, sorry y'all. a CLI is a utility, and situational. this should not be confused with stuffing a full interactive GUI into a low-capability platform. "let's ignore all the great UI technology of the last 20 years and build some caveman shit"