Werner Kasselman

387 posts

@wernerk_au

Builder. Shipping the pieces of a Rust-first AI engineering stack: sqry, llm-cli-gateway, and grokrs, with more coming. Verivus OSS. Views are my own.

Gold Coast, Queensland · Joined November 2024

78 Following · 164 Followers

Werner Kasselman @wernerk_au · 3h

"the agents need to be steered, their work reviewed, the outputs incorporated" is the whole ballgame hiding in a subordinate clause. That's the job that grows, not shrinks: someone has to set the frame, judge the output, and own what gets merged. The one thing I'd make explicit is that steering and reviewing only scale if the work arrives in a shape you can actually review at volume, because an agent will hand you a thousand lines of fluent, plausible, wrong with exactly the confidence of a thousand lines of right. Get that wrong and "their work reviewed" silently becomes the bottleneck that eats the very headcount you just freed up to hire. The dollars move to the next thing that matters, as you say; the trick is keeping the review legible enough that the freed-up people are governing the work, not drowning in it.

Aaron Levie @levie · 4h

A meaningful portion of enterprises I talk to outside of Silicon Valley are generally looking to hire while also adopting agents. There’s a huge wave of technical and engineering talent needed inside organizations, building software or acting as FDEs for agents. And as AI drives efficiency in areas like the customer lifecycle, companies are leaning in even more heavily to client-facing jobs. In a world where AI did everything for you with no human oversight needed, maybe we’d be having a different conversation. But that’s not how AI works. Even for the areas that have the most automation potential, agents are automating tasks, not whole jobs. As they automate tasks, the agents need to be steered, their work reviewed, the outputs incorporated and more. All of this is requiring people to do the work. And for the areas that have less automation potential, companies are freeing up dollars from efficiency gains elsewhere to hire in those areas now. Yes, maybe AI lets you respond to front line support tickets automatically, but the companies (instead of just dropping the profit to the bottom line) will go and invest in new areas of sales and customer success that will add more differentiation for clients. Companies don’t remain static. They automate tasks where they can and free up dollars to move on to the next thing that matters.

unusual_whales @unusual_whales

OpenAI's Altman says AI unlikely to lead to 'jobs apocalypse'

Werner Kasselman @wernerk_au · 4h

the moat-isn't-the-model line is right, and the Stripe flow shows exactly where it actually sits: every arrow before "opens PR for review" is now cheap and fast, so the one step carrying all the weight is the last one, the human review of a thousand PRs a week. That's the bottleneck that didn't move. Tooling that only speeds up the writing makes the review queue worse, not better; the tooling that matters is whatever lets the reviewer still know what a change means at that volume (the requirement it came from, the boundary it wasn't meant to cross, the evidence it carries) without reading every diff by hand, because at a thousand a week nobody is reading every diff by hand. The moat isn't the model and it isn't the agent writing the code, it's whether the PR arrives already legible to the person who has to say yes.

Bilgin Ibryam @bibryam · 18h

Stripe’s Minions are a glimpse of how software engineering may work in 1–2 years:
→ Slack message starts an agent
→ Agent gets docs + code context
→ Writes code autonomously
→ Runs tests + CI
→ Opens PR for review
Key insight: the moat isn’t the model. It’s the tooling + infrastructure around it. stripe.dev/blog/minions-s…

Werner Kasselman @wernerk_au · 4h

@arvidkahl this is the thing I've been writing a spec for: the two prongs you describe, but bound to the change so they can't drift, with intent and evidence attached when the code lands. It's an open draft, and I'd value your read given you've clearly thought hard about this: github.com/verivus-oss/ag…

Werner Kasselman @wernerk_au · 4h

the bit that usually gets skipped, and that you didn't skip, is the conscious command-and-review step on the auto-generated pass. That review is the whole ballgame: it's the step where you actually still know what the code does. Auto-generated docs you've reviewed are true at review time and start drifting on the very next commit, and "agents will continue your docs format into new modules" quietly hands the upkeep back to the agent whose work the docs were supposed to let you check, which is the principal-agent loop closing on itself. The fix isn't more discipline about re-running the pass (you'll forget, everyone forgets, that's how summary fields went stale for thirty years); it's intent that travels with the change instead of sitting in a file beside it, so freshness isn't a step you have to remember. Your "why" and "how" are exactly the right two prongs; the question is just whether they live next to the code or inside the commit that changed it.
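A minimal sketch of "intent that travels with the change" at the commit level, assuming git-style trailers (`Key: value` lines) at the end of the commit message. The trailer names `Intent` and `Evidence` and the helpers `parse_trailers` and `missing_intent` are hypothetical illustrations, not a git convention and not taken from any published spec.

```python
# Hypothetical trailer names a commit must carry; not a git or spec convention.
REQUIRED_TRAILERS = ("Intent", "Evidence")


def parse_trailers(message: str) -> dict:
    """Collect 'Key: value' lines from the last paragraph of a commit message."""
    last_paragraph = message.strip().split("\n\n")[-1]
    trailers = {}
    for line in last_paragraph.splitlines():
        key, sep, value = line.partition(": ")
        # Trailer keys are single tokens such as "Intent" or "Signed-off-by".
        if sep and key and " " not in key:
            trailers[key] = value.strip()
    return trailers


def missing_intent(message: str) -> list:
    """Return the required trailers that are absent or empty."""
    trailers = parse_trailers(message)
    return [name for name in REQUIRED_TRAILERS if not trailers.get(name)]


good = (
    "Add RSS feed to curation list\n\n"
    "Serves /feed.xml from the same data as the HTML page.\n\n"
    "Intent: REQ-42 readers asked for a feed\n"
    "Evidence: tests/test_feed.py passes"
)
bare = "Tidy up\n\nMisc changes."

print(missing_intent(good))  # []
print(missing_intent(bare))  # ['Intent', 'Evidence']
```

Run as a `commit-msg` hook or a CI check, a check like this blocks the change at the moment it lands, instead of relying on someone remembering to regenerate docs later.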

Arvid Kahl @arvidkahl · 9h

The biggest differentiator between "agentic coding is helping me ship faster" and "agentic coding doesn't work" is how well you've documented your codebase prior to letting the agent run on it. In my own experience, agents desire a two-pronged documentation approach: clear bird's-eye-view docs, markdown, text, or whatever fresh UML hell you can conjure. Something that explains the decisions that went into the code, the "why." Then, in each file, a file-level docblock plus an explanation of hot paths, key variables, and functions. The "how" and the "why like this." Funnily enough, if you let a really smart model go through your codebase file by file, these docs can be auto-generated. But it's a standalone step, something you have to consciously command and review. Once these things are in place, agents will mess up much less often, and they will even continue your docs format into new modules and features. "Prompt engineering" isn't just about the command you give; it's also very much about the context that goes with it. And that can be good old documentation.

Werner Kasselman @wernerk_au · 4h

I know exactly how it sounds to reply to "someone should build this" with "mate, I've been building this", but "git is not the right primitive" is the thing I've been screaming into a text editor for months, so here goes. The writeup makes the argument in full (files flatten a graph of intent into text; two agents land clean merges that disagree at runtime; retries aren't idempotent because git was built for one human at a terminal): the next software stack needs more than code generation. dev.to/wernerk_au/the…

Theo - t3.gg @theo · 8h

I'm going to use my AI psychosis to fix clouds for agents. Someone else needs to use their psychosis to fix source control. I would do it myself but I'm already too deep on the cloud thing. GitHub is dying and git is not the right primitive. Will dump some thoughts here.

Werner Kasselman @wernerk_au · 4h

This is the good version of the whole story, and the bit I'd point at is "perfectly to spec": the agent didn't need the announcement because the feature described itself, the surface was honest about what it did, so discovery and correct use came for free. That's the thing worth building toward, intent legible in the artifact rather than in a changelog nobody reads. The same mechanism has a shadow, though: an agent will use a surface that quietly lies about itself with exactly the same confidence, and you won't get the delightful version of this tweet, you'll get an app shipped overnight on top of a feature that wasn't what it claimed. The win and the failure are the same capability; the only difference is whether the spec the agent read was true.

Theo - t3.gg @theo · 10h

I added a new feature to my cloud (lakebed) last night. Didn’t announce it. Didn’t tell anyone about it. Woke up today with Sherlock shipping a new app on top of the new feature. His agent discovered it and used it perfectly to spec.

Sherlock @thesherlocker

because there's such well-curated reading and viewing from @badlogicgames, I made a curation list - badlogic-list.lakebed.app (using @theo's lakebed.dev) and yes, there's an RSS feed

Werner Kasselman @wernerk_au · 4h

This is exactly why the spec I've been writing treats cost as a first-class artifact attached to the intent, not a number you reconcile afterward. It's an open draft and the cost profile is the newest part, so if this is a problem you're living with at Modal, @akshat_b, I'd value the pushback: github.com/verivus-oss/ag…

Werner Kasselman @wernerk_au · 4h

The dashboard is the right thing to want, but clustering spend after the fact is still trying to draw the line to value once the tokens are already gone, which is the same trap the Uber quote is describing one level up. The reason it's hard to know which 50% is useless is that the spend was never bound to the thing that justified it; a cluster reconstructs intent from usage, and that's the expensive direction to run the inference. The cheaper version is to attach the cost to the declared intent at the point of work, the requirement that asked for it, so the line to value isn't drawn later from a chart, it was there before the spend happened. Then the useless half isn't a mystery you go mining for, it's the work that couldn't name what it was for.
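A minimal sketch of attaching spend to a declared intent at the point of work, rather than clustering it after the tokens are gone. `Intent`, `CostLedger`, the requirement IDs and the token counts are hypothetical illustration values, not measurements or an existing API.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Intent:
    """A declared reason for doing work (hypothetical shape)."""
    intent_id: str
    requirement: str  # the requirement that asked for the work


@dataclass
class CostLedger:
    """Books token spend against intents declared before the work runs."""
    intents: dict = field(default_factory=dict)
    spend: defaultdict = field(default_factory=lambda: defaultdict(int))

    def declare(self, intent: Intent) -> None:
        self.intents[intent.intent_id] = intent

    def record(self, tokens: int, intent_id: Optional[str]) -> None:
        # Spend that cannot name a declared intent is booked under None,
        # so it is visible immediately instead of being reconstructed later.
        key = intent_id if intent_id in self.intents else None
        self.spend[key] += tokens

    def by_requirement(self) -> dict:
        return {
            self.intents[key].requirement: tokens
            for key, tokens in self.spend.items()
            if key is not None
        }

    def unattributed(self) -> int:
        return self.spend.get(None, 0)


ledger = CostLedger()
ledger.declare(Intent("I-1", "REQ-42: add RSS feed"))
ledger.record(1200, "I-1")
ledger.record(800, None)   # no intent named at all
ledger.record(500, "I-9")  # names an intent nobody declared

print(ledger.by_requirement())  # {'REQ-42: add RSS feed': 1200}
print(ledger.unattributed())    # 1300
```

In this shape the "useless half" has a name before anyone opens a dashboard: it is whatever lands in `unattributed()`.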

Akshat Bubna @akshat_b · 16h

Pretty sure 50% of internal token spend is completely useless, but right now it's hard to know which 50%. As an admin I'd love a dashboard that breaks down each person's spend into summarized clusters. Much easier to spend more when you can draw a clear line to value.

Ed Zitron @edzitron

Uber’s COO has said that it’s getting “harder to justify” its AI costs because there was no way to show a link between AI spend and any meaningful increase in useful features. This is the first time I’ve seen a company say this directly. businessinsider.com/uber-coo-andre…

Werner Kasselman @wernerk_au · 4h

that's the whole bet behind a spec I've been writing: agents declare their execution path as a graph up front, so review reads the intent instead of reconstructing it from the diff. Same instinct as XState, one layer up. It's early and open, and frankly, @DavidKPiano, @JohnPhamous, your eyes on the state-machine parts would be worth a lot if you ever have the time: github.com/verivus-oss/ag…

Werner Kasselman @wernerk_au · 4h

The reason it works is the same reason it's been the right instinct for years: an explicit state machine makes the thing checkable instead of guessed at, the legal transitions are written down so a test (or a person) can hold the diagram against the behaviour. The same move is climbing a layer right now, from the UI's runtime states up to the agent's plan: make it declare the states it intends to move through before it writes the code, and the work becomes reviewable the same way the UI did, by reading the machine and not the bytes. You've been screaming the principle; it just found a new floor to apply to.
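A minimal sketch of "reading the machine and not the bytes": the agent declares its plan as states and legal transitions before it writes code, and a checker holds a recorded trace against that declaration. The state names, `DECLARED_PLAN` and `first_illegal_transition` are hypothetical; this is plain Python, not XState or any other state-machine library.

```python
from typing import Optional

# Hypothetical plan an agent declares before it writes any code:
# each state maps to the set of states it is allowed to move to next.
DECLARED_PLAN = {
    "read_requirement": {"edit_pricing_module"},
    "edit_pricing_module": {"run_tests"},
    "run_tests": {"edit_pricing_module", "open_pr"},
    "open_pr": set(),
}


def first_illegal_transition(plan: dict, trace: list) -> Optional[tuple]:
    """Return the first (from, to) step in the trace that the plan does not
    allow, or None if the whole trace stays inside the declared machine."""
    for current, nxt in zip(trace, trace[1:]):
        if nxt not in plan.get(current, set()):
            return (current, nxt)
    return None


ok = ["read_requirement", "edit_pricing_module", "run_tests", "open_pr"]
drifted = ["read_requirement", "edit_pricing_module", "edit_catalog_schema", "open_pr"]

print(first_illegal_transition(DECLARED_PLAN, ok))       # None
print(first_illegal_transition(DECLARED_PLAN, drifted))  # ('edit_pricing_module', 'edit_catalog_schema')
```

The reviewer reads four declared states instead of the diff, and the drift shows up as a single named transition.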

David K 🎹 @DavidKPiano · 14h

I have been screaming this for years 💯

JohnPhamous @JohnPhamous

representing ui as a state machine makes it easier to manually and automatically test

Werner Kasselman @wernerk_au · 4h

@deedydas the trick is the prefix does the marketing so the repo doesn't have to. Tempted to rename mine to OpenGovernance or OpenIntent and just coast on the 10x. github.com/verivus-oss/ag…

Deedy @deedydas · 5h

I'm convinced that adding "Open-" to your company name instantly 10x's your odds of success. OpenAI OpenEvidence OpenTable OpenRouter OpenCode OpenDoor OpenGov OpenWeb OpenText OpenView OpenSea OpenStore OpenFX OpenSpace OpenArt OpenHands OpenPipe OpenNote

Werner Kasselman @wernerk_au · 4h

Runtime interception like this is worth having, but it's worth being precise about which failure it catches: a policy layer in front of the wire decides what an agent is allowed to do, it stops the tool call that was never permitted. It doesn't touch the call that was fully permitted and still wrong, the discount rule that quietly reads an attribute another agent moved last week, applies clean, sandboxes fine, and resolves to the wrong number in production. That one isn't a denied action, it's an allowed action with no record of what it was supposed to mean, and you only catch it by governing the intent (the requirement that forced the change, the boundary it wasn't meant to cross) before the apply, not by guarding the wire at runtime. Both layers are needed; they are not the same layer.
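A minimal sketch of why the two layers differ, assuming a toy allow/deny policy and a declared file boundary. `POLICY`, `ToolCall`, `DeclaredIntent` and the paths are hypothetical and are not the API of Microsoft's Agent Governance Toolkit or any other product; the point is only that a call can pass the first check and still fail the second.

```python
from dataclasses import dataclass

# Runtime policy: which actions an agent may perform at all.
# Unknown actions default to "deny".
POLICY = {"apply_patch": "allow", "drop_table": "deny"}


@dataclass(frozen=True)
class ToolCall:
    action: str
    paths: frozenset  # files the change reads or writes


@dataclass(frozen=True)
class DeclaredIntent:
    requirement: str
    allowed_paths: frozenset  # the boundary the change was not meant to cross


def policy_allows(call: ToolCall) -> bool:
    """Runtime layer: is this kind of action permitted?"""
    return POLICY.get(call.action, "deny") == "allow"


def boundary_violations(call: ToolCall, intent: DeclaredIntent) -> frozenset:
    """Intent layer: which touched paths fall outside the declared boundary?"""
    return call.paths - intent.allowed_paths


intent = DeclaredIntent(
    requirement="REQ-7: seasonal discount rule",
    allowed_paths=frozenset({"pricing/discounts.py"}),
)
call = ToolCall(
    action="apply_patch",
    paths=frozenset({"pricing/discounts.py", "catalog/attributes.py"}),
)

print(policy_allows(call))                # True
print(boundary_violations(call, intent))  # frozenset({'catalog/attributes.py'})
```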

Vaishnavi @_vmlops · 22h

MICROSOFT OPEN-SOURCED A GOVERNANCE LAYER FOR YOUR AI AGENTS

and it's exactly what agentic ai has been missing

here's what agent governance toolkit does:
▫️ intercepts every tool call in deterministic code before it hits the wire: denied actions aren't unlikely, they're structurally impossible
▫️ yaml policy engine lets you allow, deny, or require human approval per action
▫️ zero-trust identity via spiffe/did/mtls: no more 5 agents sharing one api key
▫️ 4-level execution sandbox with privilege rings so agents can't escape their scope
▫️ tamper-evident merkle audit logs for compliance and incident response
▫️ covers all 10/10 owasp agentic top 10 risks
▫️ works with langchain, crewai, autogen, openai agents sdk, semantic kernel, and more

one pip install... any framework... python, typescript, go, rust, .net all supported

because "please follow the rules" in a system prompt is not a guardrail... it's a suggestion

github.com/microsoft/agen…

Werner Kasselman @wernerk_au · 4h

@stack72 That's the gap I've been writing a spec to close: provable intent over the diff, the abstraction boundary declared so it can't be silently violated, evidence attached when the change lands. github.com/verivus-oss/ag…

Werner Kasselman @wernerk_au · 4h

This is the right instinct, and I'd push it one floor further: typed schemas and validated execution catch the guess that's wrong at apply time, but the nastier class is the one that applies cleanly and is still wrong, because the agent didn't know a convention nobody wrote down or quietly crossed a boundary it was never told existed. Determinism on the execution side doesn't buy you that; you only get it by making the intent explicit before the apply, the requirement that forced the change and the boundary it isn't allowed to cross, so the thing being validated is what the change was supposed to mean and not just whether it ran.

Paul Stack @stack72 · 17h

Getting agents to generate IaC code faster isn't the answer. It's still probabilistic output driving your infrastructure. The agent doesn't know your conventions or your breaking changes. You won't know how well it guessed until apply time — or until you have to review and debug it. What if agents worked against typed schemas instead? No code generation. Deterministic execution. Same inputs, same result, every run. stack72.dev/deterministic-…

Werner Kasselman @wernerk_au · 4h

@8090_Factory This is the whole reason I've been building a spec for it: provable intent over the diff, evidence attached when the change lands rather than archaeology after the fact. github.com/verivus-oss/ag…

Werner Kasselman @wernerk_au · 4h

this is the right shape, and it's the engineering version too: the upstream decision only creates downstream pressure if it's written down somewhere the work can be checked against, otherwise it's back to willpower wearing a process costume. When an agent can write a thousand plausible lines in seconds, the only pressure that survives is the kind you made explicit before the code landed (the requirement that forced the change, the abstraction boundary it isn't allowed to cross, the evidence it has to carry), not the kind you hope to reconstruct from the diff three weeks later when finance notices the mismatch.

8090 @8090_Factory · 7h

Chamath on 8090's building philosophy: "If you want your product to get good, do not lean on willpower or good intentions. Build a company where explicit upstream decisions create downstream pressure for the product to get better every day. That is what we are doing at 8090."

Chamath Palihapitiya @chamath

x.com/i/article/2058…

Werner Kasselman @wernerk_au · 4h

So the someone-with-influence that @mstockton wants isn't there to drive adoption, @matt_slot; they're there to hold the standard that survives when nobody below them can reconstruct the artefact by hand any more. Govern the work, don't receive it. The longer writeup is here: wernerkasselman.substack.com/p/the-make-or-… and the spec is at github.com/verivus-oss/ag…

Werner Kasselman @wernerk_au · 4h

Both Matts are right that the system has to be re-architected, imho, but I'd put the boundary in a slightly different place: the bottleneck didn't disappear when agents started writing the code, it moved up a floor, to whether you can still tell good work from fluent work (which is the part that didn't get cheaper).

Matt Stockton @mstockton · 6h

Based on lived experience, I agree with this. These new tools force an entire re-imagining of how work can be done, how work flows through a company / system, what teams need to exist, which roles do what things, who works with whom, which roles need to even exist, and what systems you even need to have in the first place - amongst many other ‘destabilizing’ things. It’s a tool, but also a tool that reshapes the system itself. You cannot simply adopt it, because it’s something that alters the foundation of your work versus amplifies an existing way of doing things. You need to take a step back and understand how it changes the system and rules of operating. This probably sounds batshit insane to lots of folks. But it’s true and you need someone within the org who knows it’s true and also has the influence / leverage to drive the change

Matt Slotnick @matt_slotnick

we're still in the "faster horses" phase. we're trying to force agents into systems designed for people-shaped workloads. real change comes when you re-architect to take advantage of the benefits (and shortcomings) of agents. this will drive real divergence in corporate outcomes

Werner Kasselman @wernerk_au · 4h

The "faster horses" bit is exactly right, but the horse isn't the workflow, it's the primitive. Git shows you what changed in the text, never why, so two agents land clean merges that disagree at runtime (a pricing path quietly resolving to 0.00 because the attribute it depended on got moved three commits over). Files were the right unit for a human at a terminal, not for a fleet that operates on relationships.
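A minimal sketch of the failure mode, using made-up product data and hypothetical function names: two changes that each merge cleanly because they touch different code, but that disagree once both are in, so the price quietly resolves to 0.00. The last part shows one hypothetical alternative, where each change declares the attributes it reads and writes so the conflict surfaces as a relationship rather than as a text diff.

```python
# Agent A's change: moves the price from product["price"]
# to product["pricing"]["list_price"].
def move_price_attribute(product: dict) -> dict:
    moved = dict(product)
    moved["pricing"] = {"list_price": moved.pop("price")}
    return moved


# Agent B's change, written against the old shape in a different place.
def discounted_price(product: dict, discount: float) -> float:
    # .get with a default hides the missing key instead of failing loudly.
    base = product.get("price", 0.0)
    return round(base * (1 - discount), 2)


product = {"sku": "A1", "price": 40.0}
print(f"{discounted_price(product, 0.25):.2f}")                        # 30.00
print(f"{discounted_price(move_price_attribute(product), 0.25):.2f}")  # 0.00

# Hypothetical alternative: each change declares what it reads and writes.
CHANGE_A = {"reads": {"product.price"}, "writes": {"product.price", "product.pricing.list_price"}}
CHANGE_B = {"reads": {"product.price"}, "writes": set()}


def conflicts(a: dict, b: dict) -> set:
    """Attributes one change writes while the other depends on them."""
    return (a["writes"] & b["reads"]) | (b["writes"] & a["reads"])


print(conflicts(CHANGE_A, CHANGE_B))  # {'product.price'}
```

A text merge sees no overlap here because the edits touch different code; the declared read/write sets are what expose the dependency.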

Werner Kasselman @wernerk_au · 4h

The cheap thing is asking, the dear thing is checking, and that gap is the whole story. I watched a single agent write a thousand lines of plausible markdown describing work that didn't compile and contradicted a decision it had made two files earlier, in the same session. Re-architecting the org around that without re-architecting how you verify it just scales the confident nonsense.
