John Davenport

41 posts

@johns10d

Toronto, Ontario, Canada · Joined October 2016

49 Following · 6 Followers

@sam_hatoum I was playing with a complex generated feature. Vega lite manual editor, llm chat that modifies the json document, and a rendering of the chart. Trying to work at the bdd spec level to produce a working ui. Every iteration the model shortcuts around making it fully work.

Sam Hatoum@sam_hatoum·5h

@johns10d What's the spec that you're modifying?

Sam Hatoum@sam_hatoum·1d

Everyone fixates on "test" in Test-Driven Development. The important word is "driven." TDD was never a testing technique. It was a design technique — recording decisions as executable expectations. Post 8 of 20 in The Spec-Driven Shift series. specdriven.com/perspectives/t… Repost from @beonauto.

John Davenport@johns10d·5h

@sam_hatoum Dude if you thought the term spec was overloaded before. Fuggedaboutit it’s about to get a lot worse. In this case it’s just a json document defining how a visualization renders. vega.github.io/vega-lite/docs…

John Davenport@johns10d·5h

@automate_archit @anitakirkovska It’s not complicated. I sat with the agent to make a strategy. I do the strategy every day. I change it if I find something good. I add a tool if I need it. That’s it.

anita@anitakirkovska·1d

Marketers with Claude Code who think like engineers are about to print money.

John Davenport@johns10d·7h

@ShadesofSamsara @melvynx I think any version of building your own harness is going to be beneficial for you.

Shades of Samsara@ShadesofSamsara·10h

Yesterday I started using the remainder of my claude-code max5 subscription to create my own harness by copying most of what OpenClaw does and blending it with my own workflow dashboard I built for myself and my business. So far it's working by building it all on top of CLI -p commands. Using my subscription with their tools, to follow their rules, to make my own wrapper that does what I want it to do. So far so good, just waiting on the current session to reset to keep debugging some of the feature issues from the fresh build.

Melvyn • Builder@melvynx·1d

Day 3 with OpenClaw: In all my tests, GPT 5.4 is consistently the worst model for agentic tasks. Lazy, stupid, never follows anything, feels like you are a baby sitter. I don't know how OpenAI manages to make such a shitty model but this feels terrible. I miss Opus.

John Davenport@johns10d·8h

@WeberBuilds But Anthropic didn't build a harness. They built an environment for a harness. You still have to write your stop hook and what it's going to do when the stop hook fires. You still have to write your skills and progressive disclosure logic.

Michael Weber@WeberBuilds·12h

Claude Managed Agents (probably) ended the "build your own harness vs framework" debate yesterday. Building and running your own custom multi-agent system is now basically "why am I doing this?" Scary times. What's the next layer to collapse?

John Davenport@johns10d·8h

@Trader_XO @trader1sz Specs are an effective way to plot and execute medium to long horizon development tasks, especially when you want to exert control over low-level engineering decisions. I've come to understand it's just part of the puzzle and may not always be applicable.

XO@Trader_XO·9h

@trader1sz A solid win today: I get Claude Code to generate my specs and tasks. Upon completion it automatically launches a [codex exec] workflow to spin up Codex agents for the tests and implementation...

John Davenport@johns10d·11h

@ks458008 @everythingLLM Exactly this. It’s digital process engineering or something.

Ken@ks458008·12h

@johns10d @everythingLLM Exactly my point. Hooks, test gates, verification loops — none of that is "just prompting." When the discipline requires procedural thinking, we need a new name for it.

Ken@ks458008·1d

Prompt Engineering is over. The next era is Harness Engineering.

John Davenport@johns10d·11h

@NotRiteQuite @virtualunc @romxdev ? @NotRiteQuite I’m a software engineer writing a procedural harness. I don’t know if your mother ever taught you “if you don’t have anything nice to say, don’t say anything at all.” If you’re going to talk shit on the internet, know who you’re talking to.

Not Right@NotRiteQuite·23h

@johns10d @virtualunc @romxdev That's a waste of context, you should just learn to write code and use code to constrain. I don't know how so many of you are missing this. You got a machine to automatically write code and you're writing it a diary. Morons.

Roman@romxdev·3d

vibe coding is officially dead I had to say it. we thought AI would let us relax and code "on chill", but instead it turned us into architectural bureaucrats. we write strict laws, define rules, limits, and principles. if you don't obsessively review the code agent writes, your project will mutate into a massive landfill of tech debt within a month.

John Davenport@johns10d·11h

@sam_hatoum I was playing with a complex generated feature. Vega lite manual editor, llm chat that modifies the spec, and a rendering of the chart. Trying to work at the spec level to produce a working ui. Every iteration the model shortcuts around making it fully work.

Sam Hatoum@sam_hatoum·1d

@johns10d Me too, executable specs make all the difference. It was bad enough with humans causing regressions, AI is atrocious at it!

John Davenport@johns10d·11h

@dbmarkley I can tell you how I do it. I wrote my harness in elixir. It lands as a web server that’s compiled into a binary and sits inside the Claude plugin. Partly because I’m wary about showing my guts to the anthropic(s)

David Markley@dbmarkley·1d

💯 The stack I built is specific to how I operate as a PM running multiple projects. The orchestrator, knowledge architecture, and review pipeline all emerged from my specific constraints. Anthropic's Managed Agents will handle the generic infrastructure better than I ever could. The question I'm most interested in is: what are the domain-specific patterns that sit on top of that infrastructure? The committee design, the verification fidelity spectrum, the knowledge routing. Unclear how we share those templates at scale as I don't think they get commoditized by Anthropic/OpenAI

David Markley@dbmarkley·1d

x.com/i/article/2041…

John Davenport@johns10d·11h

@AbdelStark @AnthropicAI My harness currently uses bdd tests, specs, unit tests and code. I want to experiment with cutting specs and unit tests out.

abdel@AbdelStark·1d

@johns10d @AnthropicAI Thanks. Ah yeah totally. In my opinion TDD + BDD is really a banger combo in the agentic era.

abdel@AbdelStark

I will show it in practice. How well DDD + BDD + TDD can fit naturally with the agentic world and lead to much better outcomes in terms of producing actual production-grade software and not slop.

abdel@AbdelStark·1d

Introducing claude-md-compiler: Compile structured Claude Code workflow policy into versioned artifacts and enforce it against runtime evidence, hooks, and git diffs. What is it about? Basically the problem statement comes naturally from @AnthropicAI documentation about how Claude Code treats your Claude.md file: "Claude treats them as context, not enforced configuration." This is the gap I tried to close with this project: to enable creating actual policies that are deterministic and can be enforced. I think it can be very useful for CI checks and strong guarantees, to make sure the harness will enforce some invariants and rules, instead of depending on LLM inference, which can tend to hallucinate and skip the boundaries / rules / invariants. Repo is open source: github.com/AbdelStark/cla…

John Davenport@johns10d·1d

@RLanceMartin This is a great feature that's going to help a lot of people build out harnesses for accomplishing really complex tasks.

Lance Martin@RLanceMartin·1d

a new addition to the Claude API: Claude Managed Agents. you supply the agent configuration and task, we handle the agent harness + managed infrastructure to ensure it works reliably over long task horizons. x.com/RLanceMartin/s…

Lance Martin@RLanceMartin

x.com/i/article/2041…

John Davenport@johns10d·1d

@dejno Managed Agents is really just part of the solution though. Maybe people smarter than me will figure out how to make really effective general purpose harnesses, but I've found that being very specific leads to more effective outcomes.

Jake Dejno@dejno·1d

Three things I believe about agents in prod: - The agent harness matters as much as the model. - Orchestration shouldn't be every team's problem. - Teams that win spend their time on product, not infra. Claude Managed Agents is our answer to all three, and now it’s in Public beta

Claude@claudeai

Introducing Claude Managed Agents: everything you need to build and deploy agents at scale. It pairs an agent harness tuned for performance with production infrastructure, so you can go from prototype to launch in days. Now in public beta on the Claude Platform.

John Davenport@johns10d·1d

@ashmaurya In business if you start blaming the person that's a problem. You blame the process. The vibe coding crisis isn't about the model. It's about the process, and you can only improve that through experimentation.

Ash Maurya@ashmaurya·1d

The vibe coding crisis isn't about code quality. Everyone's debating bugs, security flaws, "worst software crisis" headlines. The real crisis: non-technical founders can now ship bad ideas at unprecedented speed. Faster failure is still failure. The fix isn't better AI. It's better experiments.

John Davenport@johns10d·1d

@JWallaceParker If you start improving your harness to address the problem every time you find yourself saying "X is where the work happens," you'll be moving in the right direction.

Joe Parker@JWallaceParker·1d

Claude Code workflow for a new feature: write the spec section, point Claude Code at it, review the diff, reject what's wrong, accept what's right, repeat. The review is where the work happens.

John Davenport@johns10d·1d

@everythingLLM @ks458008 It's a combination of prompt engineering, validation, and orchestration. Harness engineering is an exercise in procedural coding as much as it is prompting. Hooks, test gates, and verification loops are code, not language.

Everything AI@everythingLLM·1d

@ks458008 Interesting framing. I'd push back though - harness engineering is really just prompt engineering at scale with better tooling. The underlying skill (guiding model behavior through language) hasn't changed, we've just wrapped it in fancier infrastructure.

John Davenport@johns10d·1d

@dbmarkley We're just moving up the stack here dude. I think the riches are in the niches. If you're broad, Anthropic or OpenAI is going to eat your lunch. The trick is to get really specific.

David Markley@dbmarkley·1d

In a hilarious twist of fate, Anthropic appears to have launched Managed Agents ~10 minutes after I posted about building this all from scratch. The timing is impeccable. I'll be reading their docs tonight to see how much of my orchestrator just became unnecessary. x.com/claudeai/statu…
