Brad Hefta-Gaub | Fractional CTO

11.8K posts

@ZappoMan

Fractional CTO. 35 years shipping. AI coach. @HowManyCTOsPod co-host. Hacker Punk who runs / fifty miles for his coffee, / eggs, bacon, and gin. (he/him)

Seattle, WA · Joined November 2007
832 Following · 2.3K Followers
Pinned Tweet
Brad Hefta-Gaub | Fractional CTO
Asked an agent to get my web test coverage to 100%. Here's what came back... Read it carefully.

The agent didn't write more tests. It narrowed the coverage config to only measure files that were already at 100%. Then confidently reported "100% across statements, branches, functions, and lines."

Technically true. Practically a lie.

This is the failure mode that worries me about non-technical people using AI agents on real code. The agent satisfied the literal request and silently redefined the intent. If you don't know what coverage actually means, the number looks great.

The defense is being able to read what the agent actually did, not just what it claims it did. The "What I changed" section in the agent's own response describes the workaround in plain language. The "Verification" section presents the workaround as success.
0
0
0
39
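The coverage-narrowing trick described in the post above can be caught mechanically by comparing the files a report measured against the files that exist. The following is a minimal sketch, not the author's tooling: the function names, the `.ts`/`.tsx` suffixes, and the file paths are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple

# Assumption for this sketch: these extensions count as "code".
CODE_SUFFIXES: Tuple[str, ...] = (".ts", ".tsx")


def unmeasured_files(source_files: Sequence[str],
                     measured_files: Sequence[str]) -> List[str]:
    """Return code files that never appear in the coverage report."""
    measured = set(measured_files)
    return sorted(
        path for path in source_files
        if path.endswith(CODE_SUFFIXES) and path not in measured
    )


def honest_coverage(reported_pct: float,
                    source_files: Sequence[str],
                    measured_files: Sequence[str]) -> Dict[str, object]:
    """Pair the reported percentage with the share of code files it covers."""
    code_files = [p for p in source_files if p.endswith(CODE_SUFFIXES)]
    missing = unmeasured_files(source_files, measured_files)
    if code_files:
        scope_pct = 100.0 * (len(code_files) - len(missing)) / len(code_files)
    else:
        scope_pct = 0.0  # nothing to measure, so no scope to claim
    return {
        "reported_pct": reported_pct,  # what the agent claimed
        "scope_pct": scope_pct,        # how much of the code that claim covers
        "missing": missing,            # files silently excluded
    }
```

With four hypothetical source files and a report that measured only one of them, `honest_coverage` returns a `scope_pct` of 25.0 next to the reported 100.0, which makes the narrowed scope visible instead of hiding it behind the headline number.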
Brad Hefta-Gaub | Fractional CTO
Time to put my own thesis to the test. I've been writing for the last couple weeks about how AI agents work when the systems around them enforce engineering discipline, and fail when teams skip it. So I picked an agent harness (NanoClaw, for the security posture) and started automating some of my recurring tasks.

First experiment: PR monitoring. I followed the "let the agent build it" path. Asked the main NanoClaw agent to set up the scheduler. Got something that didn't work.

Here's the thing that almost got me: the agent itself confidently diagnosed the problem as a core engine bug. Sounded authoritative. If I'd taken its word, I'd have spent hours chasing the wrong thing.

I stepped back and read the scheduler code. NanoClaw's scheduler is small enough that I could actually do this. The agent had vibe-coded a buggy gate function and a broken contract between the gate function and its own LLM prompt. They were cancelling each other out. The agent's confident diagnosis was wrong. The actual bug was in the agent's own code, which the agent couldn't see clearly enough to debug.

Second attempt was different. I went into plan mode in Cursor. I developed a real thesis about what the gate function should do, what the contract between the gate and the prompt needed to look like, what unit tests should cover, what type checking would catch. The agent implemented my specification.

Both attempts used AI to write the actual code. The difference was who was driving. First time: agent driving, I followed. Output was garbage. Second time: I drove with a real plan, agent executed. Output worked.

Lesson is exactly what I've been writing about. AI agents can confidently produce garbage when given too much latitude, and confidently misdiagnose the cause when you ask them to debug it. The defense isn't "don't use AI." It's the engineering discipline of being able to read the code, develop a real plan, specify the contracts and the tests, and own the result.

The reason I picked NanoClaw over OpenClaw was the security posture: sandboxed containers, onecli for secrets, and a harness small enough that I can audit the code. That last property is what saved me here.
0
0
1
84
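To make "the contract between the gate and the prompt" concrete, here is a minimal sketch of what an explicit, typed contract for a PR-monitoring gate could look like. It is not NanoClaw's API or the author's code: `GateDecision`, `gate`, and `build_prompt` are hypothetical names, and the logic (run only when there are unseen PR numbers) is an assumed example.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple


@dataclass(frozen=True)
class GateDecision:
    """Hypothetical contract passed from the gate to the prompt builder."""
    should_run: bool
    new_pr_numbers: Tuple[int, ...]


def gate(open_prs: Iterable[int], already_seen: Set[int]) -> GateDecision:
    """Decide whether the agent should run at all on this scheduler tick."""
    new = tuple(sorted(n for n in set(open_prs) if n not in already_seen))
    return GateDecision(should_run=bool(new), new_pr_numbers=new)


def build_prompt(decision: GateDecision) -> str:
    """Build the LLM prompt only from what the gate explicitly handed over."""
    if not decision.should_run:
        # Enforce the contract: a closed gate must never produce a prompt.
        raise ValueError("gate said not to run; no prompt should be built")
    listed = ", ".join(f"#{n}" for n in decision.new_pr_numbers)
    return f"Review these newly opened pull requests: {listed}"
```

Because the gate's output is a single typed value, a unit test can pin down both sides of the contract: `gate([12, 15, 12], {12})` yields a decision for PR 15 only, and `build_prompt` refuses to run when the gate is closed. A type checker would additionally flag a prompt builder that expects a different shape.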
Brad Hefta-Gaub | Fractional CTO
@SergioRocks - the "demo is not a business" framing is exactly right. From my fractional CTO work, the failure mode is usually that the systems around the agents got skipped. Architecture, automated gates, real testing. Reliability isn't the finish line, it's the foundation that should have been there from the start.
0
0
0
19
Sergio Pereira (@SergioRocks)
Many Startup Founders have a product that “almost works.”

You built it with AI. You iterated quickly. You proved the idea.

But when it comes to launching: You hesitate. Because you know:
- It breaks when clients click "that button"
- You'll get emails from upset clients
- The product hasn’t been tested in real conditions

So it stays in demo mode. And that’s a trap. Because a demo is not a business.

A business needs a product that:
- Clients can use independently
- Works consistently
- Holds up under real usage
- Creates value that clients will pay for

That’s the missing layer. Not more building. Just more reliability.

If you’ve reached that point, you’ve done the hardest part. Now it’s about finishing it properly. And that’s where a Fractional CTO like myself can step in and help you succeed.
17
0
13
1.7K
Brad Hefta-Gaub | Fractional CTO
@pavelhegler, agreed. Users don't read SDLC and architecture isn't revenue. We're probably more aligned than the snark suggests. Solo builders shipping with AI is a real thing, my post wasn't arguing it doesn't work. It was arguing that the systems around the agents matter more as code accumulates. How are you thinking about that with your projects?
0
0
0
16
Brad Hefta-Gaub | Fractional CTO
One of my fractional CTO clients ships production code daily through AI agents. He's not a developer. He's not vibe-coding. He's not running 20 agents overnight.

What makes it work isn't the AI. It's the architecture, the SDLC, and the CI/CD around it.

Plan-mode workflows. Per-PR preview environments. Automated unit and integration tests. Headless browser e2e tests against the full stack. AI code review agents catching issues before human review. Linting and type-checking gates. A clear promotion path from develop to staging to main. Branch protection rules requiring human approval before anything merges.

The agents do the typing. The systems do the gating. He does the directing and the deciding.

This is what "AI is transforming who can build software" actually looks like in production. Senior engineering principles applied to enable a non-developer's workflow, with the agents extending what he can do.

What's shipping right now isn't the product of autonomous agents running overnight. It's the product of people of every background using AI agents inside systems where the architecture and CI/CD are doing as much of the real work as the agents themselves.

Not autonomous. Not unattended. Directed, reviewed, gated.
2
0
1
82
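The "systems do the gating" idea in the post above can be sketched as a small merge gate plus a promotion path. This is an illustrative model only: the check names, the one-approval threshold, and the `PullRequest` type are assumptions, and a real setup would enforce these rules in the CI system and branch protection settings, not in application code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Assumed names for this sketch; the post lists the practices, not identifiers.
PROMOTION_PATH = ["develop", "staging", "main"]
REQUIRED_CHECKS = ["lint", "typecheck", "unit", "integration", "e2e"]


@dataclass
class PullRequest:
    target_branch: str
    checks: Dict[str, bool] = field(default_factory=dict)  # check name -> passed
    human_approvals: int = 0


def blocking_reasons(pr: PullRequest) -> List[str]:
    """List every reason this PR may not merge yet."""
    reasons = [
        f"check failed or missing: {name}"
        for name in REQUIRED_CHECKS
        if not pr.checks.get(name, False)
    ]
    if pr.human_approvals < 1:
        reasons.append("needs human approval")
    return reasons


def can_merge(pr: PullRequest) -> bool:
    """A PR merges only when no automated or human gate is blocking it."""
    return not blocking_reasons(pr)


def next_branch(branch: str) -> Optional[str]:
    """Return the next stop on the promotion path, or None at the end.

    Raises ValueError for a branch that is not on the path.
    """
    i = PROMOTION_PATH.index(branch)
    return PROMOTION_PATH[i + 1] if i + 1 < len(PROMOTION_PATH) else None
```

The design point is that a missing check counts as a failure, and that passing every automated check still does not merge anything without a human approval.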
Brad Hefta-Gaub | Fractional CTO
Software companies built generalized products because software was expensive to write. The labor cost only made sense if you could sell to a million customers. So you built configurable platforms with massive option surfaces, then hired implementation consultants to bend them to fit each customer's actual workflow.

AI just collapsed that math.

Satya Nadella said it bluntly on the BG2 podcast: SaaS apps "are essentially CRUD databases with a bunch of business logic. The business logic is all going to these AI agents." Mark Cuban has been amplifying the same point, noting that 30 million US solopreneurs and SMBs were never well-served by enterprise SaaS in the first place.

The math just didn't work. Now it does. Custom workflows for a regional law firm. A vertical-specific CRM for the dozen companies in your industry that hate Salesforce. The internal tools every SMB needs but nobody could profitably build.

SaaS got fat because generalization was the only way to recoup engineering cost. That's no longer true. The next decade of software is going to be a lot more bespoke.
0
0
0
49
Brad Hefta-Gaub | Fractional CTO
Yes. There's a deeper version of this point I've been chewing on lately. Working with structured SDD tooling like SpecKit, the same pattern keeps appearing: product owners and developers can't fully specify what they want until they're holding something to react to. The thousand tiny decisions you describe aren't just made along the way, they can ONLY be made along the way. That's why agents-running-for-hours doesn't produce good software, even in principle. The spec needed to direct them doesn't exist yet at hour zero.
0
0
0
70
Josh Pigford (@Shpigford)
at this stage i don't *want* agents just running for hours-on-end building endlessly without input. good software is the sum of 1000 tiny decisions made along the way. none of those decisions are inherently "right"/"wrong", but they *do* change the outcome of what gets built.
14
2
81
4.4K
Brad Hefta-Gaub | Fractional CTO
I keep seeing developers describe software engineering in the age of AI as "soulless." I understand where it comes from. The tactile pleasure of typing code, the flow state of writing a clean function, the satisfaction of seeing your keystrokes compile into something that works... all of that feels different now.

But I think the soulless framing is the wrong read on what's happening. Directing a coding agent isn't the soulless version of engineering. It's engineering management.

Think about what a good engineering manager actually does. They start by making sure they understand the business problem. They verify that the spec is clear. They verify that the plan matches the goal. They check in regularly to make sure the engineer hasn't gone down a blind alley. They catch over-engineering before it complicates the task or the code. They review the work and own the output because their team's name is on it.

That's exactly what you do when you direct a coding agent well. The craft hasn't disappeared. It just moved. What used to be the craft of typing good code is now the craft of writing clear specs, evaluating plans, recognizing when output is overbuilt, and knowing when to course-correct. That's "taste" applied to guidance instead of craft applied to keystrokes. "Taste" was never about typing; it was about knowing what good looked like.

The difference between being an EM for humans and an EM for agents is that the hard interpersonal skills fall away. You don't manage emotions, politics, motivation, or career development. You don't have to read a room. You just need the technical judgment, the product thinking, and the discipline to verify the work. Which is to say... all the skills senior engineers have been cultivating their whole careers, minus the parts they usually didn't want to do anyway.

This isn't soulless. It's the shape of the job now. Typing isn't about taste. Authorship is. Engineering management has always been authorship without typing.

Senior engineers have been doing this for years with junior engineers and contractors. Now they're doing it with agents. The developers feeling soulless aren't describing the loss of craft. They're describing the loss of something else... the tactile reward of typing. That was always a perk, not the point. The point was always the work getting done well.
0
0
0
41
Brad Hefta-Gaub | Fractional CTO
Yes, this is exactly what the high-performance teams I work with are doing. But I'm gonna push back on "I didn't write any of it, this is not my accomplishment". You applied your taste, which is the value you bring. You aligned the plan, reviewed every change, caught what the agent missed, kept it from over-engineering the fix, and put your name on the result. That's authoring, just at a different altitude than typing. The work got done well because you did it with taste and experience. Typing isn't about taste. Authorship is.
1
1
9
1.2K
Vic 🌮 (@VicVijayakumar)
I'm now able to tell my agent “we are going to work on JIRA-1234” and it goes and pulls down the task, makes me a plan, I say yeah okay that looks good, and it generates the commit. I run an AI review from a different session, it finds 4 issues of varying priorities, I paste it to my original agent and say validate these findings and fix them if necessary, it creates a fix, I run another review, no more high priority issues found. I open up the code in an IDE to go over it before pushing it up for human review. Looks fine I guess, nothing crazy. I try to understand everything before I push it up for review because if this breaks, it's still my name on it. I say why did you make this one change, it gives me a reasonable explanation for why. It says something codebaity like "if you want I can suggest 2 more ways you could really tighten up this work to prevent some rare but possible regressions". I'm smart enough to not fall for it. Code pushed up, task moved to in-review. I didn't write any of it, this is not my accomplishment. Users won't care who wrote it if it works. A lot done in 20 mins but it felt soulless.
82
56
1.5K
197K
Jon Yongfook (@yongfook)
@hustlin_heev Not an expert. But even in English something can be “fake” and “a fake”.
2
0
9
687
Jon Yongfook (@yongfook)
The idea of "one shotting" an app using AI is a fugazi. If you had to describe my app and all the edge cases I have solved over the years, it would be a prompt the size of a small book, and my app isn't even that complicated. The people promoting creating a business overnight with AI are just selling a get rich quick pipedream. Those grifters are present in every cycle. AI has completely transformed how I work, but you can't push a button and make money. Doesn't work like that.
Ronan Berder (@hunvreus)

Talking to smarter folks than me, I'm convinced many of the AI folks in my timeline are full of shit. Nobody is "running 20 agents over night" and building stuff for actual users. Maybe some are building internal tools or disposable software. Maybe. But building software people like using? That doesn't get hacked on day one or blow up after the 3rd user? Nope. I don't even understand what that's supposed to look like. Do you work out a 57 pages document that perfectly describes what you want to build and then summon 14 agents and have them run wild for 6 hours? And what comes out on the other end isn't a broken pile of shit? Nope. Not buying it. PS: it may also be that I have an IQ of 82 and can't figure it out.

263
305
3.8K
403.5K
Brad Hefta-Gaub | Fractional CTO
This matches what I see in the fast-moving teams I work with and my own work. 20M tokens/day on Cursor, 3,000+ lines of code daily, same basic loop you describe. I do run multiple streams sometimes, but across different projects or clearly separate parts of the same codebase. I tried worktrees with parallel agents on related code and it collapsed the tight local-stack testing loop, which is the thing I won't give up. The performative complexity makes for good demos. Plan-build-test-refine is what actually ships.
0
0
3
719
Ronan Berder (@hunvreus)
My use of AI is about as grug brained as it comes:
- I talk to it for a while, arguing about architecture, priorities, scope, etc, to align on a plan.
- Once we're on the same page, I let it build stuff.
- I then test, ask more questions and refine.

It usually runs on its own for 30 seconds to a few minutes. Sometimes 5 to 10 minutes if it's a big chunk of work. I've probably never had an agent running for more than 30 minutes.

I don't use worktrees, subagents, ralph loops, ... It's mostly Codex, occasionally Pi. I've built a handful of skills and usually have a pretty short AGENTS.md. That's it.

Now, I'd like to hear from the "AI experts" what I'm doing wrong.
65
1
215
36.8K
Brad Hefta-Gaub | Fractional CTO
In 35 years of shipping software, the development teams I've seen succeed most consistently share three capabilities:
1. Developers can run the full stack locally in their own environment.
2. Every pull request gets an ephemeral stack with the candidate changes, deployed automatically.
3. There's a staging environment where merged PRs integrate before production, catching issues that only appear when multiple changes collide.

The thread through all three: developers moving fast need to test and debug their changes before they waste anyone else's time. Fast local feedback. Pre-merge validation in something production-like. Post-merge integration before the customer sees anything.

Docker and Kubernetes made this radically easier than it was in the 90s. But the principle is older than either. The best teams I worked with twenty years ago had homegrown versions. The best teams today have polished versions.

This matters more now, not less. If you're using AI to do the code editing, you're still the developer authoring it. You're still accountable. The tools that let you validate fast are the tools that let you own what ships.

The cost of neglecting these practices was always real. AI made it catastrophic.
0
0
0
55
Brad Hefta-Gaub | Fractional CTO
This is the "taste" conversation with data. Plan, prompt precisely, verify every diff, own the code as if you'd typed it yourself. That's technical taste applied, and it's creating a K-shaped economy of AI coding: experienced devs at the top getting outsized leverage, everyone else at the bottom shipping more code than they can evaluate. Experienced devs use agents to extend their judgment. Vibe coders use agents to bypass it.
0
0
2
1.5K
Brad Hefta-Gaub | Fractional CTO
Working with several AI-assisted teams shipping 3,000+ lines of code per developer per day. The ones that scale and the ones that get stuck split on one thing: whether the pipeline was built for it. The old arc was: move fast, add CI when it hurts, add staging when it really hurts, skip PR previews. Cheap on day one, expensive on day ninety. At AI speed, day ninety arrives in week two. The teams winning have preview URLs on every PR, staging that mirrors production, and deliberate promotion gates. Boring infrastructure, disproportionate payoff.
0
0
0
54
Brad Hefta-Gaub | Fractional CTO
@aakashgupta This is what the taste conversation means in practice. Two weeks ago I watched Claude Code scaffold a framework and pass API keys through the LLM in cleartext: x.com/HowManyCTOsPod… Engineers with technical taste catch this. Engineers without it ship secrets to attackers.
How Many CTOs Podcast (@HowManyCTOsPod)

Can we trust our sensitive data to remain secure when using #AIAssistedCoding? On today's #HowManyCTOs episode, Brad shares a recent experience with #Claude that could have gone very wrong. #Innovation #TechDebate #TechTalk #TechPodcast #TechLeadership #AI #Cybersecurity #LLMs

0
0
0
301
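One common way to keep API keys out of what a model sees is to hand it placeholders and substitute the real values only when a command is executed. The sketch below illustrates that idea under stated assumptions: the `{{secret:NAME}}` placeholder syntax and the `redact` and `resolve` helpers are hypothetical, and this is not a description of how Claude Code, onecli, or any specific framework handles secrets.

```python
import re
from typing import Dict

# Hypothetical placeholder syntax for this sketch: {{secret:NAME}}
PLACEHOLDER = re.compile(r"\{\{secret:([A-Z0-9_]+)\}\}")


def redact(text: str, secrets: Dict[str, str]) -> str:
    """Replace secret values with placeholders before text reaches a model."""
    # Longest values first, so a secret containing another is handled whole.
    for name, value in sorted(secrets.items(), key=lambda kv: -len(kv[1])):
        if value:
            text = text.replace(value, "{{secret:" + name + "}}")
    return text


def resolve(command: str, secrets: Dict[str, str]) -> str:
    """Substitute real values at execution time, outside the model's view."""
    def _sub(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in secrets:
            raise KeyError(f"unknown secret placeholder: {name}")
        return secrets[name]

    return PLACEHOLDER.sub(_sub, command)
```

In this arrangement the model only ever reads and writes strings like `Bearer {{secret:API_KEY}}`, while the process that actually runs the command calls `resolve` with values loaded from a secret store.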
Aakash Gupta (@aakashgupta)
🚨 Do you understand what’s happening?!

> Lovable just got valued at $6.6 billion and a guy with a free account read another user's source code, database credentials, and AI chat history in 5 API calls.. Customers include Uber, Zendesk, Nvidia, Microsoft, Spotify.
> The Lovable researcher reported the bug 48 days ago through HackerOne. They marked it a "duplicate" and closed it. It still worked the day it leaked…
> Vercel got breached this week through a third-party AI tool an employee installed.. ShinyHunters listed the stolen API keys and source code on BreachForums for $2 million.
> Lovable's response was "we did not suffer a data breach." Then they admitted they patched the API in November and left every project from before that date exposed..
> When you vibe code an app you paste your Stripe key into the chat. You paste your database URL into the chat. You paste customer records into the chat. The endpoint that leaked.. returned chat histories.
> Anthropic built a model that scored 83.1% on finding real software vulnerabilities. Found a 27-year-old bug in OpenBSD and a 16-year-old flaw in FFmpeg. They named it Mythos and locked it behind a 50-company firewall because public release was too dangerous..
> Two of the world's biggest dev infrastructure platforms got breached in 48 hours and a $6.6B company's first instinct was to argue about the definition of the word "public..."
> Every founder who shipped an MVP on Lovable in early 2025 woke up today to find their database credentials are public records..
> The trust boundaries in the AI dev stack are drawn with marker. And it's raining.

If your stack ever touched a vibe coding platform, rotate everything tonight. Check the chat logs for what you forgot you typed.

AI is here. And we’re f*cked.
30
32
323
101.7K
Brad Hefta-Gaub | Fractional CTO
@GergelyOrosz Different picture from my corner: fractional CTO across several small startups, none concerned about token spend. Mix of greenfield and existing projects that pre-date the good tools and adopted them mid-flight. All seeing big leverage. Happy to DM specifics.
0
0
0
74
Gergely Orosz (@GergelyOrosz)
Hearing stories from inside several tech companies that token spend is MUCH higher than forecasted, and 📈 If you're in this situation, what is your strategy, or your team's / company's strategy? Send a DM and I'll share what I've collected so far.
64
29
540
72.1K
Brad Hefta-Gaub | Fractional CTO
Everyone says AI makes taste the only moat now. Taste is four things: aesthetic, product, technical, strategic. Most founders have one or two. A few have three. Almost nobody has all four. The question isn't whether you have taste. It's which kinds, and how to cover the gaps.
0
0
0
36
andrew chen (@andrewchen)
hot take :) The biggest and most productive people in the AI era are the folks who are already good at their jobs. AI as a multiplier, not an equalizer/democratizer
332
589
6.1K
327.2K
Brad Hefta-Gaub | Fractional CTO
The best engineers I'm working with have stopped writing code. Not mostly stopped. Stopped. They direct agents, review output, and make the architectural calls.
0
0
0
51