Stephen Brouhard

3.2K posts

@ssbrouhard

Building practical AI agent tools. Code comprehension, quality gates, verification over slop. Shipping reliable systems that hold up in production.

Jacksonville, FL, USA · Joined April 2019

250 Following · 380 Followers

Pinned Tweet

Stephen Brouhard@ssbrouhard·4d

Big fan of Orca and the newly dropped Firstmate, so I built the bridge. Credit to @kunchenguid for Firstmate. Its orchestration protocol is awesome! Swapped the default tmux runtime for @orca_build + Codex. Full scout/ship lifecycles work end to end. Protocol stays the same, just pick your runtime. Full implementation details in article 👇 Firstmate github.com/kunchenguid/fi…

Stephen Brouhard@ssbrouhard

x.com/i/article/2067…

4.4K

Morgan@morganlinton·1h

This idea that the agentic coding workflow should be three models (an orchestrator, an executor, and a code reviewer) is pretty broken imo. The problem is, going multi-model isn't that revolutionary if you're using only one agent at the execution layer. Most complex problems aren't uniformly complex, so using one model at the execution layer means you're often using far more thinking depth and tokens than you need. I've been experimenting with a new agentic coding workflow, and I think the key aha moment I had is realizing that one model at the execution layer will pretty much always mean using way more tokens than you have to, to get the same result.

Morgan@morganlinton

x.com/i/article/2069…

797

Stephen Brouhard@ssbrouhard·11m

@morganlinton Totally agree. Route simple stuff to fast/light models and only hit the frontier ones where the task actually needs it. Use the horsepower where it's needed. Great post.
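The routing idea in this exchange can be sketched roughly as follows. This is a minimal illustration, not anyone's actual workflow: the tier names, the `Task` fields, and the file-count threshold are all hypothetical, chosen only to show how a cheap heuristic could decide which tier handles a task.

```python
from dataclasses import dataclass

# Hypothetical tier labels; these are not real model identifiers.
LIGHT = "light-model"
FRONTIER = "frontier-model"


@dataclass
class Task:
    description: str
    files_touched: int   # assumed proxy for task size
    needs_design: bool   # assumed flag for open-ended work


def pick_model(task: Task, file_threshold: int = 3) -> str:
    """Send a task to the frontier tier only when it looks hard."""
    if task.needs_design or task.files_touched > file_threshold:
        return FRONTIER
    return LIGHT


tasks = [
    Task("rename a variable", files_touched=1, needs_design=False),
    Task("rework the auth flow", files_touched=8, needs_design=True),
]
routes = {t.description: pick_model(t) for t in tasks}
print(routes)
```

A real router would likely use a richer signal than file counts, but the shape stays the same: classify first, then spend tokens.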

Stephen Brouhard@ssbrouhard·8h

@beffjezos Game changer. Only the Grok Build model has the access, though.

Beff (e/acc)@beffjezos·11h

Unironically Grok Build is great for this

janak@janaksunil

i cannot wait to ask codex to browse twitter during the workday for me

9.9K

Stephen Brouhard@ssbrouhard·9h

If you haven't already, copy and paste the article into Codex. It's not a full tutorial, but Codex will understand it and be able to implement it in Orca if your setup is still shaky. I made this doc for my own reference and saved it to my local repo. May be helpful:

Firstmate Repo/Layout Mental Model

This setup has two layers:

## 1. Operational cockpit

``

This is the directory you open/run when using Firstmate. It owns local/private fleet state:

- `config/` - local runtime choices, like the active backend profile
- `data/` - private memory, backlog, briefs, reports, notes
- `projects/` - project clones managed by Firstmate
- `state/` - volatile crewmate/session records

These are local/private and should not be pushed to a shared repo.

## 2. Tracked source clone

``

This is the reusable Firstmate source/template repo. It owns tracked/shared material:

- `AGENTS.md`
- `README.md`
- `CONTRIBUTING.md`
- `bin/`
- `.agents/skills/`

The cockpit symlinks shared files/directories into itself, so editing `bin/...`, `AGENTS.md`, or `.agents/skills/...` from the cockpit actually edits the tracked source clone.

## Mental model

```text
 = operational cockpit/private fleet state
 = reusable Firstmate source/template
/projects = project clones managed by Firstmate
```

## Fork/upstream safety

If you do not own the upstream Firstmate repo, keep remotes shaped like this:

    origin ->
    upstream ->
    upstream push URL -> DISABLED

Do not push branches or open PRs against upstream unless you explicitly intend to contribute upstream. For personal/local Firstmate tool changes, keep work local by default. If a GitHub target is needed, push to your fork, not upstream.

## Backend switching

Backend selection is local cockpit state:

    /config/ backend.env

Switch commands:

    bin/fm-backend-current
    bin/fm-backend-use orca
    bin/fm-backend-use codex-app

Switching affects future crewmates only. Existing crewmates keep their backend recorded in state/.meta.
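To make the cockpit/source split concrete, here is a small sketch of the symlink behavior the doc describes. The directory names `cockpit` and `source-clone` are placeholders (the post elides the real paths), and everything runs in a throwaway temp directory. It assumes a platform where creating symlinks is permitted.

```python
import os
import tempfile
from pathlib import Path

# Placeholder directory names; the real paths are elided in the post.
root = Path(tempfile.mkdtemp())
source = root / "source-clone"
cockpit = root / "cockpit"
(source / "bin").mkdir(parents=True)
cockpit.mkdir()

(source / "AGENTS.md").write_text("shared instructions\n")

# Private, untracked state lives directly in the cockpit.
for name in ("config", "data", "projects", "state"):
    (cockpit / name).mkdir()

# Shared material is symlinked from the source clone into the cockpit.
for name in ("AGENTS.md", "bin"):
    os.symlink(source / name, cockpit / name)

# Writing through the cockpit path changes the file in the source clone.
(cockpit / "AGENTS.md").write_text("edited from the cockpit\n")
edited = (source / "AGENTS.md").read_text()
print(edited)
```

This is why an edit made "in the cockpit" can show up as a diff in the tracked clone, while `config/`, `data/`, `projects/`, and `state/` never do.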

Stephen Brouhard@ssbrouhard

x.com/i/article/2067…

John Curtis@jcurtis·13h

Glad to hear you say it… I actually thought that was the intent. I had to clone the repo and ask GPT to double-check my understanding on the first project. Once I got my head wrapped around it, I could see the autonomy and AFK benefits. Also wanna mention I appreciate the rigidity around Firstmate (the agent) not making code changes. I think it’s the right model for the orchestration delegate.

Stephen Brouhard@ssbrouhard·15h

Codex App/Firstmate backend issues and roadmap ⤵️

The backend is viable: a Codex App crewmate can be spawned as a visible thread, complete a no-mistakes ship task, open a PR, pass CI, merge, archive, and tear down. Remaining work is mostly deterministic lifecycle polish.

- FM_BACKEND=codex-app means visible Codex Desktop threads, not headless codex app-server sessions.
- Shell helpers own local Firstmate state: briefs, metadata, PR polling, and teardown safety checks.
- Codex Desktop owns thread actions: create, fork, send, read, title, pin, archive, and handoff.
- Orca and tmux remain separate backends. Codex App work must not regress them.

Images attached with more info from the doc Codex created for tracking. I'm integrating the recent upstream commits, so this may change as I keep working through it:
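One way to picture the "deterministic lifecycle" mentioned above is as an ordered set of stages that can only advance one step at a time. The sketch below is purely illustrative: the stage names are taken from the sentence in the post, but the `Stage` enum and `advance` helper are hypothetical and are not part of Firstmate or Codex.

```python
from enum import Enum, auto


class Stage(Enum):
    # Stage names mirror the sequence listed in the post.
    SPAWNED = auto()
    TASK_DONE = auto()
    PR_OPEN = auto()
    CI_PASSED = auto()
    MERGED = auto()
    ARCHIVED = auto()
    TORN_DOWN = auto()


ORDER = list(Stage)


def advance(current: Stage, target: Stage) -> Stage:
    """Allow only the next stage in the sequence; reject skips and reversals."""
    if ORDER.index(target) != ORDER.index(current) + 1:
        raise ValueError(f"cannot go from {current.name} to {target.name}")
    return target


stage = Stage.SPAWNED
for nxt in ORDER[1:]:
    stage = advance(stage, nxt)
print(stage.name)
```

Rejecting skipped steps (for example, merging before CI passes) is the kind of check a teardown safety helper might enforce, though how the real helpers do it is not stated in the post.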

Stephen Brouhard@ssbrouhard

For folks who like the Codex desktop app but are curious about Firstmate by @kunchenguid: I’ve been testing them together. Orca already works. Codex App integration is now getting real: visible threads, worktrees, supervised handoffs, PR flow. Not fully polished yet, but very interesting. github.com/kunchenguid/fi…

1.8K

Stephen Brouhard@ssbrouhard·11h

@LLMJunky @RichDoesTech masterful use of spacing in your reply 🤣

am.will@LLMJunky·11h

@RichDoesTech i use it. and i hate it

256

R.@RichDoesTech·16h

Love Anthropic, but 65% of their code being written in this is the reason why the desktop app continues to be so buggy/unloved: nobody is using it. 😭

- Image cards still non-clickable
- Right-click on paths and the menu closes when you hover over “open in”
- No simple commit + auto-generate message
- Still branching off at random even though you're not trying to tree trunk
- No tabs within a single chat thread (multiple convos about a single feature, like Conductor)
- Sidebar state still gets broken for chats (shows three shimmering dots when the convo is already done)
- Customer support when facing an issue is nonexistent, and even the chatbots gaslight you, saying you can do things that you clearly can't
- There are now exponential backoffs when requests fail, but sometimes it still instantly fails without them
- Manual relaunch to update; no ability to update when chats are inactive or on initial load
- Set effort to ultracode but it keeps reverting between chats; nowhere to set defaults, and my request doesn't persist
- Can't reference chats by id or with @
- etc.

Claude@claudeai

Introducing Claude Tag, a new way for teams to work with Claude. In Slack, Claude joins as a team member with access to the channels and tools you choose. Tag Claude in and delegate tasks to it while you focus on other work.

1.4K

Stephen Brouhard@ssbrouhard·11h

The Grok Build model specifically has the native x_search tool available from the terminal session, which is awesome! grok-composer-2.5-fast doesn't have access to the tooling.

0xSero@0xSero

You can browse and fetch data from X in Grok build.

112

Stephen Brouhard@ssbrouhard·11h

@LLMJunky @0xSero

QME

am.will@LLMJunky·11h

@0xSero okay that's actually huge

528

0xSero@0xSero·15h

You can browse and fetch data from X in Grok build.

Stephen Brouhard@ssbrouhard·11h

@LLMJunky @0xSero It seems to be only with Grok Build. A couple of hours ago I asked it if it could when set to Grok Composer and it said no. Just swapped it to Build and it has access.

Stephen Brouhard@ssbrouhard·11h

@kunchenguid good pov from someone who's been inside the big company machines. easy to get caught up in the good side bad side of a situation like this when reading it and gravitating towards siding with the "underdog" without the full story

Stephen Brouhard@ssbrouhard·13h

@eliana_jordan @mariyav4leva I was just thinking this lol. I think his last one has 3 million plus views.

Eliana@eliana_jordan·17h

@mariyav4leva No i will check!!

Eliana@eliana_jordan·1d

everyone says articles perform well on x. mine are often my worst performing posts. and i actually put effort into them. so now i’m wondering: am i writing things nobody cares about… or am i just posting them wrong? roast me

Eliana@eliana_jordan

x.com/i/article/2064…

6.2K

Stephen Brouhard@ssbrouhard·13h

@morganlinton Great idea!

Morgan@morganlinton·15h

Fun use case for Grok Build. Point it at a folder full of screenshots, and it will review the image in the screenshot and give it a logical name. 100% success rate.

2.8K

Stephen Brouhard@ssbrouhard·13h

@steipete gogcli.sh has better agent ergonomics. Less but better. Every agent I have had analyze them chooses gog so I continue to use it

14.6K

Peter Steinberger 🦞@steipete·14h

Google fired the guy that made the google workspace cli, because he made the google workspace cli. Lucky me, Google can't fire me. gogcli.sh

Justin Poehnelt@JPoehnelt

Two months ago I was fired by Google for creating the Google Workspace CLI. It went viral, hit #1 on Hacker News, gained thousands of GitHub stars and many thousands of actual users in just a couple days. It was an incredible, confusing journey, from directors and leaders asking what they could learn from the tool to getting grilled by legal about why the Google logo and brand colors are on the Google Workspace GitHub code repositories. I think the cause was that Workspace and certain leaders (and projects) were afraid of being disrupted. But the fear wasn't specific to my CLI, it was a broader fear in what agents meant for Workspace. Either way, the irony of my termination was the announcement at Google Cloud Next two days before I was fired that an official Workspace CLI was coming. I want this out there because it is easier for me to explain my story and it is an experience I want to fully own. It's also part of my healing. Nearly 7 years at Google was an incredible opportunity for me and I was fortunate to have wonderful teammates and a manager that fully supported me through these last few months. Thank you.

135

319

6.4K

968K

Stephen Brouhard@ssbrouhard·14h

Appreciate the kind words. That is a good idea for a video. The best lane to stay in is the Firstmate thread unless you intentionally want to inspect or intervene. The left sidebar can show Orca worktrees, Codex threads, background agents, etc., but those are implementation surfaces. Some stay and some disappear (archived) when worktrees are torn down. Firstmate is meant to be the control plane, or something like a router: you give it the request, it decides where work lands, watches the state, and brings back only the decisions and results you need. Instead of us managing the projects and threads, Firstmate does. That is the opposite of how we have all been operating, so it's a change for sure. It is in the same vein as all the recent talk from the OpenAI team about having a single orchestrator thread, but more opinionated, and it ships with additional tools like gh axi, no mistakes, etc. I keep folding my projects into Firstmate little by little, the goal being that all of them eventually live there in the projects section. Then I just have the single thread that I talk to. So it's good to know how it works, but not necessary.

John Curtis@jcurtis·14h

Thanks for the roadmap and keep up the good work! Started trying it out with/in Orca. I’m still working on learning the flow but I like where it’s headed. I’m still trying to wrap my head around the Orca default git tree behavior along with the Codex background subagent (also git tree) and trying to figure out which tier some of this should be delegated to… it’s a long way of saying I’d love to see an over-your-shoulder video of you just using this stack.

Stephen Brouhard@ssbrouhard·14h

@francedot @LLMJunky @trycua It's a shame. cua is 100x better than what they have, which is still in beta. Watching it navigate a UI gives you that exact, painful frustration you get when you're trying to guide another person through a desktop task and they just aren't getting it.

Francesco@francedot·15h

@ssbrouhard @LLMJunky @trycua we’ve had an issue open for a while and tried reaching out through multiple channels including through my own yc network but sadly they’ve chosen to ignore us: github.com/anthropics/cla…

am.will@LLMJunky·15h

Computer and Browser use are amazing, life-changing innovations. But they have seriously glaring problems that prevent them from truly being exceptional.

1. Though they claim background use, many tasks will fail unless the agent has focus. Thus, you often cannot work at the same time. A workaround for this is to give the agent its own focused browser window while you work in another. But this only works for Chrome control. For computer use that steals mouse focus, you're cooked. You'll just fight the agent back and forth for mouse control and it won't get any work done.

2. And this is a big one. It is P A I N F U L L Y slow. For quick tasks, no big deal, but for complex automations, it can drag out for literally hours. For example, I can easily complete the pictured task in 5-7 minutes with a mouse and keyboard. For Claude, it takes well over 90 minutes. Codex is no faster, even on Fast mode. Codex is better at computer use IMO, but the 262K context window is actually a handicap when you have a long automation. Computer use is EXTREMELY token hungry, especially on high-resolution monitors. With a smaller context window, there are simply too many compactions, and context drift is really impactful with CU.

Computer use works in a loop. It inspects the AX tree / screenshot, determines state, takes some action, and then checks the state again by AX/screenshot - over, and over, and over, and over again. Each one of these events is a new API call, adding a great deal of latency for every loop it requires. Every single tiny little action is another loop. Thus, the more actions an automation requires, the time increases for the task exponentially.

Whoever solves these two main challenges will build something really special.

Automation finished: 1hr48m to edit 16 text blocks
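As a rough back-of-the-envelope for the loop described above, the sketch below models the cost of an automation when every action needs its own observe and verify calls. The two calls per action and five seconds per call are assumed numbers, not measurements. Only the 1h48m and 16-block figures come from the post. This simple model grows linearly with the number of actions and ignores context growth and compactions, which the post argues add further cost.

```python
def automation_seconds(actions: int, calls_per_action: int = 2,
                       seconds_per_call: float = 5.0) -> float:
    """Estimate wall time when each action needs its own API round trips.

    calls_per_action and seconds_per_call are illustrative assumptions.
    """
    return actions * calls_per_action * seconds_per_call


# Arithmetic from the post: 1h48m to edit 16 text blocks.
total_minutes = 1 * 60 + 48
minutes_per_block = total_minutes / 16
print(minutes_per_block)        # 6.75

# Under the assumed numbers, 100 small actions already cost ~17 minutes.
print(automation_seconds(100))  # 1000.0
```

Even with optimistic per-call latency, the per-action round trips dominate, which is consistent with the post's point that many tiny actions are what make long automations drag.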

2.1K

Stephen Brouhard@ssbrouhard·15h

@kunchenguid yes, I’ll watch for that to land. this may need to be reshaped around the abstraction but I think most of it should carry over.

Kun Chen@kunchenguid·15h

@ssbrouhard I plan to work on the backend abstraction later this week. Do you think these changes can be pushed upstream once that’s in place?

608

Stephen Brouhard@ssbrouhard·15h

@LLMJunky @trycua Claude is the most painful to use lol

Stephen Brouhard@ssbrouhard·15h

@LLMJunky Good breakdown, agree on all of it. Codex computer use has been broken on my machine for weeks, unfortunately. I installed @trycua. I haven't had a need to use it lately, but I believe it addresses the hijacking part. I don't think it solves the remaining concerns yet.

105

Stephen Brouhard@ssbrouhard·16h

@kunchenguid Def lives up to the slogan "Kill all the slop. Raise clean PR." It's very thorough.

Kun Chen@kunchenguid·16h

@ssbrouhard great to see! likewise it’s caught so many problems that would have gone into my repos

Stephen Brouhard@ssbrouhard·16h

no-mistakes by @kunchenguid just blocked a merge over a privacy leak I never would've caught. A new feature would've sent private package names to a public API when an opt-in flag was set. Looked fine in review. A "paranoid" gate is exactly what you want reviewing your code. Repo 👇

809
