Pinned Tweet
Jason Cousins
200 posts

Jason Cousins
@Agent_invariant
Building SnapSpace: Governance for AI agents
Joined November 2025
387 Following, 22 Followers

k6 load test on SnapDesk.
1,000 concurrent VUs across 4 workers.
1,070 req/s. 133,662 requests. 0 failures.
All on a 2-core ThinkPad T480s.
p50 250ms. p95 1.4s. p99 4.48s.
SnapSpace wasn’t the bottleneck. The Node event loop hit its ceiling first. Then the Neon connection pool became the next constraint under higher load.
The deterministic control layer held. Scaling limits showed up in the surrounding runtime, not the kernel itself.
Testing in public.
Dropping as we go.
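
A quick back-of-envelope check on the figures quoted in the post above. It assumes throughput was roughly steady and that each virtual user issues one request per iteration, neither of which the post states. The variable names are illustrative only.

```python
# Figures taken from the post; everything derived below is an estimate.
total_requests = 133_662
throughput_rps = 1_070
virtual_users = 1_000

# Implied test duration if throughput stayed roughly constant.
duration_s = total_requests / throughput_rps

# Little's law: concurrency = throughput * time per iteration,
# so average time per VU iteration = concurrency / throughput.
# This includes any sleep/think time in the script, not just latency.
avg_iteration_s = virtual_users / throughput_rps

print(f"implied duration: {duration_s:.0f} s")
print(f"avg time per VU iteration: {avg_iteration_s:.2f} s")
```

Under those assumptions the run lasted about two minutes, and each virtual user completed an iteration a little under once per second.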

@sesigl Glad you agree. That's just what we are working on.
Snapspace.click: if you get 5 mins, take a look.

@Agent_invariant Right, you nailed the core insight. Canaries on real infra catch what staging can't. The uncomfortable part: everyone says they'll do this, very few actually route critical path changes through canaries before production. It becomes "we'll add canaries later" and never does.

@sesigl I'm treating the model as the proposal layer, not the commit layer. It can generate all it wants, but the part that validates, enforces policy, handles retries and blocks duplicate execution sits outside it. Simple reliability without pretending prompting is control.
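
A minimal sketch of the proposal/commit split described in the reply above. It is not SnapSpace's implementation; `Proposal`, `CommitGate`, the `refund` action, and the limits are all hypothetical names and values chosen for illustration. The model only produces a `Proposal`; validation, policy, retries, and duplicate blocking live in the gate.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Proposal:
    """What the model suggests. Creating one has no side effects."""
    idempotency_key: str
    action: str
    amount: int


@dataclass
class CommitGate:
    """Hypothetical gate between the model's output and the side effect."""
    execute: Callable[[Proposal], None]
    allowed_actions: frozenset = frozenset({"refund"})
    max_amount: int = 100
    max_attempts: int = 3
    committed: set = field(default_factory=set)

    def submit(self, p: Proposal) -> str:
        # Block duplicate execution before anything else.
        if p.idempotency_key in self.committed:
            return "duplicate"
        # Policy is checked in code, not in the prompt.
        if p.action not in self.allowed_actions or p.amount > self.max_amount:
            return "rejected"
        # Retry transient failures a bounded number of times.
        for _ in range(self.max_attempts):
            try:
                self.execute(p)
            except ConnectionError:
                continue
            self.committed.add(p.idempotency_key)
            return "committed"
        return "failed"


calls = []

def flaky_executor(p: Proposal) -> None:
    """Stand-in side effect that fails once, then succeeds."""
    calls.append(p)
    if len(calls) == 1:
        raise ConnectionError("transient")

gate = CommitGate(execute=flaky_executor)
proposal = Proposal("key-1", "refund", 50)
print(gate.submit(proposal))                          # first call retries, then commits
print(gate.submit(proposal))                          # same key is blocked
print(gate.submit(Proposal("key-2", "refund", 500)))  # over the policy limit
```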

Exactly. See the jump? You're not trying to make the LLM deterministic. You're removing it from the path where determinism matters. Agent generates, system validates and commits. That's the architecture that doesn't need better prompts; it needs better boundaries.

LLMs do not become reliable by accident. They become reliable by design.
A lot of agent workflows look great, until they do not.
They work again and again.
Then suddenly they fail.
That is normal. They are non-deterministic systems.
So what do we usually do?
1. Improve the prompt.
2. Add more context.
3. Add more rules.
4. Hope the model follows everything.
Sometimes that works.
But once the workflow gets bigger, more requirements usually mean more ways to go off the rails.
So there is a better pattern:
1. Start loose.
2. Clarify the ask.
3. Add structure.
4. Watch where it breaks.
5. Make critical parts deterministic.
That can mean:
- Scripts for exact data retrieval.
- Hooks for required execution order.
- Tests for architecture or rule enforcement.
- State tracking for multi-step workflows.
Use the LLM where judgment is needed.
Do not use it where correctness must be guaranteed.
That is the real optimization loop:
Prompt improvement -> Failure observation -> System improvement
The more I work with LLMs, the more I think classic software engineering matters more, not less.
Reliability still comes from design.
How do you keep your agents from going off the rails?
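
One way to read "make critical parts deterministic" in the post above is to move the required execution order out of the prompt and into code. The sketch below is only an illustration of that idea; the step names and the `OrderedWorkflow` class are assumptions, not something the post specifies.

```python
# Hypothetical required order for a multi-step workflow.
ORDER = ["fetch", "validate", "commit"]


class OrderedWorkflow:
    """Tracks progress and refuses steps that arrive out of order."""

    def __init__(self) -> None:
        self.position = 0  # index of the next step that is allowed to run

    def advance(self, step: str) -> None:
        if self.position >= len(ORDER):
            raise RuntimeError("workflow already complete")
        expected = ORDER[self.position]
        if step != expected:
            raise RuntimeError(f"out of order: expected {expected!r}, got {step!r}")
        self.position += 1


wf = OrderedWorkflow()
wf.advance("fetch")
try:
    wf.advance("commit")  # the agent tried to skip validation
except RuntimeError as err:
    print(err)
```

The agent can still decide what to do inside each step; it just cannot reorder or skip them.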

@Agent_invariant this is very much my lane.
i know a founder building a deterministic policy-enforcement layer for agents (hash-bound audit trail) + another doing agent verification infra w WORM logs.
DM me if you want intros to founders or investors.

@Agent_invariant Might be a browser issue! We've heard this a couple times from people on mobile. Should work on a laptop or alternate browser.

Many of us need a nudge to shoot our shot.
Who do you want to encourage to apply for Flux?
$100K investment to work on sci fi tech! We're ready to be the first check.
✅No age limits
✅No degree/credentials/work experience needed
✅No geographic restrictions
✅The stranger the better
✅Wily builders/crazy scientists encouraged
Applications are due April 18th! Link in comments.

@WorkflowWhisper This is the exact gap I am building for now. In my workflow it wouldn’t keep running. It would detect the output drift, hold downstream action, route it to escalate or buffer state, and alert a human before anything is processed. Running is not the same as governed.
Snapspace.click
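
A rough sketch of the "detect drift, hold, alert" behaviour described in the reply above. Here "drift" is simplified to mean the output no longer matches an expected shape; the schema, the `Governor` class, and its field names are hypothetical and not taken from SnapSpace.

```python
from dataclasses import dataclass, field

# Hypothetical expected shape of one workflow output.
EXPECTED_FIELDS = {"lead_id": str, "email": str, "score": int}


def drifted(output: dict) -> bool:
    """True when keys or value types differ from the expected shape."""
    if set(output) != set(EXPECTED_FIELDS):
        return True
    return any(not isinstance(output[k], t) for k, t in EXPECTED_FIELDS.items())


@dataclass
class Governor:
    forwarded: list = field(default_factory=list)  # passed downstream
    held: list = field(default_factory=list)       # buffered for review
    alerts: list = field(default_factory=list)     # messages for a human

    def handle(self, output: dict) -> str:
        if drifted(output):
            # Hold the output and alert instead of letting it flow on.
            self.held.append(output)
            self.alerts.append(f"drift detected, holding output with keys {sorted(output)}")
            return "held"
        self.forwarded.append(output)
        return "forwarded"


gov = Governor()
print(gov.handle({"lead_id": "a1", "email": "x@example.com", "score": 7}))
print(gov.handle({"lead_id": "a2", "email": "y@example.com"}))  # missing field
```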

One broken automation cost a client $14,000 last month.
Not in rebuild time. In the manual work nobody noticed was happening.
A webhook failed silently. The CRM stopped getting leads. For 3 weeks, someone on the team was manually copying data from a spreadsheet into the CRM every morning before anyone else got in.
Nobody told the boss. Because nobody knew it was supposed to be automated.
That is the real cost of silent failures. Not the fix. The shadow labor that replaces the automation you forgot to monitor.
3 things I check on every production workflow now:
1. Does it have a failure alert? (not just a success log)
2. Is someone accountable for responding to that alert?
3. Can I prove it ran today without opening the tool?
If you cannot answer yes to all three, your automation is not running. It is hiding.
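
The three checks in the post above can be approximated with a heartbeat record per workflow. This is a sketch under assumptions: the workflow names, the 24-hour window, and the `heartbeats`/`owners` dictionaries are invented for illustration, and a real setup would read them from wherever runs are recorded.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_stale(last_success: Optional[datetime], now: datetime,
             max_age: timedelta = timedelta(hours=24)) -> bool:
    """A workflow is stale if it never succeeded or not within max_age."""
    return last_success is None or now - last_success > max_age


def check_workflows(heartbeats: dict, owners: dict, now: datetime) -> list:
    """Return a list of problems: missing owners and stale runs."""
    problems = []
    for name, last_success in heartbeats.items():
        if name not in owners:
            problems.append(f"{name}: no accountable owner")
        if is_stale(last_success, now):
            problems.append(
                f"{name}: no successful run in the last 24h, "
                f"alert {owners.get(name, 'nobody')}"
            )
    return problems


now = datetime(2026, 1, 2, 9, 0, tzinfo=timezone.utc)
heartbeats = {
    "crm_lead_sync": now - timedelta(weeks=3),  # silently broken
    "invoice_export": now - timedelta(hours=1),
}
owners = {"invoice_export": "ops-oncall"}
for problem in check_workflows(heartbeats, owners, now):
    print(problem)
```

An empty result is the "prove it ran today without opening the tool" answer; anything else is a failure alert with a name attached, or a reminder that no name exists yet.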

@WorkflowWhisper Yes. Silent failure is really untracked human substitution. If you cannot prove a workflow ran, failed, or escalated, you do not have automation. You have hidden labour wearing an automation mask.
Snapspace.click

@GG_Observatory I don’t ledger every micro-decision. Instead, I ledger governed commit points, state transitions, and replay-critical facts. High-frequency local reasoning stays ephemeral or gets compacted into deterministic summaries. The boundary is the product.
Snapspace.click
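
To make "ledger governed commit points" concrete, here is a minimal hash-chained ledger sketch. It is an assumption about what such a ledger could look like, not a description of SnapSpace; the `Ledger` class and payload fields are hypothetical. Only commit points are appended, and each entry's hash covers the previous hash, so edits to history are detectable on replay.

```python
import hashlib
import json


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous entry's hash together with a canonical payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


class Ledger:
    GENESIS = "0" * 64

    def __init__(self) -> None:
        self.entries = []

    def commit(self, payload: dict) -> str:
        """Append one governed commit point and return its hash."""
        prev = self.entries[-1]["hash"] if self.entries else self.GENESIS
        digest = entry_hash(prev, payload)
        self.entries.append({"prev": prev, "payload": payload, "hash": digest})
        return digest

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev = self.GENESIS
        for entry in self.entries:
            if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["payload"]):
                return False
            prev = entry["hash"]
        return True


ledger = Ledger()
ledger.commit({"transition": "proposed->approved", "task": "t1"})
ledger.commit({"transition": "approved->committed", "task": "t1"})
print(ledger.verify())
```

Micro-decisions made between commit points never touch the ledger in this sketch, which is one way to keep write volume bounded.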

Ledger-based replay is the right instinct. We tried it — the failure mode is that the ledger itself becomes a write bottleneck at high-frequency decision points. Ended up sampling decisions probabilistically instead of logging everything. Curious: how do you handle the cardinality problem when the agent is making thousands of micro-decisions per session?

AI agents fix symptoms, not root causes.
"Fix this error" → Claude patches it → bug resurfaces in a different form a week later.
Borrowed from Toyota's production line: ask "why" 5 times. The real fix is usually a design decision 3 layers deeper.
What's your system for getting AI to debug the cause, not just the error?

@GG_Observatory Full history is not context, it’s contamination. Handoffs need governed state, explicit intent, and a clean commit boundary. Otherwise one bad route poisons every downstream agent.
Snapspace.click

The most common multi-agent failure nobody talks about: context poisoning at handoff.
The orchestrator passes full conversation history to each specialist. If the routing logic drifted — wrong intent classification — every specialist gets bad context and returns confident but irrelevant answers.
The fix: structured summaries at each handoff, not full history. What's actually needed vs what was said.
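
A small sketch of a structured handoff as described in the post above: the specialist receives a typed summary rather than the full conversation. The `Handoff` fields, the confidence threshold, and the example values are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Handoff:
    """What the specialist actually needs, not everything that was said."""
    intent: str
    confidence: float
    facts: tuple
    request: str


def build_handoff(intent: str, confidence: float, facts: list,
                  request: str, min_confidence: float = 0.8) -> Handoff:
    # Refuse to hand off on a shaky classification instead of
    # passing bad context to every downstream specialist.
    if confidence < min_confidence:
        raise ValueError(f"intent {intent!r} too uncertain ({confidence:.2f}); escalate")
    return Handoff(intent, confidence, tuple(facts), request)


handoff = build_handoff(
    intent="billing_dispute",
    confidence=0.93,
    facts=["charged twice on the same invoice"],
    request="Confirm the duplicate charge and propose a refund amount.",
)
print(handoff)
```

Raising on low confidence is a design choice in this sketch; routing to a human or a clarifying question would fit the same boundary.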

@ageisf42 @Cloudflare Agreed! Tool use is the 'easy' part.
The real system is identity, policy, commit boundaries, and replayable audit. Without a jury-governor layer between intent and execution, agents are just unsupervised speed with a nicer UI.
Snapspace.click

AI is moving beyond simple chat. The next era is "Agentic" - autonomous systems that can think, use tools, and complete complex workflows on their own. 🤖
This week, we're making announcements across every dimension of the agent stack: compute, connectivity, security, identity, economics, and developer experience. Welcome to #AgentsWeek. cfl.re/4sur6PY

@alphabatcher Provider-owned memory is a trap. Working context can flex, but control memory must stay external, typed, replayable and owned. Replaceable brains are fine. Non-replaceable control memory is the whole point.
Snapspace.click
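
A sketch of what "external, typed, replayable" control memory could mean in practice, assuming control state is stored as an append-only log of typed events outside any model provider. The `ControlEvent` type and the event kinds are hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ControlEvent:
    seq: int       # position in the log
    kind: str      # e.g. "approved", "committed"
    task_id: str


def dump(events: list) -> str:
    """Serialize events as JSON lines that any runtime can store and read."""
    return "\n".join(json.dumps(asdict(e), sort_keys=True) for e in events)


def replay(log: str) -> dict:
    """Rebuild current control state from the log, independent of any model."""
    events = [ControlEvent(**json.loads(line)) for line in log.splitlines() if line]
    state = {}
    for event in sorted(events, key=lambda e: e.seq):
        state[event.task_id] = event.kind  # latest event per task wins
    return state


log = dump([
    ControlEvent(1, "approved", "t1"),
    ControlEvent(2, "approved", "t2"),
    ControlEvent(3, "committed", "t1"),
])
print(replay(log))
```

Swapping the model leaves this log untouched, which is the sense in which the "brain" is replaceable and the control memory is not.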

If you don't own the memory, you don't own the agent:
- memory is what makes your agent get smarter over time
- without it, anyone with the same tools can copy your agent overnight
- with it, you build a dataset no competitor can replicate
- closed memory = your data on someone else's servers
- switch models, lose everything your agent learned
- model providers are incentivized to lock you in via memory
- the model is easy to replace, memory is not
- if you don't own the harness, you don't own the memory
- if you don't own the memory, you don't own the agent
full story of why this matters and what happens when memory is locked behind someone else's API 👇
Harrison Chase @hwchase17
