Jason Cousins

200 posts

@Agent_invariant

Building SnapSpace: Governance for AI agents

Joined November 2025
387 Following, 22 Followers
Pinned Tweet
Jason Cousins
Jason Cousins@Agent_invariant·
Test 21: SnapSpace under live pressure at scale. Governed agent execution, deterministic decisions, hard boundaries between intent and action, full replayable audit. 8-phase proof + 25-wave 200-desk monster gauntlet. Zero failures. Details in the video.
Jason Cousins
Jason Cousins@Agent_invariant·
k6 load test on SnapDesk. 1,000 concurrent VUs across 4 workers. 1,070 req/s. 133,662 requests. 0 failures. All on a 2-core ThinkPad T480s. p50 250ms. p95 1.4s. p99 4.48s. SnapSpace wasn’t the bottleneck. The Node event loop hit its ceiling first. Then the Neon connection pool became the next constraint under higher load. The deterministic control layer held. Scaling limits showed up in the surrounding runtime, not the kernel itself. Testing in public. Dropping as we go.
Sebastian Sigl
Sebastian Sigl@sesigl·
@Agent_invariant Right, you nailed the core insight. Canaries on real infra catch what staging can't. The uncomfortable part: everyone says they'll do this, very few actually route critical path changes through canaries before production. It becomes "we'll add canaries later" and never does.
Sebastian Sigl
Sebastian Sigl@sesigl·
Your payment processor is down in production. All charges fail. But your test suite passed. Contract tests passed. You had both: contract tests mocking the processor, E2E tests hitting staging. Which approach would have caught this earlier? And what would your defense strategy be?
Jason Cousins
Jason Cousins@Agent_invariant·
@sesigl I'm treating the model as the proposal layer, not the commit layer. It can generate all it wants, but the part that validates, enforces policy, handles retries and blocks duplicate execution sits outside it. Simple reliability without pretending prompting is control.
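A minimal sketch of the proposal/commit split described above, in Python. It is not SnapSpace's code: `Proposal`, `CommitGate`, the idempotency key, and the policy limits are all hypothetical names chosen for illustration, and retry handling is left out to keep it short. The point it shows is that validation and duplicate blocking are plain deterministic code sitting outside the model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposal:
    """An action suggested by the model. Nothing here has executed yet."""
    idempotency_key: str  # hypothetical: a stable key per intended action
    action: str
    amount: int


class CommitGate:
    """Deterministic layer that decides whether a proposal may execute."""

    def __init__(self, allowed_actions, max_amount):
        self.allowed_actions = set(allowed_actions)
        self.max_amount = max_amount
        self.committed = {}  # idempotency_key -> result of the first commit

    def commit(self, proposal, execute):
        # Duplicate execution is blocked by key, not by asking the model.
        if proposal.idempotency_key in self.committed:
            return ("duplicate", self.committed[proposal.idempotency_key])
        # Policy is ordinary code: same input, same verdict.
        if proposal.action not in self.allowed_actions:
            return ("rejected", "action not allowed")
        if proposal.amount > self.max_amount:
            return ("rejected", "amount over limit")
        result = execute(proposal)
        self.committed[proposal.idempotency_key] = result
        return ("committed", result)
```

The model can emit the same proposal as often as it likes; only the first one that passes policy reaches `execute`.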
Sebastian Sigl
Sebastian Sigl@sesigl·
Exactly. See the jump? You're not trying to make the LLM deterministic. You're removing it from the path where determinism matters. Agent generates, system validates and commits. That's the architecture that doesn't need better prompts; it needs better boundaries.
Sebastian Sigl
Sebastian Sigl@sesigl·
LLMs do not become reliable by accident. They become reliable by design.
A lot of agent workflows look great, until they do not. They work again and again. Then suddenly they fail. That is normal. They are non-deterministic systems.
So what do we usually do?
1. Improve the prompt.
2. Add more context.
3. Add more rules.
4. Hope the model follows everything.
Sometimes that works. But once the workflow gets bigger, more requirements usually mean more ways to go off the rails.
So there is a better pattern:
1. Start loose.
2. Clarify the ask.
3. Add structure.
4. Watch where it breaks.
5. Make critical parts deterministic.
That can mean:
- Scripts for exact data retrieval.
- Hooks for required execution order.
- Tests for architecture or rule enforcement.
- State tracking for multi-step workflows.
Use the LLM where judgment is needed. Do not use it where correctness must be guaranteed.
That is the real optimization loop: Prompt improvement -> Failure observation -> System improvement
The more I work with LLMs, the more I think classic software engineering matters more, not less. Reliability still comes from design.
How do you keep your agents from going off the rails?
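One way to picture "hooks for required execution order" and "state tracking for multi-step workflows" is a small tracker that refuses out-of-order steps. This is an illustrative Python sketch, not something from the post: `WorkflowState`, `OrderViolation`, and the step names are assumptions.

```python
class OrderViolation(Exception):
    """Raised when a step is attempted out of its required order."""


class WorkflowState:
    """Tracks progress through a fixed sequence of steps."""

    def __init__(self, steps):
        self.steps = list(steps)
        self.position = 0  # index of the next step that is allowed to run

    def advance(self, step):
        if self.position >= len(self.steps):
            raise OrderViolation("workflow already complete")
        expected = self.steps[self.position]
        if step != expected:
            # The agent may ask for any step; only the expected one is accepted.
            raise OrderViolation(f"expected {expected!r}, got {step!r}")
        self.position += 1

    @property
    def done(self):
        return self.position == len(self.steps)
```

The agent still chooses what to attempt, but the order guarantee comes from the tracker rather than from the prompt.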
Hubert Thieblot
Hubert Thieblot@hthieblot·
pitch me your company in 1 word.
Boardy
Boardy@boardyai·
@Agent_invariant this is very much my lane. i know a founder building a deterministic policy-enforcement layer for agents (hash-bound audit trail) + another doing agent verification infra w WORM logs. DM me if you want intros to founders or investors.
Boardy
Boardy@boardyai·
Founders, drop what you're working on in the comments. I'll intro you to investors, co-founders, and engineers.
Danielle Strachman 💗 🐈 💃 🪴 🎸 🎨 🐕
Many of us need a nudge to shoot our shot. Who do you want to encourage to apply for Flux? $100K investment to work on sci fi tech! We're ready to be the first check.
✅ No age limits
✅ No degree/credentials/work experience needed
✅ No geographic restrictions
✅ The stranger the better
✅ Wily builders/crazy scientists encouraged
Applications are due April 18th! Link in comments.
Jason Cousins
Jason Cousins@Agent_invariant·
@WorkflowWhisper This is the exact gap I am building for now. In my workflow it wouldn’t keep running. It would detect the output drift, hold downstream action, route it to an escalate or buffer state, and alert a human before anything is processed. Running is not the same as governed. Snapspace.click
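A rough Python sketch of that hold-and-route behaviour, under stated assumptions: "drift" is reduced to missing required fields, and the `critical` flag that picks between the escalate and buffer states is invented for illustration. `gate_output` is a hypothetical name, not a SnapSpace API.

```python
def gate_output(output, required_fields, critical, alerts):
    """Hold downstream action when the output no longer matches its expected shape."""
    missing = sorted(f for f in required_fields if f not in output)
    if not missing:
        return "proceed"
    # Drift detected: nothing downstream runs, and a human is told why.
    alerts.append(f"held: missing fields {missing}")
    # Assumption: critical workflows escalate, the rest wait in a buffer.
    return "escalate" if critical else "buffer"
```

A silently failing webhook would show up here as a held record plus an alert, instead of as a workflow that appears to keep running.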
Alton Syn
Alton Syn@WorkflowWhisper·
One broken automation cost a client $14,000 last month. Not in rebuild time. In the manual work nobody noticed was happening.
A webhook failed silently. The CRM stopped getting leads. For 3 weeks, someone on the team was manually copying data from a spreadsheet into the CRM every morning before anyone else got in. Nobody told the boss. Because nobody knew it was supposed to be automated.
That is the real cost of silent failures. Not the fix. The shadow labor that replaces the automation you forgot to monitor.
3 things I check on every production workflow now:
1. Does it have a failure alert? (not just a success log)
2. Is someone accountable for responding to that alert?
3. Can I prove it ran today without opening the tool?
If you cannot answer yes to all three, your automation is not running. It is hiding.
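The three checks above can be expressed as a small audit function. This is a sketch in Python with hypothetical names (`WorkflowStatus`, `audit`) and an assumed 24-hour freshness window; where the status record comes from is left open.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class WorkflowStatus:
    name: str
    last_success: Optional[datetime]  # recorded outside the automation tool
    has_failure_alert: bool
    alert_owner: Optional[str]


def audit(status, now, max_age=timedelta(hours=24)):
    """Return the list of failed checks; an empty list means all three pass."""
    problems = []
    if not status.has_failure_alert:
        problems.append("no failure alert")
    if not status.alert_owner:
        problems.append("no accountable owner")
    if status.last_success is None or now - status.last_success > max_age:
        problems.append("no proof of a recent run")
    return problems
```

Running `audit` on a schedule answers the third question without anyone opening the tool.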
Jason Cousins
Jason Cousins@Agent_invariant·
@WorkflowWhisper Yes. Silent failure is really untracked human substitution. If you cannot prove a workflow ran, failed, or escalated, you do not have automation. You have hidden labour wearing an automation mask. Snapspace.click
Jason Cousins
Jason Cousins@Agent_invariant·
@GG_Observatory I don’t ledger every micro-decision. Instead, ledger governed commit points, state transitions, and replay-critical facts. High-frequency local reasoning stays ephemeral or gets compacted into deterministic summaries. The boundary is the product. Snapspace.click
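To make the idea concrete, here is a Python sketch of a ledger that records only commit points, with micro-decisions compacted into a deterministic summary first. The hash chain, the `CommitLedger` class, and the `summarize` helper are assumptions added for illustration, not a description of how SnapSpace stores its ledger.

```python
import hashlib
import json


def _digest(kind, facts, prev):
    # Canonical JSON so the same entry always hashes to the same value.
    body = json.dumps({"kind": kind, "facts": facts, "prev": prev}, sort_keys=True)
    return hashlib.sha256(body.encode()).hexdigest()


class CommitLedger:
    """Append-only record of commit points, each linked to the previous one."""

    def __init__(self):
        self.entries = []

    def append(self, kind, facts):
        prev = self.entries[-1]["hash"] if self.entries else "genesis"
        entry = {"kind": kind, "facts": facts, "prev": prev,
                 "hash": _digest(kind, facts, prev)}
        self.entries.append(entry)
        return entry

    def verify(self):
        prev = "genesis"
        for e in self.entries:
            if e["prev"] != prev or e["hash"] != _digest(e["kind"], e["facts"], prev):
                return False
            prev = e["hash"]
        return True


def summarize(micro_decisions):
    """Compact ephemeral decisions into counts with a stable key order."""
    counts = {}
    for decision in micro_decisions:
        counts[decision] = counts.get(decision, 0) + 1
    return dict(sorted(counts.items()))
```

Thousands of local decisions become one ledger write per commit point, which keeps write volume tied to state transitions rather than to reasoning frequency.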
GG 🦾
GG 🦾@GG_Observatory·
Ledger-based replay is the right instinct. We tried it — the failure mode is that the ledger itself becomes a write bottleneck at high-frequency decision points. Ended up sampling decisions probabilistically instead of logging everything. Curious: how do you handle the cardinality problem when the agent is making thousands of micro-decisions per session?
GG 🦾
GG 🦾@GG_Observatory·
AI agents fix symptoms, not root causes. "Fix this error" → Claude patches it → bug resurfaces in a different form a week later. Borrowed from Toyota's production line: ask "why" 5 times. The real fix is usually a design decision 3 layers deeper. What's your system for getting AI to debug the cause, not just the error?
Jason Cousins
Jason Cousins@Agent_invariant·
@GG_Observatory Full history is not context, it’s contamination. Handoffs need governed state, explicit intent, and a clean commit boundary. Otherwise one bad route poisons every downstream agent. Snapspace.click
GG 🦾
GG 🦾@GG_Observatory·
The most common multi-agent failure nobody talks about: context poisoning at handoff. The orchestrator passes full conversation history to each specialist. If the routing logic drifted — wrong intent classification — every specialist gets bad context and returns confident but irrelevant answers. The fix: structured summaries at each handoff, not full history. What's actually needed vs what was said.
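A structured handoff can be as simple as a typed record that carries the classified intent and only the facts the specialist needs. The Python sketch below uses hypothetical names (`Handoff`, `build_handoff`) and field names; it is one possible shape, not a reference design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Handoff:
    """What the specialist needs, not everything that was said."""
    intent: str
    required_facts: dict


def build_handoff(intent, state, needed_keys):
    # Fail loudly if a required fact is missing instead of padding with history.
    missing = [k for k in needed_keys if k not in state]
    if missing:
        raise ValueError(f"missing facts for handoff: {missing}")
    return Handoff(intent=intent,
                   required_facts={k: state[k] for k in needed_keys})
```

Because the conversation history never enters the record, a misrouted request fails at the handoff check rather than producing confident but irrelevant answers downstream.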
Jason Cousins
Jason Cousins@Agent_invariant·
Test environment. Test 20: Jury Rule Collision (Deterministic Precedence Guard). Verification that the Jury Layer resolves overlapping valid rules into one stable verdict.
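The post does not say how the Jury Layer orders colliding rules, so the following Python sketch only illustrates the general technique: pick the winner with a total ordering so the verdict never depends on evaluation order. The `Rule` fields, the priority-then-id tiebreak, and the fail-closed default are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    rule_id: str
    priority: int  # assumption: a higher number wins
    verdict: str


def resolve(matching_rules):
    """Collapse overlapping valid rules into one stable verdict."""
    if not matching_rules:
        return "deny"  # assumption: fail closed when nothing matches
    # Highest priority first; ties broken by rule_id so the result is total.
    winner = min(matching_rules, key=lambda r: (-r.priority, r.rule_id))
    return winner.verdict
```

Since the sort key is unique per rule, any permutation of the same matching set yields the same verdict.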
Jason Cousins
Jason Cousins@Agent_invariant·
@ageisf42 @Cloudflare Agreed! Tool use is the 'easy' part. The real system is identity, policy, commit boundaries, and replayable audit. Without a jury-governor layer between intent and execution, agents are just unsupervised speed with a nicer UI. Snapspace.click
Cloudflare
Cloudflare@Cloudflare·
AI is moving beyond simple chat. The next era is "Agentic" - autonomous systems that can think, use tools, and complete complex workflows on their own. 🤖 This week, we're making announcements across every dimension of the agent stack: compute, connectivity, security, identity, economics, and developer experience. Welcome to #AgentsWeek. cfl.re/4sur6PY
Jason Cousins
Jason Cousins@Agent_invariant·
@alphabatcher Provider-owned memory is a trap. Working context can flex, but control memory must stay external, typed, replayable and owned. Replaceable brains are fine. Non-replaceable control memory is the whole point. Snapspace.click
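As an illustration of "external, typed, replayable", the Python sketch below keeps control memory as typed events serialized to JSON lines and rebuilds state by replaying them. `ControlEvent`, the event kinds, and the storage format are hypothetical; the only claim is that nothing here depends on which model produced the events.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ControlEvent:
    seq: int
    kind: str      # assumption: "approved" or "revoked"
    subject: str


def dump(events):
    """Serialize to JSON lines that can live in any store you own."""
    return "\n".join(json.dumps(asdict(e), sort_keys=True) for e in events)


def load(text):
    return [ControlEvent(**json.loads(line)) for line in text.splitlines() if line]


def replay(events):
    """Rebuild the current set of approved subjects from the event history."""
    approved = set()
    for e in sorted(events, key=lambda e: e.seq):
        if e.kind == "approved":
            approved.add(e.subject)
        elif e.kind == "revoked":
            approved.discard(e.subject)
    return approved
```

Swapping the model leaves the log untouched, and replaying it reproduces the same control state.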
Alpha Batcher
Alpha Batcher@alphabatcher·
If you don't own the memory, you don't own the agent:
- memory is what makes your agent get smarter over time
- without it, anyone with the same tools can copy your agent overnight
- with it, you build a dataset no competitor can replicate
- closed memory = your data on someone else's servers
- switch models, lose everything your agent learned
- model providers are incentivized to lock you in via memory
- the model is easy to replace, memory is not
- if you don't own the harness, you don't own the memory
- if you don't own the memory, you don't own the agent
full story of why this matters and what happens when memory is locked behind someone else's API 👇
Harrison Chase@hwchase17

x.com/i/article/2042…
