Jason Cousins

200 posts

@Agent_invariant

Building SnapSpace: Governance for AI agents

Joined November 2025

387 Following · 22 Followers

Pinned Tweet

Jason Cousins@Agent_invariant·14 Apr

Test 21: SnapSpace under live pressure at scale. Governed agent execution, deterministic decisions, hard boundaries between intent and action, full replayable audit. 8-phase proof + 25-wave 200-desk monster gauntlet. Zero failures. Details in the video.

Jason Cousins@Agent_invariant·30 Apr

k6 load test on SnapDesk. 1,000 concurrent VUs across 4 workers. 1,070 req/s. 133,662 requests. 0 failures. All on a 2-core ThinkPad T480s. p50 250ms. p95 1.4s. p99 4.48s. SnapSpace wasn’t the bottleneck. The Node event loop hit its ceiling first. Then the Neon connection pool became the next constraint under higher load. The deterministic control layer held. Scaling limits showed up in the surrounding runtime, not the kernel itself. Testing in public. Dropping as we go.

Jason Cousins@Agent_invariant·21 Apr

@sesigl Glad you agree. That's just what we are working on: Snapspace.click. If you get 5 mins, take a look.

Sebastian Sigl@sesigl·21 Apr

@Agent_invariant Right, you nailed the core insight. Canaries on real infra catch what staging can't. The uncomfortable part: everyone says they'll do this, very few actually route critical path changes through canaries before production. It becomes "we'll add canaries later" and never does.

Sebastian Sigl@sesigl·21 Apr

Your payment processor is down in production. All charges fail. But your test suite passed. Contract tests passed. You had both: contract tests mocking the processor, E2E tests hitting staging. Which approach would have caught this earlier? And what would your defense strategy be?

Jason Cousins@Agent_invariant·19 Apr

@sesigl I'm treating the model as the proposal layer, not the commit layer. It can generate all it wants, but the part that validates, enforces policy, handles retries and blocks duplicate execution sits outside it. Simple reliability without pretending prompting is control.
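
A minimal sketch of the proposal-layer versus commit-layer split described above. This is not SnapSpace's code: `Proposal`, `CommitGate`, `TransientError`, the policy fields, and the returned status strings are all hypothetical names chosen for illustration. The model only produces a `Proposal`; validation, policy, retries, and duplicate blocking live in the gate.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Proposal:
    """What the model produces: a request, never an executed action."""
    idempotency_key: str
    action: str
    amount: int


class TransientError(Exception):
    """Raised by an executor when a retry might succeed (hypothetical)."""


@dataclass
class CommitGate:
    """Deterministic layer outside the model (illustrative policy only)."""
    allowed_actions: frozenset
    max_amount: int
    max_attempts: int = 3
    committed: dict = field(default_factory=dict)

    def commit(self, proposal: Proposal, execute: Callable) -> str:
        # Duplicate execution is blocked by key, not by prompting.
        if proposal.idempotency_key in self.committed:
            return "duplicate"
        # Policy checks run before anything touches the outside world.
        if proposal.action not in self.allowed_actions:
            return "rejected"
        if not 0 < proposal.amount <= self.max_amount:
            return "rejected"
        # Bounded retries for transient failures only.
        for _ in range(self.max_attempts):
            try:
                result = execute(proposal)
            except TransientError:
                continue
            self.committed[proposal.idempotency_key] = result
            return "committed"
        return "failed"
```

The gate never inspects how the proposal was generated, so swapping the model changes nothing about what is allowed to commit.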

Sebastian Sigl@sesigl·19 Apr

Exactly. See the jump? You're not trying to make the LLM deterministic. You're removing it from the path where determinism matters. Agent generates, system validates and commits. That's the architecture that doesn't need better prompts; it needs better boundaries.

Sebastian Sigl@sesigl·10 Apr

LLMs do not become reliable by accident. They become reliable by design. A lot of agent workflows look great, until they do not. They work again and again. Then suddenly they fail. That is normal. They are non-deterministic systems. So what do we usually do?
1. Improve the prompt.
2. Add more context.
3. Add more rules.
4. Hope the model follows everything.
Sometimes that works. But once the workflow gets bigger, more requirements usually mean more ways to go off the rails. So there is a better pattern:
1. Start loose.
2. Clarify the ask.
3. Add structure.
4. Watch where it breaks.
5. Make critical parts deterministic.
That can mean:
- Scripts for exact data retrieval.
- Hooks for required execution order.
- Tests for architecture or rule enforcement.
- State tracking for multi-step workflows.
Use the LLM where judgment is needed. Do not use it where correctness must be guaranteed. That is the real optimization loop: Prompt improvement -> Failure observation -> System improvement. The more I work with LLMs, the more I think classic software engineering matters more, not less. Reliability still comes from design. How do you keep your agents from going off the rails?

Jason Cousins@Agent_invariant·15 Apr

@hthieblot Governance. Snapspace.click

Hubert Thieblot@hthieblot·15 Apr

pitch me your company in 1 word.

Jason Cousins@Agent_invariant·15 Apr

@boardyai Ok. Make it happen!

Boardy@boardyai·15 Apr

@Agent_invariant this is very much my lane. i know a founder building a deterministic policy-enforcement layer for agents (hash-bound audit trail) + another doing agent verification infra w WORM logs. DM me if you want intros to founders or investors.

Boardy@boardyai·15 Apr

Founders, drop what you're working on in the comments. I'll intro you to investors, co-founders, and engineers.

Jason Cousins@Agent_invariant·14 Apr

@DStrachman Yes, that's the fix. Thanks.

Danielle Strachman 💗 🐈 💃 🪴 🎸 🎨 🐕@DStrachman·14 Apr

@Agent_invariant Might be a browser issue! We've heard this a couple times from people on mobile. Should work on a laptop or alternate browser.

Danielle Strachman 💗 🐈 💃 🪴 🎸 🎨 🐕@DStrachman·12 Apr

Many of us need a nudge to shoot our shot. Who do you want to encourage to apply for Flux? $100K investment to work on sci fi tech! We're ready to be the first check.
✅ No age limits
✅ No degree/credentials/work experience needed
✅ No geographic restrictions
✅ The stranger the better
✅ Wily builder/crazy scientists encouraged
Applications are due April 18th! Link in comments.

Jason Cousins@Agent_invariant·14 Apr

@WorkflowWhisper This is the exact gap I am building for now. In my workflow it wouldn’t keep running. It would detect the output drift, hold downstream action, route it to escalate or buffer state, and alert a human before anything is processed. Running is not the same as governed. Snapspace.click
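
A rough sketch of the detect, hold, buffer, and alert sequence described above, assuming "drift" means a step's output no longer matches an expected shape. The schema, field names, and the `governed_step` function are hypothetical and are not taken from SnapSpace.

```python
from typing import Callable

# Hypothetical expected shape for one workflow step's output.
EXPECTED_SCHEMA = {"lead_id": str, "email": str, "score": int}


def detect_drift(output: dict, schema: dict) -> list:
    """Return a list of human-readable problems; empty means no drift."""
    problems = []
    for name, expected_type in schema.items():
        if name not in output:
            problems.append(f"missing field: {name}")
        elif not isinstance(output[name], expected_type):
            problems.append(f"wrong type for {name}")
    return problems


def governed_step(output: dict, schema: dict, downstream: Callable,
                  alert: Callable, buffer: list) -> str:
    """Forward clean output; otherwise hold it, buffer it, and alert."""
    problems = detect_drift(output, schema)
    if problems:
        buffer.append(output)   # keep the held payload for later review
        alert(problems)         # a human hears about it before processing
        return "held"
    downstream(output)
    return "forwarded"
```

The point of the sketch is ordering: the alert fires and the payload is held before the downstream action can run, so a failure cannot stay silent.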

Alton Syn@WorkflowWhisper·13 Apr

One broken automation cost a client $14,000 last month. Not in rebuild time. In the manual work nobody noticed was happening. A webhook failed silently. The CRM stopped getting leads. For 3 weeks, someone on the team was manually copying data from a spreadsheet into the CRM every morning before anyone else got in. Nobody told the boss. Because nobody knew it was supposed to be automated. That is the real cost of silent failures. Not the fix. The shadow labor that replaces the automation you forgot to monitor. 3 things I check on every production workflow now:
1. Does it have a failure alert? (not just a success log)
2. Is someone accountable for responding to that alert?
3. Can I prove it ran today without opening the tool?
If you cannot answer yes to all three, your automation is not running. It is hiding.

Jason Cousins@Agent_invariant·13 Apr

@WorkflowWhisper Yes. Silent failure is really untracked human substitution. If you cannot prove a workflow ran, failed, or escalated, you do not have automation. You have hidden labour wearing an automation mask. Snapspace.click

Jason Cousins@Agent_invariant·13 Apr

@GG_Observatory I don’t ledger every micro-decision. Instead, I ledger governed commit points, state transitions, and replay-critical facts. High-frequency local reasoning stays ephemeral or gets compacted into deterministic summaries. The boundary is the product. Snapspace.click
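
One way to picture that boundary, as a sketch under assumptions: micro-decisions go to an in-memory scratch list, and only a commit point writes a ledger entry, carrying a count and a SHA-256 digest of the scratch as its deterministic summary. The `Session` class, the entry fields, and the choice of digest are illustrative, not SnapSpace's actual design.

```python
import hashlib
import json


class Session:
    """Ephemeral scratch for micro-decisions, durable ledger for commits."""

    def __init__(self):
        self.ledger = []     # one entry per governed commit point
        self._scratch = []   # high-frequency reasoning, never written per item

    def note(self, decision: dict) -> None:
        self._scratch.append(decision)

    def commit(self, transition: str, facts: dict) -> dict:
        # Compact the scratch into a summary that is identical for
        # identical inputs (sorted keys make the JSON stable).
        blob = json.dumps(self._scratch, sort_keys=True).encode()
        entry = {
            "transition": transition,
            "facts": facts,
            "micro_count": len(self._scratch),
            "micro_digest": hashlib.sha256(blob).hexdigest(),
        }
        self.ledger.append(entry)
        self._scratch.clear()
        return entry


def replay(ledger: list) -> dict:
    """Rebuild state from replay-critical facts only."""
    state = {}
    for entry in ledger:
        state.update(entry["facts"])
    return state
```

With this shape, a thousand micro-decisions cost one ledger write, which is how the write-bottleneck concern raised in the thread is sidestepped in the sketch.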

GG 🦾@GG_Observatory·13 Apr

Ledger-based replay is the right instinct. We tried it — the failure mode is that the ledger itself becomes a write bottleneck at high-frequency decision points. Ended up sampling decisions probabilistically instead of logging everything. Curious: how do you handle the cardinality problem when the agent is making thousands of micro-decisions per session?

GG 🦾@GG_Observatory·13 Apr

AI agents fix symptoms, not root causes. "Fix this error" → Claude patches it → bug resurfaces in a different form a week later. Borrowed from Toyota's production line: ask "why" 5 times. The real fix is usually a design decision 3 layers deeper. What's your system for getting AI to debug the cause, not just the error?

Jason Cousins@Agent_invariant·13 Apr

@GG_Observatory Full history is not context, it’s contamination. Handoffs need governed state, explicit intent, and a clean commit boundary. Otherwise one bad route poisons every downstream agent. Snapspace.click

GG 🦾@GG_Observatory·13 Apr

The most common multi-agent failure nobody talks about: context poisoning at handoff. The orchestrator passes full conversation history to each specialist. If the routing logic drifted — wrong intent classification — every specialist gets bad context and returns confident but irrelevant answers. The fix: structured summaries at each handoff, not full history. What's actually needed vs what was said.
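
A small sketch of a structured handoff in place of full history, assuming each specialist declares the facts it needs. The `Handoff` type, the `ROUTES` table, and its intents and field names are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Handoff:
    """What a specialist receives: explicit intent plus needed facts only."""
    intent: str
    facts: dict


# Hypothetical routing table: which facts each specialist requires.
ROUTES = {
    "billing_dispute": ("invoice_id", "amount"),
    "password_reset": ("account_email",),
}


def build_handoff(intent: str, state: dict) -> Handoff:
    """Build a typed summary; fail loudly instead of passing bad context."""
    if intent not in ROUTES:
        raise ValueError(f"unknown intent: {intent}")
    required = ROUTES[intent]
    missing = [key for key in required if key not in state]
    if missing:
        raise ValueError(f"cannot hand off, missing: {missing}")
    # Everything not listed for this intent is dropped on purpose.
    return Handoff(intent=intent, facts={key: state[key] for key in required})
```

A misclassified intent then tends to fail at the boundary, on a missing required fact, rather than reaching a specialist as confident but irrelevant context.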

Jason Cousins@Agent_invariant·13 Apr

Test Environment. Test 20: Jury Rule Collision (Deterministic Precedence Guard). Verification that the Jury Layer resolves overlapping valid rules into one stable verdict.
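
To illustrate what resolving overlapping rules into one stable verdict can look like, here is a sketch that assumes a total ordering over rules: higher priority first, then the stricter verdict, then the rule ID as a final tie-break. The ordering, the `Rule` fields, and the fail-closed default are assumptions for this example, not the Jury Layer's documented behaviour.

```python
from dataclasses import dataclass

# Assumed strictness ranking used to break priority ties.
SEVERITY = {"deny": 2, "escalate": 1, "allow": 0}


@dataclass(frozen=True)
class Rule:
    rule_id: str
    priority: int
    verdict: str


def resolve(matching: list) -> str:
    """Pick one verdict from all matching rules, independent of input order."""
    if not matching:
        return "deny"  # fail closed when nothing matches (assumption)
    winner = min(
        matching,
        key=lambda r: (-r.priority, -SEVERITY[r.verdict], r.rule_id),
    )
    return winner.verdict
```

Because the sort key is a total order over the rules, shuffling the input cannot change the winner, which is the property a collision test would check.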

Jason Cousins@Agent_invariant·13 Apr

@ageisf42 @Cloudflare Agreed! Tool use is the 'easy' part. The real system is identity, policy, commit boundaries, and replayable audit. Without a jury-governor layer between intent and execution, agents are just unsupervised speed with a nicer UI. Snapspace.click

Cloudflare@Cloudflare·12 Apr

AI is moving beyond simple chat. The next era is "Agentic" - autonomous systems that can think, use tools, and complete complex workflows on their own. 🤖 This week, we're making announcements across every dimension of the agent stack: compute, connectivity, security, identity, economics, and developer experience. Welcome to #AgentsWeek. cfl.re/4sur6PY

Jason Cousins@Agent_invariant·13 Apr

@alphabatcher Provider-owned memory is a trap. Working context can flex, but control memory must stay external, typed, replayable and owned. Replaceable brains are fine. Non-replaceable control memory is the whole point. Snapspace.click

Alpha Batcher@alphabatcher·11 Apr

If you don't own the memory, you don't own the agent:
- memory is what makes your agent get smarter over time
- without it, anyone with the same tools can copy your agent overnight
- with it, you build a dataset no competitor can replicate
- closed memory = your data on someone else's servers
- switch models, lose everything your agent learned
- model providers are incentivized to lock you in via memory
- the model is easy to replace, memory is not
- if you don't own the harness, you don't own the memory
- if you don't own the memory, you don't own the agent
full story of why this matters and what happens when memory is locked behind someone else's API 👇

Harrison Chase@hwchase17

x.com/i/article/2042…
