Eric MacDougall
@ericmacdougall
Co-Founder @ Good Ventures
Victoria, British Columbia · Joined July 2009
2.6K Following · 22.6K Followers · 1K posts

Eric MacDougall @ericmacdougall ·
AI coding tools write code that passes tests and still contains security holes. Developers using them write less secure code than developers without them, and feel more confident it's safe. If your team ships AI-generated code and treats "the tests pass" as "this is safe," your build pipeline is the vulnerability.

Perry, Srivastava, Kumar, Boneh (Stanford, CCS 2023, arxiv 2211.03622): 47 participants, five security tasks, three languages. Developers with AI assistants wrote significantly less secure code AND believed they wrote more secure code. Both effects at once.

"Broken by Default" (arxiv 2604.05292) formally verified 3,500 AI-generated code artifacts across seven LLMs. "Security Degradation in Iterative AI Code Generation" (arxiv 2506.11022) found a 37.6% increase in critical vulnerabilities after five iteration cycles. "Taught by the Flawed" (arxiv 2511.09879) traces the root cause to insecure training data.

Functional tests pass. The code still accepts malicious input. That is a structural mismatch between what execution proves and what attackers exploit. Ship security-specific verification pipelines as the primary mechanism, not a supplement.
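The tests-pass-but-insecure gap is easy to show concretely. A minimal, invented illustration (table, function names, and payload are all hypothetical, not from any of the cited studies): two lookups that both pass the same functional test, where only one survives a classic SQL-injection input.

```python
import sqlite3

def find_user_unsafe(conn, username):
    # String interpolation puts attacker-controlled input into the query.
    return conn.execute(
        f"SELECT name FROM users WHERE name = '{username}'"
    ).fetchall()

def find_user_safe(conn, username):
    # Parameterized query: the driver treats input as data, not SQL.
    return conn.execute(
        "SELECT name FROM users WHERE name = ?", (username,)
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])

# Functional test: both versions pass for benign input.
assert find_user_unsafe(conn, "alice") == [("alice",)]
assert find_user_safe(conn, "alice") == [("alice",)]

# Malicious input: the unsafe version leaks every row.
payload = "' OR '1'='1"
assert find_user_unsafe(conn, payload) == [("alice",), ("bob",)]
assert find_user_safe(conn, payload) == []
```

A test suite that only checks the benign path scores both functions identically; only an injection-specific check separates them.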
Eric MacDougall @ericmacdougall ·
BitNet b1.58 is one of the most important efficiency ideas in AI right now, and it's also still a hypothesis. Largest publicly released native ternary model: BitNet b1.58 2B4T (2B params, April 2025). 7B and 13B are on the roadmap. No independently validated native ternary frontier-scale result exists. MoE plus ternary weights hint at compounding gains. Tooling is moving fast. Production deployment lags both. Ecosystem momentum is not proof at the scale that determines product economics.
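For reference, the core of the b1.58 idea fits in a few lines. A minimal numpy sketch of the absmean ternary quantization described in the BitNet b1.58 paper (the example weights are invented; packing, kernels, and activation quantization are omitted):

```python
import numpy as np

def ternary_quantize(w: np.ndarray, eps: float = 1e-8):
    # Absmean scale: mean absolute value of the weight tensor.
    gamma = np.abs(w).mean() + eps
    # Round each scaled weight, then clip into {-1, 0, +1}.
    w_q = np.clip(np.rint(w / gamma), -1, 1)
    # Dequantize later as w_q * gamma.
    return w_q.astype(np.int8), gamma

w = np.array([0.9, -0.05, 0.4, -1.2])
w_q, gamma = ternary_quantize(w)
# gamma = (0.9 + 0.05 + 0.4 + 1.2) / 4 = 0.6375
# w / gamma ≈ [1.41, -0.08, 0.63, -1.88] → [1, 0, 1, -1]
```

The "1.58 bits" is log2(3), the information content of a three-valued weight; the open question is whether this recipe holds at frontier scale, not whether the math works.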
Eric MacDougall @ericmacdougall ·
Posting a few agentic commerce, agent harness, and similar open-source projects rq. Who wants to collab pre-release?
Eric MacDougall @ericmacdougall ·
@JakeKing Ya man. Good to collab on this stuff. Posting a few open source repos soon 🇨🇦
Jake @JakeKing ·
Who's building devtools in 🇨🇦 right now? want my algo to be filled with cool people building cool shit up here.
Eric MacDougall @ericmacdougall ·
HippoRAG (Gutiérrez et al., NeurIPS 2024) maps hippocampal indexing theory onto RAG. LLM = neocortex, knowledge graph = hippocampal index, Personalized PageRank = associative retrieval. 20% gain on multi-hop QA at 10-30x lower cost than iterative retrieval. Frozen weights, growing index, no catastrophic forgetting.
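The associative-retrieval step can be sketched directly. A toy Personalized PageRank over a tiny invented knowledge graph, seeded at the query's entities, in the spirit of HippoRAG's retrieval (the graph, power-iteration implementation, and parameters here are illustrative, not the paper's code):

```python
import numpy as np

def personalized_pagerank(adj, seeds, alpha=0.85, iters=100):
    # adj[i, j] = edge from node j to node i.
    n = adj.shape[0]
    out = adj.sum(axis=0, keepdims=True)
    # Column-normalize; isolated nodes keep all-zero columns.
    P = np.divide(adj, out, out=np.zeros_like(adj), where=out > 0)
    # Restart vector: mass teleports back to the seed entities.
    r = np.zeros(n)
    r[seeds] = 1 / len(seeds)
    s = r.copy()
    for _ in range(iters):
        s = alpha * (P @ s) + (1 - alpha) * r
    return s

# Toy graph: 0 = query entity, 1-2 = linked passages, 3 = unrelated.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 0],
                [0, 0, 0, 0]], float)
scores = personalized_pagerank(adj, seeds=[0])
# Nodes associatively linked to the seed outrank the isolated node.
```

The frozen-weights point falls out of this shape: new knowledge means new nodes and edges in the index, never gradient updates to the LLM.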
Eric MacDougall @ericmacdougall ·
The most important AI feature is revocation. Not generation. Not reasoning. Not tool use. Revocation. The ability to stop an agent immediately when something goes wrong. Kill switches are not philosophically interesting. They're operationally essential.
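Revocation is a small amount of code with a large operational payoff. A minimal sketch of cooperative cancellation (class and field names invented): the agent loop checks a kill switch between steps, so an operator can stop it mid-run instead of waiting for it to finish.

```python
import threading

class Agent:
    def __init__(self):
        self.kill = threading.Event()   # the kill switch
        self.steps_done = 0

    def run(self, steps):
        for _ in range(steps):
            if self.kill.is_set():      # revocation checked every step
                return "revoked"
            self.steps_done += 1        # stand-in for one tool call
        return "completed"

agent = Agent()
assert agent.run(3) == "completed"
agent.kill.set()                        # operator pulls the switch
assert agent.run(3) == "revoked"
assert agent.steps_done == 3            # no further steps executed
```

The design point is that the check happens inside the loop: a kill switch the agent only consults at the end of a run is not a kill switch.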
Eric MacDougall @ericmacdougall ·
MCP: 20K tools, 97M monthly downloads, 16 months old. First supply chain attack: already happened. First critical CVEs: documented. New "semantic" vulnerabilities: emerging. npm took years for its first major attack. MCP is moving faster. The security field hasn't caught up.
Eric MacDougall @ericmacdougall ·
Rule of thumb: single-agent baseline above 45% accuracy? Don't add agents. Add skills. Add tools. Add better context. Multi-agent wins only when specialization is truly divisible and coordination costs amortize across massive scale. Treat agent count as a cost to justify.
Eric MacDougall @ericmacdougall ·
Iterathon's receipt: built multi-agent customer support, burned $47K, benchmarked single-agent and found 92.2% vs 94.3% accuracy. 4.3x token amplification, 6.8s latency vs 2.3s target. $24.7K/month coordination overhead. Refactored back. Saved $296K/year.
Eric MacDougall @ericmacdougall ·
Cemri et al. 2025 identified 14 distinct multi-agent failure modes (Cohen's Kappa 0.88). The killer one: information loss during inter-agent summarization is unrecoverable. Downstream agents lose context upstream agents had.
Eric MacDougall @ericmacdougall ·
arxiv 2505.18286: across 7 datasets, multi-agent consumes 4-220x more input tokens than single-agent. Even with perfect context reuse, 2-12x more generation tokens. Anthropic's own admission: "much of the apparent advantage of MAS comes from increased compute."
Eric MacDougall @ericmacdougall ·
arxiv 2604.02460 proved it with information theory: under a fixed reasoning-token budget, single-agent consistently matches or outperforms multi-agent on multi-hop reasoning. Confirmed across Qwen3, DeepSeek-R1, Gemini 2.5.
Eric MacDougall @ericmacdougall ·
The Rule of 4: effective team sizes cap at 3-4 agents. Beyond that, coordination overhead grows super-linearly, roughly n^1.724. You pay super-linearly more compute for sub-linear lift.
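The power law is easy to feel with numbers. A toy calculation using the n^1.724 exponent from the post (the cost model itself is an assumption; units are arbitrary):

```python
def coordination_cost(n, exponent=1.724):
    # Super-linear coordination cost as a function of agent count.
    return n ** exponent

for n in (2, 4, 8):
    print(n, round(coordination_cost(n), 2))

# Doubling the team multiplies coordination cost by 2**1.724 ≈ 3.3,
# not 2 — so going from 4 agents to 8 more than triples overhead.
```

That ratio is why the cap bites around 3-4: each doubling buys at best incremental accuracy while coordination cost more than triples.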
Eric MacDougall @ericmacdougall ·
Kim et al. 2026 (arxiv 2512.08296): 180 configurations, 5 architectures, 3 LLM families, 4 benchmarks. Tool-heavy tasks (10+ tools): 2-6x efficiency penalty with multi-agent vs single-agent. Above 45% single-agent accuracy, adding agents produces diminishing or negative returns.
Eric MacDougall @ericmacdougall ·
Adding more agents doesn't reduce probabilistic failures. It compounds them. The research is clear now and it's expensive to learn this in production.
Eric MacDougall @ericmacdougall ·
EU AI Act conformity assessment has a structural gap for multi-agent systems. Individual agent assessment can't predict system-level emergent behavior.

Hammond et al. 2025 (Cooperative AI Foundation, 44+ authors across Oxford/DeepMind/Anthropic/CMU) taxonomize three multi-agent failure modes: miscoordination, conflict, collusion. Seven risk factors, including emergent agency. None are visible when you audit agents individually.

The Digital Omnibus now proposes extending high-risk deadlines to Dec 2027 because the infrastructure (standards, notified bodies) isn't ready. The framework for evaluating systems, not just components, isn't written yet.

For anyone deploying multi-agent workflows in regulated sectors: you're going to be responsible for bridging the assessment gap yourself.
Eric MacDougall @ericmacdougall ·
LLMs don't manipulate discrete symbols. They manipulate vectors. So Harnad's 1990 symbol grounding problem isn't the one that applies... the right frame is the vector grounding problem (Mollo and Millière 2023). Implication: multimodality and embodiment are neither necessary nor sufficient for meaning. The causal connection is what matters.
Eric MacDougall @ericmacdougall ·
Exactly right, and the deeper constraint worth naming: Replay only works because the workflow itself is deterministic. Temporal re-executes the code on recovery and short-circuits each step by matching Commands against the Event History. LLM calls are non-deterministic by nature, so they can't live inside the deterministic workflow. They have to be Activities or Side Effects that execute outside the replay loop and have their results recorded. The workflow calls the LLM. The LLM is never the workflow. That's the architectural punchline for agent execution: probabilistic reasoning at typed interfaces, deterministic orchestration around it. Prompting and workflow aren't competing layers. They're doing different jobs.
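The replay mechanism can be modeled in a few lines. This is a toy model of the idea, not the Temporal SDK: the workflow is deterministic code, every non-deterministic call (a stand-in for an LLM request) goes through an activity wrapper, and results are recorded in an event history that replay short-circuits against. All names here are invented.

```python
import random

class Runtime:
    def __init__(self, history=None):
        self.history = history if history is not None else []
        self.pos = 0

    def activity(self, fn):
        if self.pos < len(self.history):
            # Replay: return the recorded result, don't re-execute.
            result = self.history[self.pos]
        else:
            # First execution: run the side effect and record it.
            result = fn()
            self.history.append(result)
        self.pos += 1
        return result

def workflow(rt):
    # Non-determinism lives only inside activities...
    a = rt.activity(lambda: random.random())
    b = rt.activity(lambda: random.random())
    # ...while the orchestration itself is deterministic.
    return a + b

first = Runtime()
out1 = workflow(first)
# Simulated crash + recovery: a fresh runtime replays the history and
# reaches the identical result without re-running the random calls.
out2 = workflow(Runtime(history=first.history))
assert out1 == out2
```

Swap `random.random` for an LLM call and the same property holds: recovery replays recorded outputs instead of re-sampling the model, which is exactly why the LLM must sit outside the replay loop.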
Bnaf.OG | 🟧 @bnafOg ·
@ericmacdougall The mechanism: workflow orchestrators serialize intermediate state deterministically, so replay picks up at the exact failed step — not from scratch. LLMs can't self-recover because they have no persistent memory of prior steps. The checkpoint is what prompting can't replace.
Eric MacDougall @ericmacdougall ·
Production AI isn't prompt-centric. It's workflow-centric. Temporal.io is now the backbone for AI agents at OpenAI (Codex web agent) and Replit (Agent 3). Reason: LLM API timeouts, mid-step failures, browser closes, resume-tomorrow workflows. None of those are solved by a better prompt. A boring prompt inside a durable workflow that can replay from a checkpoint beats a clever prompt that loses state on step 12 of 20. Treat agent execution as a workflow with checkpoints, not a function call with a return value.
Eric MacDougall @ericmacdougall ·
A commerce protocol operates within its own escrow and dispute mechanism. Boson locks funds in its smart contracts, orchestrates via $BOSON, resolves via its Dispute Resolver. Strong design for that model. A cross-protocol commerce layer federates across commerce protocols AND payment rails. Agent uses ACP to checkout on a merchant, pays via card network, dispute via that rail's mechanism. Same agent uses Boson dACP for physical goods with staked commitment. Same agent uses x402 for atomic API purchases. One agent, three protocols, unified identity and reputation and audit trail across all.
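The "one agent, three protocols" shape can be sketched as a router. A hypothetical illustration only: the protocol names come from the post, but the routing table, class, and method names are invented stand-ins, not real client libraries or integrations.

```python
# Route each purchase type to a commerce protocol while keeping one
# identity and one audit trail across all rails.
PROTOCOLS = {
    "checkout": "ACP",         # merchant checkout, card-network dispute
    "physical": "Boson dACP",  # physical goods, staked commitment
    "api": "x402",             # atomic machine-to-machine purchase
}

class CommerceAgent:
    def __init__(self, identity):
        self.identity = identity
        self.audit_log = []    # unified trail, regardless of rail

    def purchase(self, kind, item):
        protocol = PROTOCOLS[kind]
        self.audit_log.append((self.identity, protocol, item))
        return protocol

agent = CommerceAgent("agent:demo")
assert agent.purchase("checkout", "sneakers") == "ACP"
assert agent.purchase("api", "weather-data") == "x402"
assert len(agent.audit_log) == 2
```

The federation claim reduces to this: identity, reputation, and audit live one layer above the routing table, so adding a rail means adding an entry, not re-platforming the agent.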
Eric MacDougall @ericmacdougall ·
Good framing on payment fragmentation becoming commerce fragmentation. That's the right diagnosis. Worth distinguishing though: Boson dACP is an excellent commerce protocol for its slice (physical goods, phygitals, RWAs with dispute windows, framework integrations via MCP). But positioning it as the commerce layer above fragmenting payment protocols is conflating two different abstractions.
Eric MacDougall @ericmacdougall ·
ACP (Stripe/OpenAI): checkout flow. Fiat-only. ChatGPT Instant Checkout shipped, then shuttered March 2026. AP2 (Google, 60+ partners): authorization via W3C VCs. Doesn't move money. "No consumer can use AP2 yet" per Chainstack.