Eric MacDougall
@ericmacdougall
Co-Founder @ Good Ventures
Victoria, British Columbia · Joined July 2009
2.6K Following · 22.6K Followers · 1K posts

Eric MacDougall @ericmacdougall ·
AI coding tools write code that passes tests and still contains security holes. Developers using them write less secure code than developers without them, and feel more confident it's safe. If your team ships AI-generated code and treats "the tests pass" as "this is safe," your build pipeline is the vulnerability.

Perry, Srivastava, Kumar, Boneh (Stanford, CCS 2023, arxiv 2211.03622): 47 participants, five security tasks, three languages. Developers with AI assistants wrote significantly less secure code AND believed they wrote more secure code. Both effects at once.

"Broken by Default" (arxiv 2604.05292) formally verified 3,500 AI-generated code artifacts across seven LLMs. "Security Degradation in Iterative AI Code Generation" (arxiv 2506.11022) found a 37.6% increase in critical vulnerabilities after five iteration cycles. "Taught by the Flawed" (arxiv 2511.09879) traces the root cause to insecure training data.

Functional tests pass. The code still accepts malicious input. That is a structural mismatch between what execution proves and what attackers exploit. Ship security-specific verification pipelines as the primary mechanism, not a supplement.
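The tests-pass-but-insecure gap is easy to show concretely. A minimal, invented illustration (table, function names, and payload are all hypothetical, not from any of the cited studies): two lookups that both pass the same functional test, where only one survives a classic SQL-injection input.

```python
import sqlite3

def find_user_unsafe(conn, username):
    # String interpolation puts attacker-controlled input into the query.
    return conn.execute(
        f"SELECT name FROM users WHERE name = '{username}'"
    ).fetchall()

def find_user_safe(conn, username):
    # Parameterized query: the driver treats input as data, not SQL.
    return conn.execute(
        "SELECT name FROM users WHERE name = ?", (username,)
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])

# Functional test: both versions pass for benign input.
assert find_user_unsafe(conn, "alice") == [("alice",)]
assert find_user_safe(conn, "alice") == [("alice",)]

# Malicious input: the unsafe version leaks every row.
payload = "' OR '1'='1"
assert find_user_unsafe(conn, payload) == [("alice",), ("bob",)]
assert find_user_safe(conn, payload) == []
```

A test suite that only checks the benign path scores both functions identically; only an injection-specific check separates them.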
Eric MacDougall @ericmacdougall ·
BitNet b1.58 is one of the most important efficiency ideas in AI right now, and it's also still a hypothesis. Largest publicly released native ternary model: BitNet b1.58 2B4T (2B params, April 2025). 7B and 13B are on the roadmap. No independently validated native ternary frontier-scale result exists. MoE plus ternary weights hint at compounding gains. Tooling is moving fast. Production deployment lags both. Ecosystem momentum is not proof at the scale that determines product economics.
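For reference, the core of the b1.58 idea fits in a few lines. A minimal numpy sketch of the absmean ternary quantization described in the BitNet b1.58 paper (the example weights are invented; packing, kernels, and activation quantization are omitted):

```python
import numpy as np

def ternary_quantize(w: np.ndarray, eps: float = 1e-8):
    # Absmean scale: mean absolute value of the weight tensor.
    gamma = np.abs(w).mean() + eps
    # Round each scaled weight, then clip into {-1, 0, +1}.
    w_q = np.clip(np.rint(w / gamma), -1, 1)
    # Dequantize later as w_q * gamma.
    return w_q.astype(np.int8), gamma

w = np.array([0.9, -0.05, 0.4, -1.2])
w_q, gamma = ternary_quantize(w)
# gamma = (0.9 + 0.05 + 0.4 + 1.2) / 4 = 0.6375
# w / gamma ≈ [1.41, -0.08, 0.63, -1.88] → [1, 0, 1, -1]
```

The "1.58 bits" is log2(3), the information content of a three-valued weight; the open question is whether this recipe holds at frontier scale, not whether the math works.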
Eric MacDougall @ericmacdougall ·
Posting a few agentic commerce, agent harness, and similar open-source projects rq. Who wants to collab pre-release?
Eric MacDougall @ericmacdougall ·
@JakeKing Ya man. Good to collab on this stuff. Posting a few open source repos soon 🇨🇦
Jake @JakeKing ·
Who's building devtools in 🇨🇦 right now? want my algo to be filled with cool people building cool shit up here.
Eric MacDougall @ericmacdougall ·
HippoRAG (Gutiérrez et al., NeurIPS 2024) maps hippocampal indexing theory onto RAG. LLM = neocortex, knowledge graph = hippocampal index, Personalized PageRank = associative retrieval. 20% gain on multi-hop QA at 10-30x lower cost than iterative retrieval. Frozen weights, growing index, no catastrophic forgetting.
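The associative-retrieval step can be sketched directly. A toy Personalized PageRank over a tiny invented knowledge graph, seeded at the query's entities, in the spirit of HippoRAG's retrieval (the graph, power-iteration implementation, and parameters here are illustrative, not the paper's code):

```python
import numpy as np

def personalized_pagerank(adj, seeds, alpha=0.85, iters=100):
    # adj[i, j] = edge from node j to node i.
    n = adj.shape[0]
    out = adj.sum(axis=0, keepdims=True)
    # Column-normalize; isolated nodes keep all-zero columns.
    P = np.divide(adj, out, out=np.zeros_like(adj), where=out > 0)
    # Restart vector: mass teleports back to the seed entities.
    r = np.zeros(n)
    r[seeds] = 1 / len(seeds)
    s = r.copy()
    for _ in range(iters):
        s = alpha * (P @ s) + (1 - alpha) * r
    return s

# Toy graph: 0 = query entity, 1-2 = linked passages, 3 = unrelated.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 0],
                [0, 0, 0, 0]], float)
scores = personalized_pagerank(adj, seeds=[0])
# Nodes associatively linked to the seed outrank the isolated node.
```

The frozen-weights point falls out of this shape: new knowledge means new nodes and edges in the index, never gradient updates to the LLM.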
Eric MacDougall @ericmacdougall ·
The most important AI feature is revocation. Not generation. Not reasoning. Not tool use. Revocation. The ability to stop an agent immediately when something goes wrong. Kill switches are not philosophically interesting. They're operationally essential.
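Revocation is a small amount of code with a large operational payoff. A minimal sketch of cooperative cancellation (class and field names invented): the agent loop checks a kill switch between steps, so an operator can stop it mid-run instead of waiting for it to finish.

```python
import threading

class Agent:
    def __init__(self):
        self.kill = threading.Event()   # the kill switch
        self.steps_done = 0

    def run(self, steps):
        for _ in range(steps):
            if self.kill.is_set():      # revocation checked every step
                return "revoked"
            self.steps_done += 1        # stand-in for one tool call
        return "completed"

agent = Agent()
assert agent.run(3) == "completed"
agent.kill.set()                        # operator pulls the switch
assert agent.run(3) == "revoked"
assert agent.steps_done == 3            # no further steps executed
```

The design point is that the check happens inside the loop: a kill switch the agent only consults at the end of a run is not a kill switch.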
Eric MacDougall @ericmacdougall ·
MCP: 20K tools, 97M monthly downloads, 16 months old. First supply chain attack: already happened. First critical CVEs: documented. New "semantic" vulnerabilities: emerging. npm took years for its first major attack. MCP is moving faster. The security field hasn't caught up.
Eric MacDougall @ericmacdougall ·
Rule of thumb: single-agent baseline above 45% accuracy? Don't add agents. Add skills. Add tools. Add better context. Multi-agent wins only when specialization is truly divisible and coordination costs amortize across massive scale. Treat agent count as a cost to justify.
Eric MacDougall @ericmacdougall ·
Iterathon's receipt: built multi-agent customer support, burned $47K, benchmarked single-agent and found 92.2% vs 94.3% accuracy. 4.3x token amplification, 6.8s latency vs 2.3s target. $24.7K/month coordination overhead. Refactored back. Saved $296K/year.
Eric MacDougall @ericmacdougall ·
Cemri et al. 2025 identified 14 distinct multi-agent failure modes (Cohen's Kappa 0.88). The killer one: information loss during inter-agent summarization is unrecoverable. Downstream agents lose context upstream agents had.
Eric MacDougall @ericmacdougall ·
arxiv 2505.18286: across 7 datasets, multi-agent consumes 4-220x more input tokens than single-agent. Even with perfect context reuse, 2-12x more generation tokens. Anthropic's own admission: "much of the apparent advantage of MAS comes from increased compute."
Eric MacDougall @ericmacdougall ·
arxiv 2604.02460 proved it with information theory: under a fixed reasoning-token budget, single-agent consistently matches or outperforms multi-agent on multi-hop reasoning. Confirmed across Qwen3, DeepSeek-R1, Gemini 2.5.
Eric MacDougall @ericmacdougall ·
The Rule of 4: effective team sizes cap at 3-4 agents. Beyond that, coordination overhead grows super-linearly, roughly n^1.724. You pay super-linearly more compute for sub-linear lift.
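The power law is easy to feel with numbers. A toy calculation using the n^1.724 exponent from the post (the cost model itself is an assumption; units are arbitrary):

```python
def coordination_cost(n, exponent=1.724):
    # Super-linear coordination cost as a function of agent count.
    return n ** exponent

for n in (2, 4, 8):
    print(n, round(coordination_cost(n), 2))

# Doubling the team multiplies coordination cost by 2**1.724 ≈ 3.3,
# not 2 — so going from 4 agents to 8 more than triples overhead.
```

That ratio is why the cap bites around 3-4: each doubling buys at best incremental accuracy while coordination cost more than triples.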
Eric MacDougall @ericmacdougall ·
Kim et al. 2026 (arxiv 2512.08296): 180 configurations, 5 architectures, 3 LLM families, 4 benchmarks. Tool-heavy tasks (10+ tools): 2-6x efficiency penalty with multi-agent vs single-agent. Above 45% single-agent accuracy, adding agents produces diminishing or negative returns.
Eric MacDougall @ericmacdougall ·
Adding more agents doesn't reduce probabilistic failures. It compounds them. The research is clear now and it's expensive to learn this in production.
Eric MacDougall @ericmacdougall ·
EU AI Act conformity assessment has a structural gap for multi-agent systems. Individual agent assessment can't predict system-level emergent behavior.

Hammond et al. 2025 (Cooperative AI Foundation, 44+ authors across Oxford/DeepMind/Anthropic/CMU) taxonomize three multi-agent failure modes: miscoordination, conflict, collusion. Seven risk factors, including emergent agency. None are visible when you audit agents individually.

The Digital Omnibus now proposes extending high-risk deadlines to Dec 2027 because the infrastructure (standards, notified bodies) isn't ready. The framework for evaluating systems, not just components, isn't written yet.

For anyone deploying multi-agent workflows in regulated sectors: you're going to be responsible for bridging the assessment gap yourself.
Eric MacDougall @ericmacdougall ·
LLMs don't manipulate discrete symbols. They manipulate vectors. So Harnad's 1990 symbol grounding problem isn't the one that applies... the right frame is the vector grounding problem (Mollo and Millière 2023). Implication: multimodality and embodiment are neither necessary nor sufficient for meaning. The causal connection is what matters.
Eric MacDougall @ericmacdougall ·
Exactly right, and the deeper constraint worth naming: Replay only works because the workflow itself is deterministic. Temporal re-executes the code on recovery and short-circuits each step by matching Commands against the Event History. LLM calls are non-deterministic by nature, so they can't live inside the deterministic workflow. They have to be Activities or Side Effects that execute outside the replay loop and have their results recorded. The workflow calls the LLM. The LLM is never the workflow. That's the architectural punchline for agent execution: probabilistic reasoning at typed interfaces, deterministic orchestration around it. Prompting and workflow aren't competing layers. They're doing different jobs.
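The replay mechanism can be modeled in a few lines. This is a toy model of the idea, not the Temporal SDK: the workflow is deterministic code, every non-deterministic call (a stand-in for an LLM request) goes through an activity wrapper, and results are recorded in an event history that replay short-circuits against. All names here are invented.

```python
import random

class Runtime:
    def __init__(self, history=None):
        self.history = history if history is not None else []
        self.pos = 0

    def activity(self, fn):
        if self.pos < len(self.history):
            # Replay: return the recorded result, don't re-execute.
            result = self.history[self.pos]
        else:
            # First execution: run the side effect and record it.
            result = fn()
            self.history.append(result)
        self.pos += 1
        return result

def workflow(rt):
    # Non-determinism lives only inside activities...
    a = rt.activity(lambda: random.random())
    b = rt.activity(lambda: random.random())
    # ...while the orchestration itself is deterministic.
    return a + b

first = Runtime()
out1 = workflow(first)
# Simulated crash + recovery: a fresh runtime replays the history and
# reaches the identical result without re-running the random calls.
out2 = workflow(Runtime(history=first.history))
assert out1 == out2
```

Swap `random.random` for an LLM call and the same property holds: recovery replays recorded outputs instead of re-sampling the model, which is exactly why the LLM must sit outside the replay loop.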
Bnaf.OG | 🟧 @bnafOg ·
@ericmacdougall The mechanism: workflow orchestrators serialize intermediate state deterministically, so replay picks up at the exact failed step — not from scratch. LLMs can't self-recover because they have no persistent memory of prior steps. The checkpoint is what prompting can't replace.
Eric MacDougall @ericmacdougall ·
Production AI isn't prompt-centric. It's workflow-centric. Temporal.io is now the backbone for AI agents at OpenAI (Codex web agent) and Replit (Agent 3). Reason: LLM API timeouts, mid-step failures, browser closes, resume-tomorrow workflows. None of those are solved by a better prompt. A boring prompt inside a durable workflow that can replay from a checkpoint beats a clever prompt that loses state on step 12 of 20. Treat agent execution as a workflow with checkpoints, not a function call with a return value.
Eric MacDougall @ericmacdougall ·
A commerce protocol operates within its own escrow and dispute mechanism. Boson locks funds in its smart contracts, orchestrates via $BOSON, resolves via its Dispute Resolver. Strong design for that model. A cross-protocol commerce layer federates across commerce protocols AND payment rails. Agent uses ACP to checkout on a merchant, pays via card network, dispute via that rail's mechanism. Same agent uses Boson dACP for physical goods with staked commitment. Same agent uses x402 for atomic API purchases. One agent, three protocols, unified identity and reputation and audit trail across all.
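The "one agent, three protocols" shape can be sketched as a router. A hypothetical illustration only: the protocol names come from the post, but the routing table, class, and method names are invented stand-ins, not real client libraries or integrations.

```python
# Route each purchase type to a commerce protocol while keeping one
# identity and one audit trail across all rails.
PROTOCOLS = {
    "checkout": "ACP",         # merchant checkout, card-network dispute
    "physical": "Boson dACP",  # physical goods, staked commitment
    "api": "x402",             # atomic machine-to-machine purchase
}

class CommerceAgent:
    def __init__(self, identity):
        self.identity = identity
        self.audit_log = []    # unified trail, regardless of rail

    def purchase(self, kind, item):
        protocol = PROTOCOLS[kind]
        self.audit_log.append((self.identity, protocol, item))
        return protocol

agent = CommerceAgent("agent:demo")
assert agent.purchase("checkout", "sneakers") == "ACP"
assert agent.purchase("api", "weather-data") == "x402"
assert len(agent.audit_log) == 2
```

The federation claim reduces to this: identity, reputation, and audit live one layer above the routing table, so adding a rail means adding an entry, not re-platforming the agent.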
Eric MacDougall @ericmacdougall ·
Good framing on payment fragmentation becoming commerce fragmentation. That's the right diagnosis. Worth distinguishing though: Boson dACP is an excellent commerce protocol for its slice (physical goods, phygitals, RWAs with dispute windows, framework integrations via MCP). But positioning it as the commerce layer above fragmenting payment protocols is conflating two different abstractions.
Eric MacDougall @ericmacdougall ·
ACP (Stripe/OpenAI): checkout flow. Fiat-only. ChatGPT Instant Checkout shipped, then shuttered March 2026. AP2 (Google, 60+ partners): authorization via W3C VCs. Doesn't move money. "No consumer can use AP2 yet" per Chainstack.