Most agent demos assume one trust domain. Real B2B flows don't. The moment an AI agent touches another company's data, you need separate audit, policy, and approval paths. That's the class of problem I'm building AXME Mesh for.
My read: the next layer for coding agents is explicit continuity, not bigger context windows. Branches, commits, and diffs are useful, but teams also need resumable runs, durable memory, and replay rules that survive a fresh session.
The interesting part is where plain git stops helping. Git tracks files well. It does not track why a test was skipped, which API call already fired, or whether a plan was approved by a human. Agent state is repo state plus execution state.
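One way to picture "repo state plus execution state" (a minimal sketch; all names here are hypothetical, not a real schema):

```python
from dataclasses import dataclass, field

@dataclass
class ExecutionState:
    """Everything git does not capture about an agent run (hypothetical shape)."""
    commit_sha: str                                     # repo state: what git already tracks
    skipped_tests: dict = field(default_factory=dict)   # test name -> why it was skipped
    fired_calls: list = field(default_factory=list)     # API calls that already happened
    approvals: list = field(default_factory=list)       # human sign-offs on the plan

state = ExecutionState(commit_sha="abc123")
state.skipped_tests["test_billing"] = "flaky upstream sandbox"
state.fired_calls.append("POST /invoices")   # must not be replayed on resume
state.approvals.append("plan-v2 approved by reviewer")
```

The point of the sketch: on a fresh session, the diff tells you what changed, but only something like this record tells you what is safe to redo.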
The popularity of "Git for AI agents" tooling is a signal that coding agents still lack a stable unit of memory. Session context is too soft for multi-step repo work, so people keep rebuilding commits, branches, and logs as control primitives.
Two failure classes keep showing up in coding agents: control-flow bugs and trust-boundary bugs. HN is right on the first. The second starts when work crosses teams or vendors. That's the problem I built AXME Mesh for.
AGENTS.md linting makes sense because repo memory rots faster than prompts. Hard part is proving instructions still match the actual toolchain and paths. Second-order effect: context files need CI and ownership, not vibe-maintenance.
If you're building with OpenAI, Anthropic, LangGraph, or plain workers, the question isn't "which model" first. It's "what owns retries, handoffs, and waiting safely". Curious what people are using for that today.
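The bare minimum for "what owns retries, handoffs, and waiting" is a loop that separates retryable failures from a real parked state. A toy sketch, every name invented:

```python
import enum

class RunState(enum.Enum):
    RUNNING = "running"
    WAITING_ON_HUMAN = "waiting_on_human"    # parked, not failed
    DONE = "done"
    FAILED = "failed"

def drive(step, *, max_retries=3):
    """Own the retry budget and the handoff, instead of hoping the model does."""
    for attempt in range(1, max_retries + 1):
        try:
            result = step(attempt)
        except TimeoutError:
            continue                          # retryable: spend the budget
        if result == "needs_approval":
            return RunState.WAITING_ON_HUMAN  # park durably, don't error out
        return RunState.DONE
    return RunState.FAILED                    # budget exhausted

# A step that times out once, then asks for a human.
def flaky_step(attempt):
    if attempt == 1:
        raise TimeoutError
    return "needs_approval"

def always_timeout(attempt):
    raise TimeoutError
```

Whatever framework you pick, something has to be this loop; the design question is only whether it lives in your code or in a durable runtime.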
Stop wiring agent workflows together with webhooks. Here's what to use instead: durable intents with receipts, retries, and a real waiting state. Agents don't fail on the happy path. They fail after a crash, timeout, or 2-day human delay.
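What "durable intents with receipts" could mean, reduced to its core (a sketch under invented names, not AXME's actual API): record the intent before the side effect and the receipt after, so a resume after a crash can tell "never fired" from "already fired".

```python
import uuid

log = {}  # stand-in for a durable store (would be a database in practice)

def execute_once(intent_key, effect):
    """Fire `effect` at most once per intent, surviving crash-and-resume."""
    record = log.setdefault(intent_key, {"status": "pending", "receipt": None})
    if record["status"] == "done":
        return record["receipt"]       # already fired: replay the receipt
    record["receipt"] = effect()       # the real side effect happens here
    record["status"] = "done"
    return record["receipt"]

calls = []
def send_invoice():
    calls.append("POST /invoices")
    return {"receipt_id": str(uuid.uuid4())}

first = execute_once("invoice-42", send_invoice)
second = execute_once("invoice-42", send_invoice)   # simulated retry after a crash
```

A webhook gives you neither half of this: no record that the effect was intended, and no receipt proving it already ran.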
I was wrong about prompt-first agents. After watching Claude Code and Codex skip checks or loop on file edits, I changed my mind. Hard part is state machines around side effects. Second-order effect: evals need traces, not just final diffs.
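"Evals need traces, not just final diffs" cashes out as recording every tool call, not only the end state. A toy recorder, names hypothetical:

```python
trace = []

def traced(tool_name, fn):
    """Wrap a tool so every call lands in the trace, success or failure."""
    def wrapper(*args):
        trace.append({"tool": tool_name, "args": args})
        try:
            out = fn(*args)
            trace[-1]["ok"] = True
            return out
        except Exception as exc:
            trace[-1]["ok"] = False
            trace[-1]["error"] = repr(exc)
            raise
    return wrapper

edit_file = traced("edit_file", lambda path, text: len(text))
run_tests = traced("run_tests", lambda: "3 passed")

edit_file("app.py", "print('hi')")
run_tests()

# An eval can now check HOW the agent got there, e.g. that tests actually ran:
test_runs = [e for e in trace if e["tool"] == "run_tests"]
```

A green final diff with zero `run_tests` entries in the trace is exactly the skipped-check failure the diff alone would hide.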
This is basically why I built AXME Cloud the way I did: as a communication layer for AI agents with durable agent workflows, human approvals for AI agents, and audit trails. Alpha still. Curious what receipts people treat as the minimum set.
Coding agents fail less when the model is inside control flow instead of pretending to be the control flow. HN is right to push on this. Prompts can suggest a plan. They cannot own retries, receipts, or stop conditions.