Daniel Hensley (@dw_hensley)
Highly distributed, multi-codebase systems have structural properties that current AI coding tools can't navigate. What we've found is that the multi-codebase problem is a context problem, and context problems are solvable if you treat them as compilation rather than search.
𝗧𝗵𝗲 𝗣𝗿𝗼𝗯𝗹𝗲𝗺
We've been working with engineering teams whose codebases total anywhere from 3 million to more than 40 million lines of code. A number of them run microservices architectures with hundreds of repos. A pattern we keep seeing: AI coding agents degrade sharply when a task crosses a service boundary. And when you're developing against a large microservices architecture, that happens constantly.
Three challenges drive this.
𝗖𝗼-𝗹𝗼𝗰𝗮𝗹𝗶𝘁𝘆. In a microservices architecture, no developer has every relevant repo cloned and open, and often doesn't start a task knowing which repos are relevant. An agent inherits this limitation. It can only reason about what it can see, and in a distributed architecture, most of the relevant context is somewhere else.
𝗜𝗺𝗽𝗹𝗶𝗰𝗶𝘁 𝗱𝗲𝗽𝗲𝗻𝗱𝗲𝗻𝗰𝗶𝗲𝘀. In a monolith, dependencies are explicit (imports, function calls, class hierarchies, and so on), and an agent can trace them. In a microservices architecture, the connection is often a URL string in a config file, or two services reading from the same database table without either one knowing about the other. These relationships are real and load-bearing, but invisible to tools that reason at the code level.
𝗡𝗼𝗻𝗹𝗶𝗻𝗲𝗮𝗿 𝗰𝗼𝗺𝗽𝗹𝗲𝘅𝗶𝘁𝘆. A codebase with N symbols has on the order of N-squared potential pairwise relationships. Teams try to solve this by introducing context about the service layer into their repos, but it doesn't scale: documentation rapidly goes stale, and RAG-based approaches chunk code into text fragments, destroying the structural relationships that matter most in distributed systems. Furthermore, working out how isolated services connect through their interfaces requires a further analysis of its own, built on top of the per-service analysis.
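To make the quadratic growth concrete, here is a minimal sketch that counts unordered symbol pairs, N(N-1)/2, for a few values of N. The function name and the sample sizes are illustrative assumptions, not measurements from any real codebase.

```python
def potential_relationships(n_symbols: int) -> int:
    """Count unordered pairs among n symbols: n * (n - 1) / 2."""
    return n_symbols * (n_symbols - 1) // 2


# Illustrative sizes only; each 10x increase in symbols gives roughly 100x more pairs.
counts = {n: potential_relationships(n) for n in (1_000, 10_000, 100_000)}

for n, pairs in counts.items():
    print(f"{n:>7} symbols -> {pairs:>13,} potential pairs")
```

Only a tiny fraction of these pairs are real relationships, which is why sampling or guessing at them tends to miss the ones that matter.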
𝗔 𝗖𝗼𝗱𝗲𝗯𝗮𝘀𝗲 𝗖𝗼𝗺𝗽𝗶𝗹𝗲𝗿 𝗶𝘀 𝗖𝗿𝗶𝘁𝗶𝗰𝗮𝗹
Our approach at Driver starts with what Adam and I call our transpiler: a compiler-like architecture we built that begins by parsing code via static analysis and emits structured context instead of executable code. A core feature here is exhaustiveness. Symbol tables, dependency graphs, and syntax trees are all computed deterministically for every file and symbol in each codebase. This is the foundation, and it matters. It means every codebase we process is understood exhaustively, not sampled.
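As a rough illustration of "parse, then emit structured context instead of executable code," the sketch below uses Python's standard `ast` module to build a small symbol table and import list for one file. This is not Driver's transpiler; `compile_context`, the output shape, and the sample source are assumptions made up for the example.

```python
import ast


def compile_context(source: str, module_name: str) -> dict:
    """Parse one Python module and emit structured context rather than bytecode."""
    tree = ast.parse(source)
    symbols = []     # symbol table entries: (kind, name, line number)
    imports = set()  # modules this file depends on
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            symbols.append(("function", node.name, node.lineno))
        elif isinstance(node, ast.ClassDef):
            symbols.append(("class", node.name, node.lineno))
        elif isinstance(node, ast.Import):
            imports.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            imports.add(node.module)
    return {
        "module": module_name,
        "symbols": sorted(symbols, key=lambda s: s[2]),  # ordered by source line
        "imports": sorted(imports),
    }


# Hypothetical source file, used only to exercise the sketch.
EXAMPLE_SOURCE = '''
import json
from decimal import Decimal

class Invoice:
    def total(self):
        return Decimal("0")

def load(path):
    return json.load(open(path))
'''

context = compile_context(EXAMPLE_SOURCE, "billing.invoice")
print(context)
```

Because the walk visits every node, every definition and import in the file ends up in the output. That is the deterministic, exhaustive property described above, in miniature.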
For microservice architectures and other heavily multi-codebase systems, we have observed that this exhaustive understanding of each individual codebase is necessary but not sufficient.
The reason is that the connections between services don't live at the level of syntax analysis, or of any analysis confined to a single codebase. In a monolith, you can follow imports and function calls to trace a dependency. In a microservices architecture, the dependency might be an HTTP endpoint defined in one service and called via a string constant in another (a more difficult version of the "stringly-typed" problem). It might be a shared database table, a message queue topic, or an event schema that two services conform to independently. These relationships are invisible to any tool reasoning within a single repo, no matter how thoroughly it parses the code.
To capture these, we needed to go further: later stages of the transpiler that synthesize across the parsed structure of individual codebases, and a runtime layer that can reason across all of them simultaneously.
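One way to picture a cross-codebase stage is a pass that joins facts extracted from separate repos. The sketch below matches HTTP routes defined in one service against string literals found in others. The service names, routes, and literals are hypothetical, and path-only matching is a deliberate simplification: it ignores hosts, HTTP methods, and URLs assembled at runtime.

```python
import re
from collections import defaultdict

# Hypothetical per-repo facts, as an earlier static-analysis pass might emit them.
ROUTES = {  # service -> HTTP routes it defines
    "payments": ["/v1/payments/{id}", "/v1/refunds"],
    "orders": ["/v1/orders/{id}"],
}
STRING_CONSTANTS = {  # service -> string literals found in its code and config
    "orders": ["http://payments.internal/v1/payments/42", "retry"],
    "admin": [
        "http://orders.internal/v1/orders/7",
        "http://payments.internal/v1/refunds",
    ],
}


def route_to_regex(route: str) -> re.Pattern:
    """Turn '/v1/payments/{id}' into a pattern for URLs ending in that path."""
    parts = re.split(r"\{[^}]+\}", route)  # split on {placeholder} segments
    pattern = "[^/]+".join(re.escape(part) for part in parts)
    return re.compile(pattern + r"/?$")


def link_services(routes: dict, constants: dict) -> dict:
    """Return caller -> set of (provider, route) edges inferred from literals."""
    edges = defaultdict(set)
    for provider, provider_routes in routes.items():
        for route in provider_routes:
            rx = route_to_regex(route)
            for caller, literals in constants.items():
                if caller == provider:
                    continue  # only cross-service edges are of interest here
                if any(rx.search(literal) for literal in literals):
                    edges[caller].add((provider, route))
    return dict(edges)


edges = link_services(ROUTES, STRING_CONSTANTS)
for caller, targets in sorted(edges.items()):
    print(caller, "->", sorted(targets))
```

Neither input alone reveals the dependency: the edge only appears when facts from two repos are put side by side.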
𝗔 𝗨𝗻𝗶𝗳𝗶𝗲𝗱 𝗔𝗽𝗽𝗿𝗼𝗮𝗰𝗵 𝘁𝗵𝗮𝘁 𝗨𝗻𝗹𝗼𝗰𝗸𝘀 𝗠𝘂𝗹𝘁𝗶-𝗖𝗼𝗱𝗲𝗯𝗮𝘀𝗲
Two developments have unlocked multi-codebase context.
𝗛𝗶𝗴𝗵𝗲𝗿-𝗹𝗲𝘃𝗲𝗹 𝘀𝘆𝗻𝘁𝗵𝗲𝘀𝗶𝘀. Beyond symbol-level static analysis, later stages of our transpiler produce intermediate representations (IRs) at many different levels of abstraction. These culminate with what we call Deep Context Documents: atomic, codebase-wide documents that describe architecture, major components, how they interact, and (crucially) where to go for more detail. These are synthesized by exhaustive review of all the structured content from earlier transpiler stages. They capture the conceptual and architectural relationships that exist above the code level, including cross-cutting concerns, integration patterns, and service boundaries. For an agent, this solves the "don't know what you don't know" circular problem that is so challenging without exhaustive context.
𝗠𝘂𝗹𝘁𝗶-𝗰𝗼𝗱𝗲𝗯𝗮𝘀𝗲 𝗿𝘂𝗻𝘁𝗶𝗺𝗲 𝗽𝗿𝗶𝗺𝗶𝘁𝗶𝘃𝗲𝘀. We built a runtime layer where an agent connects to Driver via a single MCP integration and can query context across every codebase in an organization. Via the primitive tools, all pre-computed artifacts are queryable per codebase: architecture overviews, file-level documentation, code maps, changelogs, source files. We also provide a single deep context sub-agent tool. Under the hood, this runs a dedicated context agent with access to all pre-computed artifacts. When a task spans multiple services, the context agent queries different codebases exactly the way it queries within one. It then synthesizes everything into a single, high-signal response tailored to the task at hand and returns it to the caller.
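The shape of that runtime can be sketched with a toy in-memory store. Everything here is hypothetical (`ArtifactStore`, `gather_context`, the artifact kind, and the sample text); it is not Driver's MCP interface, and simple keyword filtering stands in for the context agent's actual synthesis. The point is only that the same primitive works unchanged across codebases.

```python
from dataclasses import dataclass, field


@dataclass
class ArtifactStore:
    """Hypothetical stand-in for pre-computed, per-codebase artifacts."""
    artifacts: dict = field(default_factory=dict)  # (codebase, kind) -> text

    def put(self, codebase: str, kind: str, text: str) -> None:
        self.artifacts[(codebase, kind)] = text

    def get(self, codebase: str, kind: str) -> str:
        # Primitive tool: fetch one artifact from one codebase.
        return self.artifacts.get((codebase, kind), "")


def gather_context(store: ArtifactStore, codebases: list, kind: str, keyword: str) -> str:
    """Sub-agent stand-in: run the same query per codebase, merge into one response."""
    sections = []
    for codebase in codebases:
        text = store.get(codebase, kind)
        hits = [line for line in text.splitlines() if keyword.lower() in line.lower()]
        if hits:
            sections.append(f"[{codebase}] " + " | ".join(hits))
    return "\n".join(sections)


store = ArtifactStore()
store.put("orders", "architecture_overview",
          "Creates bookings.\nCalls payments to record a payment.")
store.put("payments", "architecture_overview",
          "Stores payment records.\nExposes refund API.")
store.put("admin", "architecture_overview", "Back-office UI.")

response = gather_context(
    store, ["orders", "payments", "admin"], "architecture_overview", "payment"
)
print(response)
```

The caller never needs the repos cloned locally; it names codebases and gets back one merged answer.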
The combination is what matters. The transpiler's static analysis gives you exhaustive per-codebase understanding. The higher-level synthesis captures cross-cutting relationships that don't live at the code level. The runtime makes all of it queryable across every codebase simultaneously, so the agent sees the full system, not just whatever repo it happens to be sitting in.
No co-locality requirement, no manual assembly, and no re-deriving what was already known.
𝗪𝗵𝗮𝘁 𝗪𝗲'𝘃𝗲 𝗦𝗲𝗲𝗻
A customer had a bug where bookings were completing despite no payment records. The investigation spanned 4-5 services across order orchestration, admin, and payments. This bug had been investigated multiple times before and missed. One engineer used Driver to trace the issue iteratively: broad exploration first, then narrowing hypotheses, then precise constraints. Driver pinpointed the exact method where validation logic silently passed orders with zero payment records. Total hands-on time: about 30 minutes. As the engineer described it, instead of needing three people with intensive knowledge of different services, one person got 80% of the way there, then brought in specialists to confirm.
Another team with 200+ microservices built a workflow on top of Driver that takes a single Jira ticket, identifies which repos are impacted, gathers context across all of them in parallel, and generates per-service subtasks with specific file paths, class names, and implementation patterns. The constraint before, as one of their engineers put it: "you've got to know which repos." With all repos accessible through a single integration, that constraint disappeared.
Beyond debugging and ticket analysis, teams report that Driver changes the baseline for development in multi-codebase environments. Concerns that historically slowed work (unknown blast radius, missing context from adjacent services) are addressed by the same cross-codebase context layer. One large microservices team reported 2x average PR velocity team-wide after introducing Driver.
𝗧𝗵𝗲 𝗕𝗿𝗼𝗮𝗱𝗲𝗿 𝗣𝗼𝗶𝗻𝘁
A fundamental tradeoff of microservices has always been autonomy in exchange for complexity. Small, independent services are easy to deploy individually but hard to reason about collectively. AI coding tools struggle with this in the same ways developers do. This is a genuinely complex problem, and today's tools (for agents and humans alike) make it worse by being centered on one codebase or a few. That tension has significantly held us all back.
What we've found is that this is a solvable problem if you approach it at the right level. Per-codebase exhaustive context compilation is the foundation, but the microservices unlock is at a higher layer: synthesized understanding that captures cross-service relationships, and a runtime that makes it all accessible through one integration. The agent sees the whole system and understands the connections relevant to a particular task.
The more codebases you have, the more immediately this compounds. If your team is running a distributed architecture and your agents are struggling with cross-service work, the issue is probably not the agent. It's what the agent can see.