Ihtesham Ali@ihtesham2005
A Beijing AI lab just released a model that can run 300 agents at once without any of them interfering with each other, and the mechanism they used to pull it off is something most researchers thought was years away.
The lab is Moonshot AI. The founder is Yang Zhilin.
The model is called Kimi K2.6, and they shipped it on April 20, 2026, under a modified MIT license. 1 trillion parameters. 32 billion active per token. Fully open weights. You can download it today.
Here is the part that matters, and why almost nobody building multi-agent systems saw it coming.
Every serious attempt at running multiple AI agents on the same task before this one collapsed past a certain threshold. Researchers have a name for it. Coordination hallucinations. Agents giving each other contradictory instructions. Agents stepping on each other's work. Agents reaching conclusions that looked right in isolation but made no sense when you tried to stitch them together. The more agents you added, the worse the output got. Above around 100 agents, the whole system became unstable.
The standard response to this problem was to build an orchestration layer on top of a single model. An external framework that routes tasks to agents, tracks progress, and merges outputs. Every multi-agent startup in the last two years has shipped some version of this. None of them scaled past that roughly 100-agent ceiling.
Moonshot did something different.
They trained the orchestrator as part of the model itself. The coordinator is not a wrapper. It is not a framework. It is a core capability baked into K2.6 during training. The model understands how to decompose a task into parallel subtasks. It knows how to route work to specialized sub-agents based on their skill profiles. It detects when an agent has stalled or failed, reassigns the task automatically, and merges the outputs into a single coherent result.
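To make the loop concrete, here is a minimal Python sketch of the orchestration pattern described above: decompose a task, route subtasks by skill profile, detect a stalled agent, reassign its work, and merge the outputs. This is an illustration of the pattern only; none of the class or function names are Moonshot's, and the skill-matching and merge logic are toy stand-ins.

```python
from dataclasses import dataclass

# Hypothetical sketch of an in-model orchestration loop:
# decompose -> route by skill -> detect stall -> reassign -> merge.
# All names here are invented for illustration, not Moonshot's API.

@dataclass
class Agent:
    name: str
    skills: set
    healthy: bool = True  # a stalled/failed agent raises instead of returning

    def run(self, subtask: str) -> str:
        if not self.healthy:
            raise RuntimeError(f"{self.name} stalled")
        return f"{self.name}:{subtask}:done"

def orchestrate(task: str, agents: list) -> str:
    # 1. Decompose the task into parallel subtasks (toy split on ';').
    subtasks = [t.strip() for t in task.split(";")]
    results = []
    for sub in subtasks:
        skill = sub.split()[0]  # toy skill tag: first word of the subtask
        # 2. Route to agents whose skill profile matches.
        candidates = [a for a in agents if skill in a.skills]
        for agent in candidates:
            try:
                results.append(agent.run(sub))
                break  # subtask succeeded, move on
            except RuntimeError:
                continue  # 3. stall detected -> reassign to next candidate
        else:
            results.append(f"unassigned:{sub}")
    # 4. Merge partial outputs into a single coherent result.
    return " | ".join(results)
```

A quick usage example: with one healthy `search` agent and a stalled `code` agent backed up by a healthy one, `orchestrate("search docs; code fix", agents)` routes around the failure and still merges both subtasks into one result.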
The numbers are unlike anything that came before. 300 sub-agents running in parallel. 4,000 coordinated steps in a single autonomous run. 12 hours of continuous execution on hard engineering problems. One RL infrastructure team at Moonshot ran a K2.6-powered agent autonomously for 5 straight days managing monitoring, incident response, and system operations without human supervision. It never lost the thread.
The benchmark nobody wants to talk about is BrowseComp in agent swarm mode. K2.6 scored 86.3. GPT-5.4 scored 78.4. That is not a rounding error. That is a nearly 8-point lead on a benchmark specifically designed to test multi-agent coordination, and it was won by the only open-source model in the comparison.
The counterintuitive finding is this. More agents does not automatically mean better results. Every ad-hoc multi-agent system before K2.6 proved the opposite. The value is not in the number. The value is in whether the orchestration holds together long enough and cleanly enough to turn 300 independent workers into something that behaves like a single, focused mind.
The bottleneck in AI was never intelligence. It was stamina.
The first model that figured out how to stay coordinated for 12 hours just made every 1-hour model obsolete.