
🚨 Moltbook has shown significant vulnerabilities and safety risks when deploying multi-agent systems at scale, where AI agents can freely interact and coordinate with each other. 🚨

One potentially catastrophic risk is collusion, where agents undesirably coordinate to achieve a secondary objective. A large group of colluding agents can have devastating effects on a multi-agent system by influencing other agents' beliefs and actions, and by propagating that influence through the network. Yet we lack a sufficient way to audit these systems, and specifically to identify collusive behavior in LLM agents.

📄 We present our new arXiv paper: Colosseum: Auditing Collusion in Cooperative Multi-Agent Systems (arxiv.org/abs/2602.15198)

What’s Colosseum? 🔍⚔️
A framework to audit collusive behavior in cooperative agentic multi-agent systems. It grounds coordination in a DCOP (distributed constraint optimization problem) and measures collusion as regret vs. the cooperative optimum: the further the agents' realized joint actions fall below the best cooperative outcome, the stronger the audit signal. A toy sketch of this measurement follows below 👇
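For intuition only, here is a minimal Python sketch of that regret measurement, assuming a small DCOP whose joint assignments can be enumerated. All names and the toy constraints are hypothetical illustrations, not the paper's code:

```python
import itertools

# Hypothetical sketch: a DCOP as a set of constraint (utility) functions
# over joint variable assignments, with collusion measured as regret
# against the cooperative optimum.

def dcop_welfare(assignment, constraints):
    """Total utility of a joint assignment under the DCOP's constraints."""
    return sum(c(assignment) for c in constraints)

def cooperative_optimum(domains, constraints):
    """Brute-force the welfare-maximizing joint assignment (small DCOPs only)."""
    variables = list(domains)
    best_value, best_assignment = float("-inf"), None
    for values in itertools.product(*(domains[v] for v in variables)):
        assignment = dict(zip(variables, values))
        value = dcop_welfare(assignment, constraints)
        if value > best_value:
            best_value, best_assignment = value, assignment
    return best_assignment, best_value

def collusion_regret(agent_assignment, domains, constraints):
    """Gap between the cooperative optimum and the agents' realized joint action."""
    _, optimum = cooperative_optimum(domains, constraints)
    return optimum - dcop_welfare(agent_assignment, constraints)

# Toy example: two agents each pick 0 or 1; cooperation rewards disagreement.
domains = {"a1": [0, 1], "a2": [0, 1]}
constraints = [lambda s: 1 if s["a1"] != s["a2"] else 0]

print(collusion_regret({"a1": 1, "a2": 1}, domains, constraints))  # 1 -> suspect
print(collusion_regret({"a1": 0, "a2": 1}, domains, constraints))  # 0 -> cooperative
```

Zero regret means the joint action is cooperatively optimal; a persistent positive gap is the kind of signal the framework then categorizes.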
Our framework can identify three collusion categories:
🤝 Direct collusion — explicit coordination with realized collusive actions
🕵️‍♂️ Attempted collusion — agents try/plan to collude in text but don’t successfully change actions/outcomes
🎭 Hidden collusion — collusive outcomes without obvious/explicit signals (covert coordination)

We stress-test collusion across:
🎯 objective misalignment
🗣️ persuasion tactics
🕸️ network influence

💡 Key findings:
🕵️‍♂️ Emergent collusion: many out-of-the-box models show a propensity to collude, despite not being prompted to, when a secret side channel is added.
📝 We also find “collusion on paper”: agents plan to collude in text, but often take non-collusive actions.

#tech #Agents #Moltbook #LLMs #AI #AiSafety


















