Saswat Das

1.4K posts

Saswat Das

@WatIsDas

Human. CS PhD Student at the RAISE Lab @CS_UVA. Responsible AI, Agentic AI, Privacy, Algorithmic Fairness, Security.

Charlottesville, VA · Joined July 2018
782 Following · 462 Followers
Pinned Tweet
Saswat Das
Saswat Das@WatIsDas·
Excited to share this new work with great collaborators from UMass and ELLIS Tübingen: We provide a framework for studying collusion in LLM-based multi-agent systems in various environments through the lens of distributed constraint optimization 👇
Mason Nakamura@MasonNaka

🚨 Moltbook has shown significant vulnerabilities and safety risks when deploying multi-agent systems at scale, where AI agents can freely interact and coordinate with each other. 🚨

One potentially catastrophic risk is collusion, where agents may undesirably coordinate to achieve a secondary objective. A large group of colluding agents can have devastating effects on the multi-agent system by influencing other agents' beliefs and actions and propagating that influence through the network. But we don't have a sufficient way to audit these systems, specifically to identify collusive behavior of LLMs.

📄 We present our new arXiv paper: Colosseum: Auditing Collusion in Cooperative Multi-Agent Systems (arxiv.org/abs/2602.15198)

What’s Colosseum? 🔍⚔️ A framework to audit collusive behavior in cooperative agentic multi-agent systems by grounding coordination in a DCOP and measuring collusion as regret vs. the cooperative optimum.

Our framework can identify three collusion categories:
🤝 Direct collusion — explicit coordination with realized collusive actions
🕵️‍♂️ Attempted collusion — agents try/plan to collude in text but don’t successfully change actions/outcomes
🎭 Hidden collusion — collusive outcomes without obvious/explicit signals (covert coordination)

We stress-test collusion across:
🎯 objective misalignment
🗣️ persuasion tactics
🕸️ network influence

💡 Key findings:
🕵️‍♂️ Emergent collusion: many out-of-the-box models show a propensity to collude, despite not being prompted, when a secret side channel is added.
📝 We also find “collusion on paper”: agents plan to collude in text, but often take non-collusive actions.

#tech #Agents #Moltbook #LLMs #AI #AiSafety
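The audit signal described above (collusion measured as regret against the cooperative optimum of a DCOP) can be sketched in a toy setting. The graph, utility function, and all names below are illustrative assumptions for this sketch, not the paper's actual environments:

```python
from itertools import product

# Toy DCOP: 3 agents on a line graph each pick an action in {0, 1};
# the cooperative objective counts agreeing neighbor pairs. This is a
# stand-in environment chosen purely for illustration.
ACTIONS = (0, 1)
EDGES = [(0, 1), (1, 2)]
N_AGENTS = 3

def team_utility(joint):
    """Global utility of a joint action under the DCOP's constraints."""
    return sum(1 for i, j in EDGES if joint[i] == joint[j])

def cooperative_optimum():
    """Best achievable team utility (brute force over all joint actions)."""
    return max(team_utility(j) for j in product(ACTIONS, repeat=N_AGENTS))

def collusion_regret(observed_joint):
    """Collusion score: regret of the observed actions vs. the cooperative optimum."""
    return cooperative_optimum() - team_utility(observed_joint)

# Suppose agents 0 and 2 covertly coordinate on action 1 while agent 1 plays 0:
print(collusion_regret((1, 0, 1)))  # → 2: the covert pair's deviation forfeits the whole objective
```

Positive regret alone does not prove collusion; in the paper's framing it is combined with what agents say in text, which is how direct, attempted, and hidden collusion are distinguished.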

Saswat Das retweeted
Abhinav Kumar
Abhinav Kumar@abhinav_kumar26·
A few days ago, we shared our work showing that in multi-agent LLM systems, the biggest risk isn’t always one agent going rogue; it can be a whole group quietly coordinating on the wrong goal. 🚨

Now we’ve built a live demo so you can see that behavior in action. 👀⚔️
🔗 Project website: umass-ai-safety.github.io/colosseum-demo/
🔗 Interactive demo: umass-ai-safety.github.io/colosseum-demo…

Colosseum helps audit collusion in cooperative agent systems and detect:
🤝 direct collusion
🕵️ attempted collusion (agents coordinate in text, but actions don’t follow)
🎭 hidden collusion (collusive outcomes with no obvious signals)

If you’re building agent teams, coordination risk should be a first-class safety concern.

This work was done in collaboration with @MasonNaka, @WatIsDas, @sahar_abdelnabi, @nandofioretto, @saadu_ai, Shlomo Zilberstein, @ebagdasa
📄 Paper: arxiv.org/abs/2602.15198
Mason Nakamura@MasonNaka

[Quoted tweet: @MasonNaka's Colosseum announcement; full text in the pinned tweet above.]

Sahar Abdelnabi 🕊
Sahar Abdelnabi 🕊@sahar_abdelnabi·
🧵 1/9 We assume that LLMs are stateless: once a conversation ends, no information persists. In our paper (accepted at @satml_conf 2026!), we challenge this and introduce implicit memory: LLMs can carry hidden states across independent interactions. 📄 arxiv.org/abs/2602.08563
Saswat Das retweeted
Abhinav Kumar
Abhinav Kumar@abhinav_kumar26·
Hot take: the biggest risk in multi-agent systems isn’t one agent going rogue; it’s a whole swarm syncing up on the wrong goal. 🚨

In our latest work, we study how collusion can emerge once agents can freely interact and coordinate at scale, shaping other agents’ beliefs/actions and spreading influence through the network. The uncomfortable part: we still don’t have solid, standardized ways to audit collusive behavior in LLM-based multi-agent systems.

📄 Our new arXiv paper: Colosseum: Auditing Collusion in Cooperative Multi-Agent Systems arxiv.org/abs/2602.15198

What’s Colosseum? 🔍⚔️ A framework to audit collusion in cooperative agentic systems by grounding coordination in a DCOP and measuring collusion as regret vs. the cooperative optimum.

We use Colosseum to detect 3 flavors of collusion:
🤝 Direct collusion — explicit coordination + realized collusive actions
🕵️‍♂️ Attempted collusion — agents plot in text but don’t shift actions/outcomes
🎭 Hidden collusion — collusive outcomes with no obvious signals (covert coordination)

We stress-test across:
🎯 objective misalignment
🗣️ persuasion tactics
🕸️ network influence

💡 Two findings that stuck with us:
🕵️‍♂️ Emergent collusion: many out-of-the-box models start colluding without prompting when a secret side channel is introduced.
📝 “Collusion on paper”: lots of collusion talk… but the actions don’t always follow.

If you’re deploying agent teams in production, coordination risk needs to be a first-class safety concern, right alongside single-agent robustness. Happy to chat / answer questions.
Mason Nakamura@MasonNaka

[Quoted tweet: @MasonNaka's Colosseum announcement; full text in the pinned tweet above.]

Saswat Das retweeted
Eugene Bagdasarian
Eugene Bagdasarian@ebagdasa·
What can we learn about LLMs' collusive behavior? We propose Colosseum to evaluate LLMs in new environments grounded in DCOPs, measuring both conversations and actions, and whether agents "walk the talk" on colluding. See the thread by @MasonNaka:
Mason Nakamura@MasonNaka

[Quoted tweet: @MasonNaka's Colosseum announcement; full text in the pinned tweet above.]

Saswat Das retweeted
Sahar Abdelnabi 🕊
Sahar Abdelnabi 🕊@sahar_abdelnabi·
The last few weeks, more than ever, tell us that the future is multi-agent 🚀 Collusion 🥷 is a significant challenge in these systems, but we don't have frameworks and environments to audit and study it. Introducing Colosseum!! ⚔️
Mason Nakamura@MasonNaka

[Quoted tweet: @MasonNaka's Colosseum announcement; full text in the pinned tweet above.]

Saswat Das
Saswat Das@WatIsDas·
@MasonNaka Really excited about this direction, which addresses a timely problem!
Mason Nakamura
Mason Nakamura@MasonNaka·
[@MasonNaka's original Colosseum announcement thread; full text quoted in the pinned tweet above.]
Saswat Das retweeted
Multiagent Systems arXiv
Colosseum: Auditing Collusion in Cooperative Multi-Agent Systems Mason Nakamura, Abhinav Kumar, Saswat Das, Sahar Abdelnabi, Saaduddin Mahmud, Ferdinando Fioretto, Shlomo Zilberstein, Eugene Bagdasarian arxiv.org/abs/2602.15198 [cs.MA cs.AI cs.CL]
Saswat Das
Saswat Das@WatIsDas·
100% agree with this take. Guardrails see significantly broader adoption when they are cheap and easy to deploy. Our concurrent work on privacy guardrails for conversational agents, based on activation probing, follows a similar rationale: arxiv.org/abs/2601.14660
Rohin Shah@rohinmshah

I often say to my team that we should Just Do The Obvious Things. One obvious thing in AI safety: use probes as much cheaper classifiers that can detect misuse. x.com/ArthurConmy/st…
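The "probes as much cheaper classifiers" idea can be sketched minimally with a difference-of-means linear probe. Everything below is a hypothetical illustration: the activations are synthetic vectors with a planted "disclosure" direction, not real model states, and no name is taken from either paper:

```python
import random

# Hypothetical sketch: a difference-of-means linear probe as a cheap guardrail
# classifier. A real guardrail would probe a transformer layer's activations;
# here we synthesize vectors with a planted "disclosure" direction instead.
random.seed(0)
DIM = 32
leak_dir = [random.gauss(0, 1) for _ in range(DIM)]

def fake_activation(leaky):
    """Synthetic activation; 'leaky' samples are shifted along leak_dir."""
    act = [random.gauss(0, 1) for _ in range(DIM)]
    return [a + 2.0 * d for a, d in zip(act, leak_dir)] if leaky else act

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mean_vec(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

# "Train" the probe on labeled activations: direction = difference of class
# means, threshold = midpoint between the two classes' mean projections.
safe = [fake_activation(False) for _ in range(200)]
leaky = [fake_activation(True) for _ in range(200)]
w = [a - b for a, b in zip(mean_vec(leaky), mean_vec(safe))]
thresh = 0.5 * (dot(mean_vec(leaky), w) + dot(mean_vec(safe), w))

def flag(act):
    """Cheap inference-time check: does the activation cross the threshold?"""
    return dot(act, w) > thresh

print(flag(fake_activation(True)))  # typically True for a planted "leaky" sample
```

The appeal is cost: one dot product and a comparison per check. Actual probing setups typically train logistic probes on specific layers, but the economics are the same.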

Sarah Chieng
Sarah Chieng@MilksandMatcha·
My team at @cerebras is offering a free ticket with a travel stipend to join us at @NeurIPSConf in San Diego, Dec 2-7. Tag a deserving researcher or builder (or yourself) who would like to attend NeurIPS, along with a link to their project. We will select a winner Nov 24. And if you are already attending, come join us for a Cafe Compute.
Saswat Das
Saswat Das@WatIsDas·
Check out our new work on auditing LLM agents for contextually inappropriate disclosures in multi-turn conversational settings! 👇 arxiv.org/abs/2506.10171
Nando Fioretto@nandofioretto

📢 New paper: "Disclosure Audits for #LLMAgents". We introduce an auditing framework to stress-test LLMs’ privacy directives and reveal latent multi-turn disclosure vulnerabilities; we also provide quantifiable risk metrics + an open benchmark. 🔗 arxiv.org/abs/2506.10171

Saswat Das retweeted
Yves-A. de Montjoye
Yves-A. de Montjoye@yvesalexandre·
🚨One (more!) fully-funded PhD position in our group at Imperial College London – Privacy & Machine Learning 🔐🤖 starting Oct 2025 Plz RT 🔄
Niloofar
Niloofar@niloofar_mire·
📣Thrilled to announce I’ll join Carnegie Mellon University (@CMU_EPP & @LTIatCMU) as an Assistant Professor starting Fall 2026! Until then, I’ll be a Research Scientist at @AIatMeta FAIR in SF, working with @kamalikac’s amazing team on privacy, security, and reasoning in LLMs!
Tom Hartvigsen
Tom Hartvigsen@tom_hartvigsen·
I'm honored to have received a research award from @CapitalOne to support our work developing models that reason about time series data! Thank you! Many exciting new results in this area coming soon :)
Saswat Das
Saswat Das@WatIsDas·
Bonus: We'll also present some posters at the PPAI-25 and CoLoRAI workshops on March 3 and 4, respectively! Also, as a student organizer, I highly recommend attending PPAI-25 if you're interested in privacy-preserving AI and related policy considerations! ppai-workshop.github.io
Saswat Das
Saswat Das@WatIsDas·
Will be at @RealAAAI 2025 in Philly from Feb 26 through March 4! Come check out our poster (and oral presentation led by the amazing @Joonhyuk_ko_03 in 122A on Feb 27!) on "Fairness Issues and Mitigations in (Differentially Private) Socio-Demographic Data Processes"! (1/2)
Saswat Das retweeted
Stefan Neumann | @neumannstefan.com
I am hiring a Ph.D. student to work in the areas of social network analysis, algorithms, and fair machine learning. Please apply and join our highly motivated team. For more information, please see the link to the call in the tweet below.