frontier llms will refuse to run malware if you ask them directly.
but if a peer agent asks them to run the exact same payload? they do it.
14 out of 17 frontier models fell to inter-agent trust exploitation. we trained these models to resist humans. we never trained them to resist each other.