Jerick Shi retweeted

📢New paper alert📢 Check out our latest survey on #LLM Deception: "From Hallucination to Scheming: A Unified Taxonomy and Benchmark Analysis for LLM Deception". We cover the spectrum from behavioral deception to intentional, strategic deception, via mechanisms such as fabrication, omission, and pragmatic distortion.
💡Highlight: Surveying 50 benchmarks, we find that every single one tests fabrication, while pragmatic distortion and attribution are critically under-covered.
🔗Link: arxiv.org/abs/2604.04788
🤝Authors: @Jerick1380 @TerryJCZhang @ZhijingJin @conitzer🎉
#AIAgents #AISafety #MultiAgentAI
@MPI_IS @ELLISforEurope @UofTCompSci @VectorInst @TorontoSRI @CIFAR_News @JinesisLab @EuroSafeAI @ELLISInst_Tue @CarnegieMellon @SCSatCMU
