
Can 2025 LLMs balance high IQ (Strategy) with high EQ (Personality)? 🤖🧠❤️

Thanks to the amazing @JentseHuang, we have updated our three major benchmarks -- PsychoBench, EmotionBench, and GAMA-Bench -- with the latest 2025 models (Gemini-3, GPT-5, Claude-4.5, DeepSeek-v3.2). The results reveal a shifting landscape in AI cognition and sentiment.

1⃣ Strategic reasoning has a new king. On GAMA-Bench (ICLR'25), Claude-Sonnet-4 dominates with a score of 83.7, far outpacing GPT-4o (65.5). This suggests that models like Claude are cracking complex multi-agent cooperation and betrayal dynamics better than their peers.

2⃣ High empathy is becoming the norm. On PsychoBench (ICLR'24), Gemini-2.5-Pro reaches an Empathy score of 6.96, significantly surpassing the human crowd baseline of 4.92. Meanwhile, safety tuning is working: Gemini-3-Pro shows near-zero Dark Triad traits (Narcissism/Machiavellianism), whereas older open models like LLaMA-3.1-70B retain "darker" profiles.

3⃣ Emotional resilience is stabilizing. On EmotionBench (NeurIPS'24), GPT-5 displays human-like stability: its positive mood drops significantly less in negative contexts (-8.2) than GPT-4's (-27.6), while DeepSeek-V3.2 demonstrates sharper, more appropriate negative empathy responses.

4⃣ Different families, different strengths. While Claude leads in pure game-theoretic reasoning (GAMA-Bench), the Gemini and GPT-5 families excel in psychological safety and emotional stability. The gap between "Thinking" and "Feeling" capabilities is defining the 2025 model landscape.

GAMA-Bench (ICLR'25): cuhk-arise.github.io/GAMABench/ 🧠
PsychoBench (ICLR'24): cuhk-arise.github.io/PsychoBench/ ❤️
EmotionBench (NeurIPS'24): cuhk-arise.github.io/EmotionBench/ 🎭

We will keep maintaining our benchmarks! We have updated our GitHub code so SOTA models can easily be tested.










