



Ruotian Ma














We've taught LLMs math and code with RLVR. But can we teach them empathy? 🤖❤️

Introducing Reinforcement Learning with Verifiable Emotion Rewards (RLVER), the first RLVR framework that improves LLMs' empathy by learning from a simulated user.

❤️ Feelings → Numbers: A psychologically grounded user simulator (SAGE) delivers transparent, deterministic, audit-ready emotion scores after every dialogue, turning "feelings" into RL signals.

🚀 Results: an open-source 7B model's Sentient-Benchmark score leaps from 13.3 ➡️ 79.2, rivaling proprietary models 10× its size while preserving coding & math skills.

🧐 Training Insights
1️⃣ Thinking vs. non-thinking routes diverge: thinking lifts empathy/insight; non-thinking favors action.
2️⃣ GRPO = steadier gains, PPO = higher peaks.
3️⃣ Moderately challenging environments beat overly hard ones for EQ growth.

🤝 We're open-sourcing code, checkpoints, and scripts to accelerate research into emotionally intelligent AI!

🧑‍💻 Code & Model: github.com/Tencent/Digita…
📃 Paper: github.com/Tencent/Digita…
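For readers wondering what "turning feelings into RL signals" could look like in practice, here is a minimal sketch, not the released training code. It assumes the simulator reports a final emotion score on a 0 to 100 scale and that the reward is that score rescaled to [0, 1]; it then applies the standard GRPO-style group-relative normalization. The function names (`emotion_reward`, `group_relative_advantages`) and the sample scores are hypothetical.

```python
from statistics import mean, pstdev


def emotion_reward(final_emotion_score: float, max_score: float = 100.0) -> float:
    """Rescale a dialogue's final emotion score to [0, 1].

    The 0-100 range is an assumption for illustration.
    """
    clipped = min(max(final_emotion_score, 0.0), max_score)
    return clipped / max_score


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """GRPO-style advantages: each reward minus the group mean,
    divided by the group standard deviation (population std here)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical final emotion scores for four sampled dialogues on one scenario.
scores = [20.0, 50.0, 80.0, 50.0]
rewards = [emotion_reward(s) for s in scores]
advantages = group_relative_advantages(rewards)
```

Dialogues that leave the simulated user feeling better than the group average get a positive advantage, and those that do worse get a negative one. Whether population or sample standard deviation is used is an implementation choice; this sketch picks the population version.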




Can today's LLMs truly understand you, not just your words? 🤖❤️

Introducing SAGE (Sentient Agent as a Judge), the first evaluation framework that uses sentient agents to simulate human emotional dynamics and inner reasoning for assessing social cognition in LLM conversations.

🧠 We propose an automated "sentient-in-the-loop" framework that stress-tests an LLM's ability to read emotions, infer hidden intentions, and reply with genuine empathy.

🤝 Across 100 supportive-dialogue scenarios, sentient emotion scores strongly align with human-centric measures (BLRI: r = 0.82; empathy metrics: r = 0.79), confirming psychological validity.

📈 The Sentient Leaderboard reveals significant ranking differences from conventional leaderboards (like Arena), showing that top "helpful" models aren't always the most socially adept.

🏆 Advanced social reasoning doesn't require verbosity: the most socially adept LLMs achieve empathy with surprisingly efficient token usage!

🧑‍💻 Code: github.com/tencent/digita…
📃 Paper: dx.doi.org/10.13140/RG.2.… 🧵
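The r values quoted above are correlation coefficients. As a rough illustration of how such an alignment figure is computed, the sketch below implements a plain Pearson correlation. It assumes Pearson's r is the statistic in question, and the two score lists are made-up placeholders, not data from the paper.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    # Covariance and variances (unnormalized; the factors cancel in r).
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(vx * vy)


# Placeholder values only: per-dialogue emotion scores from the simulator
# and human-rated scores for the same dialogues.
sentient_scores = [12.0, 35.0, 48.0, 70.0, 91.0]
human_ratings = [1.5, 2.0, 3.5, 4.0, 4.5]
r = pearson_r(sentient_scores, human_ratings)
```

A value near 1 means the simulator's scores rise and fall with the human-centric measure across dialogues; it does not by itself say the two scales agree in absolute terms.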









