Xing Han 韩星

12 posts

@xinghan0

Postdoctoral Fellow @JohnsHopkins | Former Research Intern @Google @salesforce @intuit @CognitiveScale | Alumnus @UTAustin @EdinburghUni

Maryland, USA · Joined September 2015
185 Following · 65 Followers
Xing Han 韩星 retweeted
J Huang @JentseHuang
We will keep maintaining our benchmarks! We have updated our GitHub code so SOTA models can easily be tested.
Zhaopeng Tu @tuzhaopeng

Can 2025 LLMs balance high IQ (Strategy) with high EQ (Personality)? 🤖🧠❤️

Thanks to the amazing @JentseHuang, we have updated our three major benchmarks -- PsychoBench, EmotionBench, and GAMA-Bench -- with the latest 2025 models (Gemini-3, GPT-5, Claude-4.5, DeepSeek-v3.2). The results reveal a shifting landscape in AI cognition and sentiment.

1⃣ Strategic Reasoning has a new king. On GAMA-Bench (ICLR'25), Claude-Sonnet-4 dominates with a score of 83.7, far outpacing GPT-4o (65.5). This suggests that models like Claude are cracking complex multi-agent cooperation and betrayal dynamics better than their peers.

2⃣ High Empathy is becoming the norm. On PsychoBench (ICLR'24), Gemini-2.5-Pro reaches an Empathy score of 6.96, significantly surpassing the human crowd baseline of 4.92. Meanwhile, safety tuning is working: Gemini-3-Pro shows near-zero Dark Triad traits (Narcissism/Machiavellianism), whereas older open models like LLaMA-3.1-70B retain "darker" profiles.

3⃣ Emotional Resilience is stabilizing. On EmotionBench (NeurIPS'24), GPT-5 displays human-like stability. Its positive mood drops significantly less in negative contexts (-8.2) compared to GPT-4 (-27.6), while DeepSeek-V3.2 demonstrates sharper, more appropriate negative empathy responses.

4⃣ Different families, different strengths. While Claude leads in pure game-theoretic reasoning (GAMA), the Gemini and GPT-5 families excel in psychological safety and emotional stability. The gap between "Thinking" and "Feeling" capabilities is defining the 2025 model landscape.

GAMA-Bench (ICLR'25): cuhk-arise.github.io/GAMABench/ 🧠
PsychoBench (ICLR'24): cuhk-arise.github.io/PsychoBench/ ❤️
EmotionBench (NeurIPS'24): cuhk-arise.github.io/EmotionBench/ 🎭

We will keep maintaining our benchmarks! We have updated our GitHub code so SOTA models can easily be tested.

Xing Han 韩星 @xinghan0
@ddvd233 There really aren't enough reviewers… I only bid on 15 papers and ended up being assigned 6 🤦🏻‍♂️
dvd@dvd.chat @ddvd233
Can you no longer choose how many papers to review for NeurIPS? They assigned me five right off the bat.
Xing Han 韩星 retweeted
Hsing-Huan Chung @HsingHuan
I've just arrived in Vilnius, Lithuania for @ECMLPKDD and I'm happy to present our paper, "Novel Node Category Detection Under Subpopulation Shift." Paper link: arxiv.org/abs/2404.01216
Xing Han 韩星 retweeted
Alex Dimakis @AlexGDimakis
We have multiple postdoc openings at the AI Institute for the Foundations of Machine Learning (IFML). Fellows can work with all IFML groups at UT Austin, Univ. of Washington, and Microsoft Research. apply.interfolio.com/98753 (1/3)
Xing Han 韩星 @xinghan0
Very fortunate to be an intern at #intuit AI! This valuable experience has greatly helped me bridge the gap between research and deployment. The model we developed significantly outperforms the current solution and will be deployed in multiple pipelines.
Xing Han 韩星 @xinghan0
Hello, could you tell me how I can get my purchase receipt after completing a booking? @thetrainline