
Can 2025 LLMs balance high IQ (Strategy) with high EQ (Personality)? 🤖🧠❤️

Thanks to the amazing @JentseHuang, we have updated our three major benchmarks -- PsychoBench, EmotionBench, and GAMA-Bench -- with the latest 2025 models (Gemini-3, GPT-5, Claude-4.5, DeepSeek-v3.2). The results reveal a shifting landscape in AI cognition and sentiment.

1⃣ Strategic reasoning has a new king. On GAMA-Bench (ICLR'25), Claude-Sonnet-4 dominates with a score of 83.7, far outpacing GPT-4o (65.5). This suggests that models like Claude are cracking complex multi-agent cooperation and betrayal dynamics better than their peers.

2⃣ High empathy is becoming the norm. On PsychoBench (ICLR'24), Gemini-2.5-Pro reaches an Empathy score of 6.96, significantly surpassing the human crowd baseline of 4.92. Meanwhile, safety tuning is working: Gemini-3-Pro shows near-zero Dark Triad traits (Narcissism/Machiavellianism), whereas older open models like LLaMA-3.1-70B retain "darker" profiles.

3⃣ Emotional resilience is stabilizing. On EmotionBench (NeurIPS'24), GPT-5 displays human-like stability: its positive mood drops significantly less in negative contexts (-8.2) than GPT-4's (-27.6), while DeepSeek-V3.2 demonstrates sharper, more appropriate negative empathy responses.

4⃣ Different families, different strengths. While Claude leads in pure game-theoretic reasoning (GAMA-Bench), the Gemini and GPT-5 families excel in psychological safety and emotional stability. The gap between "Thinking" and "Feeling" capabilities is defining the 2025 model landscape.

GAMA-Bench (ICLR'25): cuhk-arise.github.io/GAMABench/ 🧠
PsychoBench (ICLR'24): cuhk-arise.github.io/PsychoBench/ ❤️
EmotionBench (NeurIPS'24): cuhk-arise.github.io/EmotionBench/ 🎭

We will keep maintaining our benchmarks! We have updated our GitHub code so SOTA models can easily be tested.










