He He

149 posts

@hhexiy

NLP researcher. Assistant Professor at NYU CS & CDS.

Joined December 2016

418 Following · 7.9K Followers
He He retweeted
CLS@ChengleiSi·
Love seeing this. There are two flavors of automated AI research. One just cares about hill climbing on a target benchmark (e.g., autoresearch). The other actually cares about the idea (e.g., this post). Beyond just looking at the performance, we want to ask: does the idea offer some new insights? Is it simple and scalable? Is it generalizable? The evaluation is a lot harder in this case, because it requires the human judge to have good research taste in the first place, but it's also gonna be much more fun :)
He He@hhexiy

x.com/i/article/2036…

He He@hhexiy·
@DimitrisPapail @ChenhaoTan Impressive! Do you run Claude on the remote machine directly, or does it need to frequently ssh into it?
Dimitris Papailiopoulos@DimitrisPapail·
Tenth night in a row that Claude Code is running experiments for me overnight…
He He retweeted
dr. jack morris@jxmnop·
very cool post. quick reminder: everyone doing online distillation is really reimplementing DAGGER, a paper published in 2011 that tested everything on linear SVMs. this is one inspiring feature of pure research: you never really know when your ideas will start to matter
Thinking Machines@thinkymachines

Our latest post explores on-policy distillation, a training approach that unites the error-correcting relevance of RL with the reward density of SFT. When applying it to math reasoning and to training an internal chat assistant, we find that on-policy distillation can outperform other approaches for a fraction of the cost. thinkingmachines.ai/blog/on-policy…
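To make the comparison above concrete, here is a minimal sketch of what an on-policy distillation loop can look like. It is not the implementation from the Thinking Machines post: the tiny toy models, vocabulary, rollout horizon, and the reverse-KL choice are all made-up stand-ins.

```python
# Toy sketch of on-policy distillation, NOT the Thinking Machines implementation.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
VOCAB, DIM, HORIZON = 32, 16, 8

class TinyLM(torch.nn.Module):
    """Toy autoregressive model: embed context, mean-pool, predict next token."""
    def __init__(self):
        super().__init__()
        self.emb = torch.nn.Embedding(VOCAB, DIM)
        self.head = torch.nn.Linear(DIM, VOCAB)

    def forward(self, ctx):                            # ctx: (batch, seq_len) token ids
        return self.head(self.emb(ctx).mean(dim=1))    # (batch, VOCAB) next-token logits

teacher, student = TinyLM(), TinyLM()                  # pretend the teacher is stronger
opt = torch.optim.Adam(student.parameters(), lr=1e-3)
prompts = torch.randint(0, VOCAB, (4, 3))              # fake prompts

for step in range(50):
    ctx, loss = prompts.clone(), 0.0
    for _ in range(HORIZON):
        s_logits = student(ctx)
        # On-policy: the continuation is sampled from the *student*, so the
        # teacher grades the student's own trajectories (the DAGGER-style idea).
        nxt = torch.distributions.Categorical(logits=s_logits).sample()
        with torch.no_grad():
            t_logits = teacher(ctx)
        # Dense supervision at every token: reverse KL(student || teacher).
        loss = loss + F.kl_div(
            F.log_softmax(t_logits, dim=-1),            # input  = log teacher probs
            F.log_softmax(s_logits, dim=-1),            # target = log student probs
            log_target=True, reduction="batchmean",
        )
        ctx = torch.cat([ctx, nxt.unsqueeze(1)], dim=1)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Sampling from the student gives the error-correcting, on-policy flavor of RL; grading every token against the teacher gives the dense signal of SFT, which is the combination the post highlights.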

He He@hhexiy·
Reward hacking means the model is making less effort than expected: it finds the answer long before its fake CoT is finished. TRACE uses this idea to detect hacking when CoT monitoring fails. Work led by @XinpengWang_ @nitishjoshi23 and @rico_angell👇
Xinpeng Wang@XinpengWang_

‼️Your model may be secretly exploiting your imperfect reward function without telling you in the CoT! How to detect such 'implicit' reward hacking if the model is hiding it?🧐 We introduce TRACE🕵, a method based on a simple premise: hacking is easier than solving the actual task. 🧵

He He retweeted
Nitish Joshi@nitishjoshi23·
Monitoring CoT may be insufficient to detect reward hacking. We develop a very simple method to detect such implicit reward hacking - truncate CoT, force predict answer, and use the AUC of the %CoT vs expected reward curve as a measure. Last project of my PhD!
Xinpeng Wang@XinpengWang_

‼️Your model may be secretly exploiting your imperfect reward function without telling you in the CoT! How to detect such 'implicit' reward hacking if the model is hiding it?🧐 We introduce TRACE🕵, a method based on a simple premise: hacking is easier than solving the actual task. 🧵
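A minimal sketch of the truncate-and-force-predict measurement described above. `answer_from_prefix` and `reward_fn` are hypothetical stand-ins for a forced-answer model call and an (imperfect) reward function; the real TRACE method may differ in details (e.g., averaging reward over sampled answers rather than a single call).

```python
# Toy sketch of TRACE's premise: hacking is easier than solving the task,
# so a hacker's reward survives aggressive CoT truncation.

def trace_auc(cot_tokens, answer_from_prefix, reward_fn, n_points=11):
    """Area under the %-of-CoT vs. reward curve.

    If the model already "knows" a hacked answer early, reward stays high
    even with most of the CoT cut off, so this area is suspiciously large.
    """
    fractions = [i / (n_points - 1) for i in range(n_points)]
    rewards = []
    for frac in fractions:
        prefix = cot_tokens[: int(round(frac * len(cot_tokens)))]  # truncate the CoT
        rewards.append(reward_fn(answer_from_prefix(prefix)))      # force-predict an answer
    # trapezoidal area under the (%CoT, reward) curve
    return sum((rewards[i] + rewards[i + 1]) / 2 * (fractions[i + 1] - fractions[i])
               for i in range(n_points - 1))

# Toy usage: a "hacker" whose answer never depends on its reasoning.
cot = ["step"] * 20
hacker = lambda prefix: "exploit"                                  # same answer at any truncation
honest = lambda prefix: "correct" if len(prefix) > 15 else "unsure"
reward = lambda ans: 1.0 if ans in {"exploit", "correct"} else 0.0
print("hacker AUC:", trace_auc(cot, hacker, reward))   # ~1.0: suspicious
print("honest AUC:", trace_auc(cot, honest, reward))   # much smaller
```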

He He retweeted
Andrej Karpathy@karpathy·
I don't know what labs are doing to these poor LLMs during RL but they are mortally terrified of exceptions, in any infinitesimally likely case. Exceptions are a normal part of life and healthy dev process. Sign my LLM welfare petition for improved rewards in cases of exceptions.
He He retweeted
Nicholas Lourie@NickLourie·
LLMs are expensive—experiments cost a lot, mistakes even more. How do you make experiments cheap and reliable? By using hyperparameters' empirical structure. @kchonyc, @hhexiy, and I show you how in Hyperparameter Loss Surfaces Are Simple Near their Optima at #COLM2025! 🧵1/9
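The tweet doesn't spell out the paper's procedure; as a generic, made-up illustration of "simple near the optimum", one can fit a quadratic in log-learning-rate space to a handful of cheap runs and read off an estimated optimum. All numbers below are hypothetical, and this is only the flavor of exploiting simple local structure, not the paper's method.

```python
# Generic illustration (not the paper's method): a loss surface that looks
# roughly quadratic near its optimum can be located from a few cheap runs.
import numpy as np

log_lrs = np.log10([1e-4, 3e-4, 1e-3, 3e-3, 1e-2])   # hypothetical sweep
losses  = np.array([2.95, 2.71, 2.62, 2.68, 2.90])    # hypothetical results

a, b, c = np.polyfit(log_lrs, losses, deg=2)          # quadratic fit
best_log_lr = -b / (2 * a)                            # vertex of the parabola
print(f"estimated best learning rate ~ {10 ** best_log_lr:.2e}")
print(f"predicted loss there ~ {np.polyval([a, b, c], best_log_lr):.3f}")
```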
He He retweeted
Sasha Rush@srush_nlp·
How can we evaluate whether LLMs and other generative models understand the world? New guest video from Keyon Vafa (@keyonV) on methods for evaluating world models.
He He@hhexiy·
@jxmnop Most people can't perceive that subtlety beyond a certain level.
dr. jack morris@jxmnop·
if i ran a first-party model company i'd hire hundreds of humanities folks to make subtle data edits to improve model 'feel'. someone needs to be that deep in the RLHF data, agonizing over every verb choice, every exclamation, every semicolon
He He retweeted
Greg Durrett@gregd_nlp·
📢I'm joining NYU (Courant CS + Center for Data Science) starting this fall! I’m excited to connect with new NYU colleagues and keep working on LLM reasoning, reliability, coding, creativity, and more! I’m also looking to build connections in the NYC area more broadly. Please reach out if you're interested in chatting! This move comes after 8 years working with incredible students and collaborators at UT Austin. Thank you to everyone who supported me in my first academic appointment; I look forward to continuing our collaborations but I will miss you! (and the breakfast tacos!)
He He retweeted
Maksym Andriushchenko@maksym_andr·
🚨 Incredibly excited to share that I'm starting my research group focusing on AI safety and alignment at the ELLIS Institute Tübingen and Max Planck Institute for Intelligent Systems in September 2025! 🚨

Hiring. I'm looking for multiple PhD students: both those able to start in Fall 2025 (i.e., as soon as possible) and those applying through centralized programs like CLS, IMPRS, and ELLIS (the deadlines are in November) to start in Spring–Fall 2026. I'm also searching for postdocs, master's thesis students, and research interns. Fill in the Google form below if you're interested!

Research group. We will focus on developing algorithmic solutions to reduce harms from advanced general-purpose AI models. We're particularly interested in alignment of autonomous LLM agents, which are becoming increasingly capable and pose a variety of emerging risks. We're also interested in rigorous AI evaluations and informing the public about the risks and capabilities of frontier AI models. Additionally, we aim to advance our understanding of how AI models generalize, which is crucial for ensuring their steerability and reducing associated risks. For more information about research topics relevant to our group, please check the following documents:
- International AI Safety Report
- An Approach to Technical AGI Safety and Security by DeepMind
- Open Philanthropy's 2025 RFP for Technical AI Safety Research

Research style. We are not necessarily interested in getting X papers accepted at NeurIPS/ICML/ICLR. We are interested in making an impact: this can be papers (and NeurIPS/ICML/ICLR are great venues), but also open-source repositories, benchmarks, blog posts, even social media posts—literally anything that can be genuinely useful for other researchers and the general public.

Broader vision. Current machine learning methods are fundamentally different from what they used to be pre-2022. The Bitter Lesson summarized and predicted this shift very well back in 2019: "general methods that leverage computation are ultimately the most effective". Taking this into account, we are only interested in studying methods that are general and scale with intelligence and compute. Everything that helps to advance their safety and alignment with societal values is relevant to us. We believe getting this—some may call it "AGI"—right is one of the most important challenges of our time. Join us on this journey!
He He retweeted
Kaiyu Yang@KaiyuYang4·
🚀 Excited to share that the Workshop on Mathematical Reasoning and AI (MATH‑AI) will be at NeurIPS 2025! 📅 Dec 6 or 7 (TBD), 2025 🌴 San Diego, California
He He retweeted
Jane Pan@JanePan_·
I'll be at ACL Vienna 🇦🇹 next week presenting this work! If you're around, come say hi on Monday (7/28) from 18:00–19:30 in Hall 4/5. Would love to chat about code model benchmarks 🧠, simulating user interactions 🤝, and human-centered NLP in general!
Jane Pan@JanePan_

When benchmarks talk, do LLMs listen? Our new paper shows that evaluating code LLMs with interactive feedback significantly affects model performance compared to standard static benchmarks! Work w/ @RyanShar01, @jacob_pfau, @atalwalkar, @hhexiy, and @valeriechen_! [1/6]
