He He

149 posts

@hhexiy

NLP researcher. Assistant Professor at NYU CS & CDS.

Joined December 2016

418 Following · 7.9K Followers
He He retweeted
CLS@ChengleiSi·
Love seeing this. There are two flavors of automated AI research. One just cares about hill climbing on a target benchmark (e.g., autoresearch). The other actually cares about the idea (e.g., this post). Beyond just looking at the performance, we want to ask: does the idea offer some new insights? Is it simple and scalable? Is it generalizable? The evaluation is a lot harder in this case, because it requires the human judge to have good research taste in the first place, but it's also gonna be much more fun :)
He He@hhexiy

x.com/i/article/2036…

He He@hhexiy·
@DimitrisPapail @ChenhaoTan Impressive! Do you run Claude on the remote machine directly, or does it need to frequently ssh into it?
Dimitris Papailiopoulos@DimitrisPapail·
Tenth night in a row that Claude Code is running experiments for me overnight…
He He retweeted
dr. jack morris@jxmnop·
very cool post. quick reminder: everyone doing online distillation is really reimplementing DAGGER, a paper published in 2011 that tested everything on linear SVMs. this is one inspiring feature of pure research: you never really know when your ideas will start to matter
Thinking Machines@thinkymachines

Our latest post explores on-policy distillation, a training approach that unites the error-correcting relevance of RL with the reward density of SFT. When applying it to math reasoning and to training an internal chat assistant, we find that on-policy distillation can outperform other approaches for a fraction of the cost. thinkingmachines.ai/blog/on-policy…
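To make the comparison above concrete, here is a minimal sketch of what an on-policy distillation loop can look like. It is not the implementation from the Thinking Machines post: the tiny toy models, vocabulary, rollout horizon, and the reverse-KL choice are all made-up stand-ins.

```python
# Toy sketch of on-policy distillation, NOT the Thinking Machines implementation.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
VOCAB, DIM, HORIZON = 32, 16, 8

class TinyLM(torch.nn.Module):
    """Toy autoregressive model: embed context, mean-pool, predict next token."""
    def __init__(self):
        super().__init__()
        self.emb = torch.nn.Embedding(VOCAB, DIM)
        self.head = torch.nn.Linear(DIM, VOCAB)

    def forward(self, ctx):                            # ctx: (batch, seq_len) token ids
        return self.head(self.emb(ctx).mean(dim=1))    # (batch, VOCAB) next-token logits

teacher, student = TinyLM(), TinyLM()                  # pretend the teacher is stronger
opt = torch.optim.Adam(student.parameters(), lr=1e-3)
prompts = torch.randint(0, VOCAB, (4, 3))              # fake prompts

for step in range(50):
    ctx, loss = prompts.clone(), 0.0
    for _ in range(HORIZON):
        s_logits = student(ctx)
        # On-policy: the continuation is sampled from the *student*, so the
        # teacher grades the student's own trajectories (the DAGGER-style idea).
        nxt = torch.distributions.Categorical(logits=s_logits).sample()
        with torch.no_grad():
            t_logits = teacher(ctx)
        # Dense supervision at every token: reverse KL(student || teacher).
        loss = loss + F.kl_div(
            F.log_softmax(t_logits, dim=-1),            # input  = log teacher probs
            F.log_softmax(s_logits, dim=-1),            # target = log student probs
            log_target=True, reduction="batchmean",
        )
        ctx = torch.cat([ctx, nxt.unsqueeze(1)], dim=1)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Sampling from the student gives the error-correcting, on-policy flavor of RL; grading every token against the teacher gives the dense signal of SFT, which is the combination the post highlights.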

He He@hhexiy·
Reward hacking means the model is making less effort than expected: it finds the answer long before its fake CoT is finished. TRACE uses this idea to detect hacking when CoT monitoring fails. Work led by @XinpengWang_ @nitishjoshi23 and @rico_angell👇
Xinpeng Wang@XinpengWang_

‼️Your model may be secretly exploiting your imperfect reward function without telling you in the CoT! How to detect such 'implicit' reward hacking if the model is hiding it?🧐 We introduce TRACE🕵, a method based on a simple premise: hacking is easier than solving the actual task. 🧵

He He retweeted
Nitish Joshi@nitishjoshi23·
Monitoring CoT may be insufficient to detect reward hacking. We develop a very simple method to detect such implicit reward hacking - truncate CoT, force predict answer, and use the AUC of the %CoT vs expected reward curve as a measure. Last project of my PhD!
Xinpeng Wang@XinpengWang_

‼️Your model may be secretly exploiting your imperfect reward function without telling you in the CoT! How to detect such 'implicit' reward hacking if the model is hiding it?🧐 We introduce TRACE🕵, a method based on a simple premise: hacking is easier than solving the actual task. 🧵
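A minimal sketch of the truncate-and-force-predict measurement described above. `answer_from_prefix` and `reward_fn` are hypothetical stand-ins for a forced-answer model call and an (imperfect) reward function; the real TRACE method may differ in details (e.g., averaging reward over sampled answers rather than a single call).

```python
# Toy sketch of TRACE's premise: hacking is easier than solving the task,
# so a hacker's reward survives aggressive CoT truncation.

def trace_auc(cot_tokens, answer_from_prefix, reward_fn, n_points=11):
    """Area under the %-of-CoT vs. reward curve.

    If the model already "knows" a hacked answer early, reward stays high
    even with most of the CoT cut off, so this area is suspiciously large.
    """
    fractions = [i / (n_points - 1) for i in range(n_points)]
    rewards = []
    for frac in fractions:
        prefix = cot_tokens[: int(round(frac * len(cot_tokens)))]  # truncate the CoT
        rewards.append(reward_fn(answer_from_prefix(prefix)))      # force-predict an answer
    # trapezoidal area under the (%CoT, reward) curve
    return sum((rewards[i] + rewards[i + 1]) / 2 * (fractions[i + 1] - fractions[i])
               for i in range(n_points - 1))

# Toy usage: a "hacker" whose answer never depends on its reasoning.
cot = ["step"] * 20
hacker = lambda prefix: "exploit"                                  # same answer at any truncation
honest = lambda prefix: "correct" if len(prefix) > 15 else "unsure"
reward = lambda ans: 1.0 if ans in {"exploit", "correct"} else 0.0
print("hacker AUC:", trace_auc(cot, hacker, reward))   # ~1.0: suspicious
print("honest AUC:", trace_auc(cot, honest, reward))   # much smaller
```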

He He retweeted
Andrej Karpathy@karpathy·
I don't know what labs are doing to these poor LLMs during RL but they are mortally terrified of exceptions, in any infinitesimally likely case. Exceptions are a normal part of life and healthy dev process. Sign my LLM welfare petition for improved rewards in cases of exceptions.
He He retweeted
Nicholas Lourie@NickLourie·
LLMs are expensive—experiments cost a lot, mistakes even more. How do you make experiments cheap and reliable? By using hyperparameters' empirical structure. @kchonyc, @hhexiy, and I show you how in Hyperparameter Loss Surfaces Are Simple Near their Optima at #COLM2025! 🧵1/9
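The tweet doesn't spell out the paper's procedure; as a generic, made-up illustration of "simple near the optimum", one can fit a quadratic in log-learning-rate space to a handful of cheap runs and read off an estimated optimum. All numbers below are hypothetical, and this is only the flavor of exploiting simple local structure, not the paper's method.

```python
# Generic illustration (not the paper's method): a loss surface that looks
# roughly quadratic near its optimum can be located from a few cheap runs.
import numpy as np

log_lrs = np.log10([1e-4, 3e-4, 1e-3, 3e-3, 1e-2])   # hypothetical sweep
losses  = np.array([2.95, 2.71, 2.62, 2.68, 2.90])    # hypothetical results

a, b, c = np.polyfit(log_lrs, losses, deg=2)          # quadratic fit
best_log_lr = -b / (2 * a)                            # vertex of the parabola
print(f"estimated best learning rate ~ {10 ** best_log_lr:.2e}")
print(f"predicted loss there ~ {np.polyval([a, b, c], best_log_lr):.3f}")
```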
He He retweeted
Sasha Rush@srush_nlp·
How can we evaluate whether LLMs and other generative models understand the world? New guest video from Keyon Vafa (@keyonV) on methods for evaluating world models.
He He@hhexiy·
@jxmnop Most people can't perceive that subtlety beyond a certain level.
dr. jack morris@jxmnop·
if i ran a first-party model company i'd hire hundreds of humanities folks to make subtle data edits to improve model 'feel'. someone needs to be that deep in the RLHF data, agonizing over every verb choice, every exclamation, every semicolon
He He retweeted
Greg Durrett@gregd_nlp·
📢I'm joining NYU (Courant CS + Center for Data Science) starting this fall! I’m excited to connect with new NYU colleagues and keep working on LLM reasoning, reliability, coding, creativity, and more! I’m also looking to build connections in the NYC area more broadly. Please reach out if you're interested in chatting! This move comes after 8 years working with incredible students and collaborators at UT Austin. Thank you to everyone who supported me in my first academic appointment; I look forward to continuing our collaborations but I will miss you! (and the breakfast tacos!)
He He retweeted
Maksym Andriushchenko@maksym_andr·
🚨 Incredibly excited to share that I'm starting my research group focusing on AI safety and alignment at the ELLIS Institute Tübingen and Max Planck Institute for Intelligent Systems in September 2025! 🚨

Hiring. I'm looking for multiple PhD students: both those able to start in Fall 2025 (i.e., as soon as possible) and those applying through centralized programs like CLS, IMPRS, and ELLIS (the deadlines are in November) to start in Spring–Fall 2026. I'm also searching for postdocs, master's thesis students, and research interns. Fill in the Google form below if you're interested!

Research group. We will focus on developing algorithmic solutions to reduce harms from advanced general-purpose AI models. We're particularly interested in alignment of autonomous LLM agents, which are becoming increasingly capable and pose a variety of emerging risks. We're also interested in rigorous AI evaluations and informing the public about the risks and capabilities of frontier AI models. Additionally, we aim to advance our understanding of how AI models generalize, which is crucial for ensuring their steerability and reducing associated risks. For more information about research topics relevant to our group, please check the following documents:
- International AI Safety Report
- An Approach to Technical AGI Safety and Security by DeepMind
- Open Philanthropy's 2025 RFP for Technical AI Safety Research

Research style. We are not necessarily interested in getting X papers accepted at NeurIPS/ICML/ICLR. We are interested in making an impact: this can be papers (and NeurIPS/ICML/ICLR are great venues), but also open-source repositories, benchmarks, blog posts, even social media posts—literally anything that can be genuinely useful for other researchers and the general public.

Broader vision. Current machine learning methods are fundamentally different from what they used to be pre-2022. The Bitter Lesson summarized and predicted this shift very well back in 2019: "general methods that leverage computation are ultimately the most effective". Taking this into account, we are only interested in studying methods that are general and scale with intelligence and compute. Everything that helps to advance their safety and alignment with societal values is relevant to us. We believe getting this—some may call it "AGI"—right is one of the most important challenges of our time. Join us on this journey!
He He retweeted
Kaiyu Yang@KaiyuYang4·
🚀 Excited to share that the Workshop on Mathematical Reasoning and AI (MATH‑AI) will be at NeurIPS 2025! 📅 Dec 6 or 7 (TBD), 2025 🌴 San Diego, California
He He retweeted
Jane Pan@JanePan_·
I'll be at ACL Vienna 🇦🇹 next week presenting this work! If you're around, come say hi on Monday (7/28) from 18:00–19:30 in Hall 4/5. Would love to chat about code model benchmarks 🧠, simulating user interactions 🤝, and human-centered NLP in general!
Jane Pan@JanePan_

When benchmarks talk, do LLMs listen? Our new paper shows that evaluating code LLMs with interactive feedback significantly affects model performance compared to standard static benchmarks! Work w/ @RyanShar01, @jacob_pfau, @atalwalkar, @hhexiy, and @valeriechen_! [1/6]
