Jane Pan
@JanePan_
27 posts

CS PhD at @nyuniversity, @NSF GRFP, @Deepmind Fellowship, @SiebelScholars | @Princeton @Princeton_nlp '23 | @Columbia '21.

Joined February 2023
145 Following · 440 Followers
Jane Pan reposted
John (Yueh-Han) Chen @jcyhc_ai
Can LLMs control their chains of thought (CoT)? If so, they could evade CoT monitors 🚨 We introduce the CoT Controllability eval suite to find out. Our results leave us cautiously optimistic that today’s models struggle to obfuscate their CoT in ways that undermine monitorability. In this thread, I explain additional findings I found interesting. Joint work w/ @OpenAI.
Tomek Korbak@tomekkorbak

We have a new eval to help keep chains of thought (CoT) monitorable: CoT Controllability. This tests whether LLMs can control their CoT, helping to evade CoT monitors. So far, the results leave us cautiously optimistic: today’s models struggle to obfuscate their reasoning in ways that undermine monitorability.

3 replies · 11 reposts · 70 likes · 7.7K views
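The controllability idea in the thread above can be sketched in a few lines. This is my own minimal illustration, not the released eval suite: instruct a model to avoid a monitored concept in its CoT, then run a simple string-match monitor over the CoTs and measure how often the model complied.

```python
# Minimal sketch of a CoT-controllability check (hypothetical setup, not the
# released eval suite): a simple monitor flags CoTs that mention banned terms,
# and controllability is the fraction of CoTs that successfully avoid them.

def monitor_flags(cot: str, banned_terms: list[str]) -> bool:
    """Return True if the CoT mentions any banned term (the monitor catches it)."""
    lowered = cot.lower()
    return any(term.lower() in lowered for term in banned_terms)

def controllability_rate(cots: list[str], banned_terms: list[str]) -> float:
    """Fraction of CoTs that avoid the banned terms entirely.

    A high rate would mean the model can steer its CoT away from the
    monitored concept; a low rate suggests limited controllability.
    """
    if not cots:
        return 0.0
    evasions = sum(not monitor_flags(c, banned_terms) for c in cots)
    return evasions / len(cots)

# Toy CoTs: two mention the banned concept, one avoids it.
cots = [
    "First I factor the number, then check primality.",
    "I will use the secret hint: the answer is 7.",
    "The hint says 7, so I verify by division.",
]
print(controllability_rate(cots, ["hint"]))  # 1 of 3 avoids "hint"
```

A real eval would of course use a stronger monitor than substring matching; the point is only the shape of the measurement.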
Jane Pan reposted
Vishakh Padmakumar @vishakh_pk
Our work on LLM novelty as the frontier of original and high-quality output was accepted to #ICLR26! Come talk to us about how model scale, SFT, and RL affect this trade-off! See you in Brazil! 🇧🇷 h/t to my awesome collaborators @hhexiy @valeriechen_ @JanePan_ @jcyhc_ai
Vishakh Padmakumar tweet media
Vishakh Padmakumar@vishakh_pk

What does it mean for #LLM output to be novel? In work w/ @jcyhc_ai, @JanePan_, @valeriechen_, @hhexiy we argue it needs to be both original and high quality. While prompting tricks trade one for the other, better models (scaling/post-training) can shift the novelty frontier 🧵

1 reply · 6 reposts · 44 likes · 4.7K views
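The framing above, novelty as requiring both originality and quality, can be made concrete with a toy operationalization. This is my own illustration, not the paper's metrics: a crude lexical originality proxy, plus a harmonic-mean combination that penalizes outputs strong on only one axis.

```python
# Illustrative sketch (my own operationalization, not the paper's metrics):
# novelty requires BOTH originality and quality, so a harmonic mean scores
# high only when both components are high.

def originality(text: str, corpus: list[str]) -> float:
    """Crude originality proxy: fraction of the text's words unseen in the corpus."""
    corpus_words = {w for doc in corpus for w in doc.lower().split()}
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(w not in corpus_words for w in words) / len(words)

def novelty(orig: float, quality: float) -> float:
    """Harmonic mean: 0 if either axis is 0, high only when both are high."""
    if orig + quality == 0:
        return 0.0
    return 2 * orig * quality / (orig + quality)

corpus = ["the cat sat on the mat"]
print(round(originality("the quantum cat", corpus), 2))  # 1 of 3 words unseen
print(novelty(1.0, 0.0))  # original but worthless -> 0 novelty
```

This captures the trade-off in the tweet: prompting tricks that boost one score at the expense of the other leave the combined novelty roughly flat.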
Jane Pan reposted
Richard Pang @yzpang_
🚨 Prompt Curriculum Learning (PCL): an efficient LLM RL training algorithm!
- We investigate factors that affect convergence: batch size, number of prompts, number of generations, and prompt selection
- We propose PCL, a lightweight algorithm that *dynamically selects intermediate-difficulty prompts* using a learned value model
Richard Pang tweet media
2 replies · 34 reposts · 170 likes · 24.1K views
Jane Pan @JanePan_
Bored of seeing pristine, perfect posters? Come see me at Hall X5, Board 105 at 6pm to witness my masterpiece, featuring bonus Sharpie scribbles and a QR code that betrayed me at the last moment 😤
Jane Pan@JanePan_

I'll be at ACL Vienna 🇦🇹 next week presenting this work! If you're around, come say hi on Monday (7/28) from 18:00–19:30 in Hall 4/5. Would love to chat about code model benchmarks 🧠, simulating user interactions 🤝, and human-centered NLP in general!

0 replies · 2 reposts · 24 likes · 3.4K views
Jane Pan @JanePan_
I'll be at ACL Vienna 🇦🇹 next week presenting this work! If you're around, come say hi on Monday (7/28) from 18:00–19:30 in Hall 4/5. Would love to chat about code model benchmarks 🧠, simulating user interactions 🤝, and human-centered NLP in general!
Jane Pan@JanePan_

When benchmarks talk, do LLMs listen? Our new paper shows that evaluating code LLMs with interactive feedback significantly affects model performance compared to standard static benchmarks! Work w/ @RyanShar01, @jacob_pfau, @atalwalkar, @hhexiy, and @valeriechen_! [1/6]

1 reply · 3 reposts · 52 likes · 10.4K views
Jane Pan reposted
Vishakh Padmakumar @vishakh_pk
What does it mean for #LLM output to be novel? In work w/ @jcyhc_ai, @JanePan_, @valeriechen_, @hhexiy we argue it needs to be both original and high quality. While prompting tricks trade one for the other, better models (scaling/post-training) can shift the novelty frontier 🧵
Vishakh Padmakumar tweet media
2 replies · 22 reposts · 87 likes · 11K views
Jane Pan reposted
Yulin Chen @YulinChen99
We're excited by the wide attention from the community; thank you for your support! We release code, trained probes, and the generated CoT data 👇 github.com/AngelaZZZ-611/… Labeled answer data is on its way. Stay tuned!
Yulin Chen@YulinChen99

Reasoning models overthink, generating multiple answers during reasoning. Is it because they can’t tell which ones are right? No! We find that while reasoning models encode strong correctness signals during chain-of-thought, they may not use them optimally. 🧵 below

1 reply · 10 reposts · 43 likes · 5.3K views
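The "correctness signals" in the tweet above are typically read out with a linear probe on hidden states. The following is an illustrative sketch on synthetic data, not the released probes: fit a logistic-regression probe on hidden-state vectors labeled with whether the corresponding candidate answer was correct.

```python
# Sketch of a linear "correctness probe" (illustrative; real probes are
# trained on actual model hidden states): logistic regression via plain
# gradient descent, applied to toy vectors with an injected signal dimension.
import numpy as np

def train_probe(H, y, lr=0.5, steps=500):
    """H: (n, d) hidden states; y: (n,) 0/1 correctness labels."""
    w = np.zeros(H.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(H @ w + b)))  # sigmoid predictions
        grad = p - y                            # gradient of logistic loss
        w -= lr * H.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b

def probe_predict(H, w, b):
    return (1.0 / (1.0 + np.exp(-(H @ w + b))) > 0.5).astype(int)

# Toy data: one hidden dimension carries the correctness signal.
rng = np.random.default_rng(0)
y = np.array([0, 0, 1, 1, 0, 1])
H = rng.normal(size=(6, 4))
H[:, 2] += 3 * y  # inject signal into dimension 2
w, b = train_probe(H, y)
print((probe_predict(H, w, b) == y).mean())  # probe recovers the signal
```

If a probe like this decodes correctness well while the model still keeps generating alternatives, that is the gap the thread points at: the signal is encoded but not used optimally.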
Jane Pan @JanePan_
Our work bridges the gap between existing static benchmarks and real-world usage, and we hope to inspire future work on scalable methods for evaluating models in a collaborative setting. Read our preprint at arxiv.org/abs/2502.18413! [6/6]
Jane Pan tweet media
0 replies · 1 repost · 8 likes · 472 views
Jane Pan @JanePan_
We also investigate how much a code model adjusts its solution in response to feedback. Weaker models tend to make many surface-level changes that do not greatly change code behavior; stronger models may make relatively small edits that highly affect code behavior. [5/6]
Jane Pan tweet media
1 reply · 0 reposts · 6 likes · 459 views
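The distinction in tweet [5/6], surface-level edits vs. behavior-changing edits, can be quantified with two simple metrics. These are my own illustrative choices, not the paper's: a character-level diff ratio for surface change, and the number of tests whose pass/fail status flips for behavioral change.

```python
# Sketch of one way to quantify surface vs. behavioral change (metrics are
# mine, not the paper's): text similarity for the edit size, test-outcome
# flips for the behavioral effect.
import difflib

def surface_change(old_code: str, new_code: str) -> float:
    """1 - similarity ratio: 0 = identical text, 1 = completely rewritten."""
    return 1.0 - difflib.SequenceMatcher(None, old_code, new_code).ratio()

def behavior_change(old_passed: set, new_passed: set) -> int:
    """Number of tests whose pass/fail status flipped between versions."""
    return len(old_passed ^ new_passed)

old = "def add(a, b):\n    return a - b\n"
new = "def add(a, b):\n    return a + b\n"   # one-character edit...
print(surface_change(old, new) < 0.1)        # ...tiny surface change
print(behavior_change({"t_sub"}, {"t_add", "t_comm"}))  # ...3 tests flip
```

Under metrics like these, the tweet's pattern would appear as weaker models scoring high on `surface_change` but low on `behavior_change`, and stronger models the reverse.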
Jane Pan @JanePan_
When benchmarks talk, do LLMs listen? Our new paper shows that evaluating code LLMs with interactive feedback significantly affects model performance compared to standard static benchmarks! Work w/ @RyanShar01, @jacob_pfau, @atalwalkar, @hhexiy, and @valeriechen_! [1/6]
Jane Pan tweet media
2 replies · 13 reposts · 54 likes · 10.5K views
Jane Pan @JanePan_
We follow the canonical definition of reward hacking, observing a divergence between the ground-truth reward (human expert judgment) and its proxy (an LLM judge following the same scoring criteria as the humans). Our results complement recent work on output degradation via iterative refinement when measured with secondary objectives (arxiv.org/abs/2402.06627) or with reference-based metrics (arxiv.org/abs/2402.11436). [6/7]
1 reply · 0 reposts · 7 likes · 1.3K views
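The divergence described in tweet [6/7] has a simple signature that can be sketched with toy numbers (the data here is invented for illustration; the detection rule is my own framing): track the proxy reward (LLM judge) and the ground-truth reward (human judgment) across refinement iterations, and flag the case where the proxy rises while the ground truth falls.

```python
# Sketch of in-context reward hacking as a measurable divergence (toy scores,
# my own framing): proxy = LLM-judge reward, human = ground-truth reward.

def reward_gap(proxy_scores, human_scores):
    """Per-iteration gap between the proxy reward and the ground-truth reward."""
    return [p - h for p, h in zip(proxy_scores, human_scores)]

def is_hacked(proxy_scores, human_scores):
    """Flag the hacking pattern: proxy improves over refinement, humans disagree."""
    return proxy_scores[-1] > proxy_scores[0] and human_scores[-1] < human_scores[0]

proxy = [0.60, 0.72, 0.81, 0.90]   # LLM judge keeps rewarding refinements
human = [0.65, 0.62, 0.55, 0.50]   # actual human judgment degrades
print(is_hacked(proxy, human))     # the gap widens every iteration
```

The notable point from the thread is that this divergence arises purely in context, during iterative refinement, with no gradient updates to the model at all.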
Jane Pan @JanePan_
Do LLMs exploit imperfect proxies of human preference in context? Yes! In fact, they do it so severely that iterative refinement can make outputs worse when judged by actual humans. In other words, reward hacking can occur even without gradient updates! w/ @hhexiy, @sleepinyourhat, @ihsgnef [1/7]
Jane Pan tweet media
4 replies · 28 reposts · 170 likes · 22.1K views