John Hewitt

243 posts

@johnhewtt

Assistant Prof @columbia CS. Visiting Researcher @ Google DeepMind. PhD from @stanfordnlp. Language x Neural Nets.

New York, NY · Joined February 2015
56 Following · 7.2K Followers
Pinned Tweet
John Hewitt @johnhewtt
New paper! Subliminal learning—transferring hidden signals between language models—is more powerful than we thought. By biasing the teacher with a steering vector instead of a prompt, we achieve strong, consistent transfer, which we use to study its mechanisms. w/@GeorgeMorgulis
6
35
299
19.7K
John Hewitt @johnhewtt
This is the first paper from first author George Morgulis (@GeorgeMorgulis), a Columbia master's student. Many congratulations to him for getting this work out!
1
0
10
1.7K
John Hewitt @johnhewtt
I’m at ICLR this year! Among other things, I’m happy to chat about PhD admissions; I’ll be hiring for my lab this upcoming cycle. Feel free to reach out.
1
5
152
14.7K
John Hewitt retweeted
Chenhao Tan @ChenhaoTan
Excited to announce the 2026 iteration of the Communication & Intelligence Symposium at UChicago! We have an amazing lineup of speakers: @Diyi_Yang, @johnhewtt, @dashunwang, and @TomerUllman. We have a simple call for abstracts, due Apr 15 (links 👇). Please come and share your research! Co-organized with the awesome @universeinanegg and @divingwithorcas.
2
15
77
34.9K
John Hewitt @johnhewtt
@pathulith22 This is an interesting question; honestly I don’t know. For logit lens, my advice is to not take the outputs as meaning the model is “trying/going to” say that word; use the outputs to help some evaluation you care about and see if it beats baselines etc. Same with steering.
0
0
1
125
Athulith Paraselli @pathulith22
@johnhewtt Thanks for the informative post! I was curious about how you navigate this lack of linearity in practice. When utilizing tools like logit lens or steering vectors, are there specific 'anomalies' or failure modes you look for that suggest the approximation fails?
1
0
0
248
John Hewitt @johnhewtt
Lots of interp thought discusses the linearity of the residual stream! This blog post: the residual stream isn't linear in a way that provides formal leverage, and interp methods based on linearity should not be preferred beyond empirical utility. cs.columbia.edu/~johnhew/resid…
5
17
235
13K
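A toy sketch, not taken from the blog post, of one reason a normalized read of the residual stream is not linear. It assumes an RMSNorm without a learned gain applied at the input of a sublayer; the vectors `residual` and `steer` are hypothetical values chosen only for illustration.

```python
import math


def rms_norm(x, eps=1e-6):
    """RMSNorm without a learned gain: x / sqrt(mean(x^2) + eps)."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]


def add(a, b):
    """Elementwise sum of two equal-length vectors."""
    return [u + v for u, v in zip(a, b)]


def max_abs_diff(a, b):
    """Largest elementwise absolute difference between two vectors."""
    return max(abs(u - v) for u, v in zip(a, b))


# Hypothetical residual-stream state and steering vector (toy values).
residual = [1.0, -2.0, 0.5, 3.0]
steer = [0.5, 0.5, -1.0, 0.0]

# What a pre-norm sublayer would read after the steering vector is added.
normed_sum = rms_norm(add(residual, steer))
# What it would read if the norm were additive (it is not).
sum_of_normed = add(rms_norm(residual), rms_norm(steer))
additivity_gap = max_abs_diff(normed_sum, sum_of_normed)

# A linear map would satisfy f(2x) = 2 f(x); the norm instead ignores scale.
doubled = [2.0 * v for v in residual]
homogeneity_gap = max_abs_diff(
    rms_norm(doubled), [2.0 * v for v in rms_norm(residual)]
)

print("additivity gap:", additivity_gap)
print("homogeneity gap:", homogeneity_gap)
```

Both gaps are far from zero here, so adding a vector to the stream does not translate into adding a fixed contribution to what the next sublayer reads. This says nothing about whether steering works empirically, which is the separate question the thread raises.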
John Hewitt @johnhewtt
@dribnet haha the empirical utility of linear interventions astounds me even more due to the fact that it doesn’t need to be true from the architecture 🫡
0
0
1
353
John Hewitt @johnhewtt
@jamesaoldfield Good point. Still ~mildly interesting at best, since the hidden/GLU-weighted linear combination of MLP output matrix vectors will be non-linearly read from. But right, it's not that the norm happens both at input and output of each sublayer.
1
0
0
548
James Oldfield @jamesaoldfield
@johnhewtt Really cool post!! Perhaps I misunderstood -- but is it not the case that many MLP blocks apply the norm to the input, not the output? Meaning the MLP applies a linear transformation to the hidden/GLU states, in a way that does make it mildly interesting as you mention
2
0
3
589
John Hewitt @johnhewtt
@giangnguyen2412 Great question -- we don't know and we need to continually test for generalization. My guess is a huge set of factors in the prompt/steering estimation play a role in where steering will work. In alignment science, people test this by having tons of evals; we should too.
0
0
0
401
Giang Nguyen @giangnguyen2412
@johnhewtt Thanks for the great write-up! One question: if not linearity, what predicts when steering methods work? Maybe the curvature of the MLP activations in the vicinity of the intervention?
1
0
0
531
John Hewitt @johnhewtt
At a pub trivia night, if you don't know the answer immediately, you "reason" through your memories -- is it X? no... was Y related? In LMs, we find that code/math RLVR'd models' reasoning for this parametric knowledge access can be easily improved, say, by TriviaQA RLVR.
Melody Ma @MelodyHorsee

(1/8) Reasoning language models are great at math and code – but what about remembering facts stored in their parameters? Excited to share work with @johnhewtt exploring this! TL;DR: we don't usually think of RLVR as useful for knowledge recall from parameters, but it helps a lot.

0
5
48
7.5K
John Hewitt @johnhewtt
Hey folks, just in case it was unclear, I talked to Been and her account has been hacked, so please disregard.
3
2
51
17.3K
John Hewitt retweeted
Pratyusha Sharma @pratyusha_PS
📢 Some big (& slightly belated) life updates!
1. I defended my PhD at MIT this summer! 🎓
2. I'm joining NYU as an Assistant Professor starting Fall 2026, with a joint appointment in Courant CS and the Center for Data Science. 🎉
🔬 My lab will focus on empirically studying the science of deep learning and applying deep learning to accelerate the natural sciences. Very broadly interested in questions at the intersection of language, reasoning and sequential decision making. (Plus any other fun problems that catch our eye along the way!)
🚀 I am recruiting 2 PhD students for this cycle! If you're interested in joining, please apply here: cs.nyu.edu/dynamic/phd/ad… cds.nyu.edu/phd-admissions…
99
94
1.8K
244.7K