John Hewitt
232 posts

John Hewitt
@johnhewtt

Assistant Prof @columbia CS. Visiting Researcher @ Google DeepMind. PhD from @stanfordnlp. Language x Neural Nets.

New York, NY · Joined February 2015
50 Following · 7K Followers

Pinned Tweet
John Hewitt @johnhewtt
Come do a PhD with me at Columbia! My lab tackles basic problems in alignment, interpretability, safety, and capabilities of language systems. If you love adventuring in model internals and behaviors---to understand and improve---let's do it together! pic: a run in central park
13 replies · 127 reposts · 952 likes · 76.9K views
John Hewitt @johnhewtt
@pathulith22 This is an interesting question; honestly I don’t know. For logit lens, my advice is not to take the outputs as meaning the model is “trying/going to” say that word; use the outputs to help some evaluation you care about and see if it beats baselines, etc. Same with steering.
0 replies · 0 reposts · 1 like · 89 views
Athulith Paraselli @pathulith22
@johnhewtt Thanks for the informative post! I was curious about how you navigate this lack of linearity in practice. When utilizing tools like logit lens or steering vectors, are there specific 'anomalies' or failure modes you look for that suggest the approximation fails?
1 reply · 0 reposts · 0 likes · 196 views
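The logit lens discussed in this exchange amounts to projecting an intermediate hidden state straight through the unembedding matrix, as if that layer were the model's last. A minimal sketch in plain Python; the vocabulary, weights, and hidden state here are toy stand-ins, not any real model's parameters.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def logit_lens(hidden, unembed, vocab):
    # Project an intermediate hidden state through the unembedding
    # matrix (one row per token) and rank tokens by probability.
    logits = [sum(h * w for h, w in zip(hidden, row)) for row in unembed]
    probs = softmax(logits)
    return sorted(zip(vocab, probs), key=lambda t: -t[1])

# Toy 3-d hidden state and a 4-word vocabulary (made up for illustration).
vocab = ["cat", "dog", "the", "ran"]
unembed = [[1.0, 0.0, 0.0],
           [0.0, 1.0, 0.0],
           [0.5, 0.5, 0.0],
           [0.0, 0.0, 1.0]]
hidden = [0.2, 0.1, 1.5]

ranking = logit_lens(hidden, unembed, vocab)
print(ranking[0][0])  # "ran" scores highest for this toy hidden state
```

The advice in the reply above is about how to read this ranking: treat it as a feature for an evaluation you care about, not as evidence that the model is "about to" say the top token.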
John Hewitt @johnhewtt
Lots of interp thought discusses the linearity of the residual stream! This blog post argues that the residual stream isn't linear in a way that provides formal leverage, and that interp methods based on linearity should not be preferred beyond their empirical utility. cs.columbia.edu/~johnhew/resid…
5 replies · 17 reposts · 232 likes · 11.9K views
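One concrete way the non-linearity shows up: the normalization applied when components read from the residual stream does not commute with addition, so contributions from different components don't superpose exactly. A toy check with RMSNorm (a sketch with made-up vectors, not any specific model's norm or activations):

```python
import math

def rmsnorm(x, eps=1e-6):
    # RMSNorm: scale x by the reciprocal of its root-mean-square.
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]

a = [1.0, 0.0, 0.0]   # toy "attention" contribution to the stream
b = [0.0, 3.0, 0.0]   # toy "MLP" contribution

norm_of_sum = rmsnorm([x + y for x, y in zip(a, b)])
sum_of_norms = [x + y for x, y in zip(rmsnorm(a), rmsnorm(b))]

# If reading from the stream were linear, these would match; they don't.
gap = max(abs(u - v) for u, v in zip(norm_of_sum, sum_of_norms))
print(gap > 0.1)  # True
```

This is exactly the kind of gap the post points at: the stream is a sum of writes, but every read renormalizes, so linearity holds only approximately.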
John Hewitt @johnhewtt
@dribnet haha the empirical utility of linear interventions astounds me even more because nothing in the architecture requires it to be true 🫡
0 replies · 0 reposts · 1 like · 305 views
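The linear interventions mentioned here are, at their simplest, just vector additions to a residual-stream activation. A minimal sketch; the hidden state and "concept" direction are hypothetical, and in practice the direction would be estimated from contrasting activations.

```python
def steer(hidden, direction, alpha):
    # Add a scaled steering vector to a hidden state in the residual stream.
    return [h + alpha * d for h, d in zip(hidden, direction)]

hidden = [0.5, -0.2, 1.0]     # toy residual-stream activation
direction = [1.0, 0.0, -1.0]  # toy "concept" direction (made up)

steered = steer(hidden, direction, alpha=2.0)
print(steered)  # [2.5, -0.2, -1.0]
```

The surprise noted in the tweet is that this additive edit, which would be principled in a truly linear system, works empirically even though the architecture never guarantees it.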
John Hewitt @johnhewtt
@jamesaoldfield Good point. Still ~mildly interesting at best, since the hidden/GLU-weighted linear combination of MLP output-matrix vectors will be read non-linearly downstream. But you're right, it's not the case that the norm happens at both the input and the output of each sublayer.
1 reply · 0 reposts · 0 likes · 507 views
James Oldfield @jamesaoldfield
@johnhewtt Really cool post!! Perhaps I misunderstood -- but is it not the case that many MLP blocks apply the norm to the input, not the output? Meaning the MLP applies a linear transformation to the hidden/GLU states, in a way that does make it mildly interesting as you mention
2 replies · 0 reposts · 3 likes · 546 views
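The point this exchange turns on — that in a pre-norm block the normalization hits the sublayer's input, so the MLP's write to the stream is exactly a hidden-activation-weighted linear combination of its output matrix's columns — can be made concrete in a small sketch. All weights are toy values, and ReLU stands in for the GELU/GLU gates of real models.

```python
import math

def rmsnorm(x, eps=1e-6):
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]

def relu(v):
    return [max(0.0, u) for u in v]

def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]

# Toy pre-norm MLP sublayer: norm on the input only, none on the output.
W_in = [[1.0, -1.0], [0.5, 2.0], [-1.0, 0.0]]   # 2 -> 3 (rows = hidden units)
W_out = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]     # 3 -> 2

x = [3.0, 4.0]
h = relu(matvec(W_in, rmsnorm(x)))   # hidden activations (post-gate)
mlp_out = matvec(W_out, h)           # the sublayer's write to the stream

# With no output norm, that write is exactly sum_i h[i] * (column i of W_out):
combo = [sum(h[i] * W_out[d][i] for i in range(len(h)))
         for d in range(len(x))]
assert all(abs(a - b) < 1e-9 for a, b in zip(mlp_out, combo))
```

The caveat in the reply above is that this decomposition is only "mildly interesting": the next sublayer renormalizes before reading, so the linear structure of the write is consumed non-linearly downstream.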
John Hewitt @johnhewtt
@giangnguyen2412 Great question -- we don't know and we need to continually test for generalization. My guess is a huge set of factors in the prompt/steering estimation play a role in where steering will work. In alignment science, people test this by having tons of evals; we should too.
0 replies · 0 reposts · 0 likes · 372 views
Giang Nguyen @giangnguyen2412
@johnhewtt Thanks for the great write-up! One question: if not linearity, what does predict when steering methods work? Maybe the curvature of the MLP activations in the vicinity of the intervention?
1 reply · 0 reposts · 0 likes · 498 views
John Hewitt @johnhewtt
In a pub trivia night, if you don't know the answer immediately, you "reason" through your memories -- is it X? No... was Y related? In LMs, we find that code/math-RLVR'd models' reasoning for this parametric knowledge access can be easily improved, say, by TriviaQA RLVR.

Quoting Melody Ma @MelodyHorsee:
(1/8) Reasoning language models are great at math and code – but what about remembering facts stored in their parameters? Excited to share work with @johnhewtt exploring this! TL;DR: we don't usually think of RLVR as useful for knowledge recall from parameters, but it helps a lot.

0 replies · 5 reposts · 46 likes · 6.9K views
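RLVR ("RL with verifiable rewards") hinges on a reward that can be checked programmatically rather than judged by a model. For trivia-style recall, that check can be as simple as normalized exact match against gold answers. A minimal sketch — the normalization and examples here are illustrative, not the actual setup from the work being quoted.

```python
def verifiable_reward(model_answer, gold_answers):
    # Reward 1.0 iff the model's answer matches any gold answer
    # after light normalization (case, whitespace); otherwise 0.0.
    norm = lambda s: " ".join(s.lower().strip().split())
    golds = {norm(g) for g in gold_answers}
    return 1.0 if norm(model_answer) in golds else 0.0

print(verifiable_reward("Paris ", ["paris", "Paris, France"]))  # 1.0
print(verifiable_reward("Lyon", ["paris", "Paris, France"]))    # 0.0
```

The point of the thread is that optimizing against rewards like this on a QA dataset can improve how models reason their way to facts already stored in their parameters, not just math/code skills.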
John Hewitt @johnhewtt
Hey folks, just in case it was unclear, I talked to Been and her account has been hacked, so please disregard.
3 replies · 2 reposts · 51 likes · 17.1K views
John Hewitt retweeted
Pratyusha Sharma ✈️ NeurIPS @pratyusha_PS
📢 Some big (& slightly belated) life updates!
1. I defended my PhD at MIT this summer! 🎓
2. I'm joining NYU as an Assistant Professor starting Fall 2026, with a joint appointment in Courant CS and the Center for Data Science. 🎉
🔬 My lab will focus on empirically studying the science of deep learning and applying deep learning to accelerate the natural sciences. Very broadly interested in questions at the intersection of language, reasoning and sequential decision making. (Plus any other fun problems that catch our eye along the way!)
🚀 I am recruiting 2 PhD students for this cycle! If you're interested in joining, please apply here: cs.nyu.edu/dynamic/phd/ad… cds.nyu.edu/phd-admissions…
101 replies · 96 reposts · 1.8K likes · 243.7K views
John Hewitt @johnhewtt
New work! Gemma3 can explain in English what it learned from data – when we distill that data into a new word (embedding) and query it for a description of the word. Gemma explained a word trained on incorrect answers as: “a lack of complete, coherent, or meaningful answers...”
4 replies · 28 reposts · 191 likes · 36.5K views
John Hewitt @johnhewtt
We see this as a step towards developing new language tools for learning about how language models store, process, and reason about potentially complex concepts—differently from how we do. Work with Oyvind Tafjord, Robert Geirhos, @_beenkim. Blog here: cs.columbia.edu/~johnhew//neol…
1 reply · 0 reposts · 13 likes · 1.6K views
John Hewitt @johnhewtt
In one example, we taught Gemma a neologism that causes single-sentence answers. When asked for synonyms of this new word, it suggested “lack,” as in, “Give me a lack answer.” This didn’t look right, but indeed causes very curt answers. We call this a machine-only synonym.
1 reply · 0 reposts · 9 likes · 1.9K views
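The "distill data into a new word" setup in this thread amounts to optimizing a single new embedding while every other parameter stays frozen. A toy analogue of that one-vector optimization — gradient descent on one embedding against a frozen quadratic stand-in for the model; the data, dimensions, and objective are all illustrative, not the paper's actual training setup.

```python
# Toy stand-in for neologism learning: fit one new embedding `e` so a
# frozen "model" (here, mean squared distance to some target
# representations) is minimized, touching no other parameters.
targets = [[1.0, 2.0], [3.0, 0.0], [2.0, 4.0]]  # toy data representations
e = [0.0, 0.0]                                   # the single new embedding
lr = 0.1

for _ in range(200):
    # Gradient of mean squared distance to the targets w.r.t. e.
    grad = [sum(2 * (e[d] - t[d]) for t in targets) / len(targets)
            for d in range(len(e))]
    e = [e[d] - lr * grad[d] for d in range(len(e))]

# For this objective, the optimum is the mean of the targets.
mean = [sum(t[d] for t in targets) / len(targets) for d in range(2)]
print(all(abs(e[d] - mean[d]) < 1e-3 for d in range(2)))  # True: e -> mean
```

In the real setup, the frozen "model" is the full language model and the loss comes from its behavior on the distilled data; the querying step then asks the model, in English, what the learned embedding means.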
John Hewitt @johnhewtt
Excited to give a talk at the interplay workshop tomorrow! Come say hi! Alas, it’s my only day at COLM. Catch me at the coffee breaks or the roundtable.

Quoting INTERPLAY Workshop @interplaywrkshp:
✨ The schedule for our INTERPLAY workshop at COLM is live! ✨
🗓️ October 10th, Room 518C
🔹 Invited talks from @sarahwiegreffe @johnhewtt @amuuueller @kmahowald
🔹 Paper presentations and posters
🔹 Closing roundtable discussion
Join us in Montréal! @COLM_conf

0 replies · 2 reposts · 40 likes · 9.1K views