John Hewitt
232 posts

John Hewitt
@johnhewtt

Assistant Prof @columbia CS. Visiting Researcher @ Google DeepMind. PhD from @stanfordnlp. Language x Neural Nets.

New York, NY · Joined February 2015
50 Following · 7K Followers

Pinned Tweet
John Hewitt @johnhewtt
Come do a PhD with me at Columbia! My lab tackles basic problems in alignment, interpretability, safety, and capabilities of language systems. If you love adventuring in model internals and behaviors---to understand and improve---let's do it together! pic: a run in central park
13 replies · 127 reposts · 952 likes · 76.9K views
John Hewitt @johnhewtt
@pathulith22 This is an interesting question; honestly I don’t know. For logit lens, my advice is not to take the outputs as meaning the model is “trying/going to” say that word; use the outputs to help some evaluation you care about and see if it beats baselines, etc. Same with steering.
0 replies · 0 reposts · 1 like · 89 views
Athulith Paraselli @pathulith22
@johnhewtt Thanks for the informative post! I was curious about how you navigate this lack of linearity in practice. When utilizing tools like logit lens or steering vectors, are there specific 'anomalies' or failure modes you look for that suggest the approximation fails?
1 reply · 0 reposts · 0 likes · 196 views
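The logit lens discussed in this exchange amounts to projecting an intermediate hidden state straight through the unembedding matrix, as if that layer were the model's last. A minimal sketch in plain Python; the vocabulary, weights, and hidden state here are toy stand-ins, not any real model's parameters.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def logit_lens(hidden, unembed, vocab):
    # Project an intermediate hidden state through the unembedding
    # matrix (one row per token) and rank tokens by probability.
    logits = [sum(h * w for h, w in zip(hidden, row)) for row in unembed]
    probs = softmax(logits)
    return sorted(zip(vocab, probs), key=lambda t: -t[1])

# Toy 3-d hidden state and a 4-word vocabulary (made up for illustration).
vocab = ["cat", "dog", "the", "ran"]
unembed = [[1.0, 0.0, 0.0],
           [0.0, 1.0, 0.0],
           [0.5, 0.5, 0.0],
           [0.0, 0.0, 1.0]]
hidden = [0.2, 0.1, 1.5]

ranking = logit_lens(hidden, unembed, vocab)
print(ranking[0][0])  # "ran" scores highest for this toy hidden state
```

The advice in the reply above is about how to read this ranking: treat it as a feature for an evaluation you care about, not as evidence that the model is "about to" say the top token.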
John Hewitt @johnhewtt
Lots of interp thought discusses the linearity of the residual stream! This blog post argues that the residual stream isn't linear in a way that provides formal leverage, and that interp methods based on linearity should not be preferred beyond their empirical utility. cs.columbia.edu/~johnhew/resid…
5 replies · 17 reposts · 232 likes · 11.9K views
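One concrete way the non-linearity shows up: the normalization applied when components read from the residual stream does not commute with addition, so contributions from different components don't superpose exactly. A toy check with RMSNorm (a sketch with made-up vectors, not any specific model's norm or activations):

```python
import math

def rmsnorm(x, eps=1e-6):
    # RMSNorm: scale x by the reciprocal of its root-mean-square.
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]

a = [1.0, 0.0, 0.0]   # toy "attention" contribution to the stream
b = [0.0, 3.0, 0.0]   # toy "MLP" contribution

norm_of_sum = rmsnorm([x + y for x, y in zip(a, b)])
sum_of_norms = [x + y for x, y in zip(rmsnorm(a), rmsnorm(b))]

# If reading from the stream were linear, these would match; they don't.
gap = max(abs(u - v) for u, v in zip(norm_of_sum, sum_of_norms))
print(gap > 0.1)  # True
```

This is exactly the kind of gap the post points at: the stream is a sum of writes, but every read renormalizes, so linearity holds only approximately.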
John Hewitt @johnhewtt
@dribnet haha the empirical utility of linear interventions astounds me even more because nothing in the architecture requires it to be true 🫡
0 replies · 0 reposts · 1 like · 305 views
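The linear interventions mentioned here are, at their simplest, just vector additions to a residual-stream activation. A minimal sketch; the hidden state and "concept" direction are hypothetical, and in practice the direction would be estimated from contrasting activations.

```python
def steer(hidden, direction, alpha):
    # Add a scaled steering vector to a hidden state in the residual stream.
    return [h + alpha * d for h, d in zip(hidden, direction)]

hidden = [0.5, -0.2, 1.0]     # toy residual-stream activation
direction = [1.0, 0.0, -1.0]  # toy "concept" direction (made up)

steered = steer(hidden, direction, alpha=2.0)
print(steered)  # [2.5, -0.2, -1.0]
```

The surprise noted in the tweet is that this additive edit, which would be principled in a truly linear system, works empirically even though the architecture never guarantees it.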
John Hewitt @johnhewtt
@jamesaoldfield Good point. Still ~mildly interesting at best, since the hidden/GLU-weighted linear combination of MLP output-matrix vectors will be read non-linearly downstream. But you're right, it's not the case that the norm happens at both the input and the output of each sublayer.
1 reply · 0 reposts · 0 likes · 507 views
James Oldfield @jamesaoldfield
@johnhewtt Really cool post!! Perhaps I misunderstood -- but is it not the case that many MLP blocks apply the norm to the input, not the output? Meaning the MLP applies a linear transformation to the hidden/GLU states, in a way that does make it mildly interesting as you mention
2 replies · 0 reposts · 3 likes · 546 views
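The point this exchange turns on — that in a pre-norm block the normalization hits the sublayer's input, so the MLP's write to the stream is exactly a hidden-activation-weighted linear combination of its output matrix's columns — can be made concrete in a small sketch. All weights are toy values, and ReLU stands in for the GELU/GLU gates of real models.

```python
import math

def rmsnorm(x, eps=1e-6):
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]

def relu(v):
    return [max(0.0, u) for u in v]

def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]

# Toy pre-norm MLP sublayer: norm on the input only, none on the output.
W_in = [[1.0, -1.0], [0.5, 2.0], [-1.0, 0.0]]   # 2 -> 3 (rows = hidden units)
W_out = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]     # 3 -> 2

x = [3.0, 4.0]
h = relu(matvec(W_in, rmsnorm(x)))   # hidden activations (post-gate)
mlp_out = matvec(W_out, h)           # the sublayer's write to the stream

# With no output norm, that write is exactly sum_i h[i] * (column i of W_out):
combo = [sum(h[i] * W_out[d][i] for i in range(len(h)))
         for d in range(len(x))]
assert all(abs(a - b) < 1e-9 for a, b in zip(mlp_out, combo))
```

The caveat in the reply above is that this decomposition is only "mildly interesting": the next sublayer renormalizes before reading, so the linear structure of the write is consumed non-linearly downstream.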
John Hewitt @johnhewtt
@giangnguyen2412 Great question -- we don't know and we need to continually test for generalization. My guess is a huge set of factors in the prompt/steering estimation play a role in where steering will work. In alignment science, people test this by having tons of evals; we should too.
0 replies · 0 reposts · 0 likes · 372 views
Giang Nguyen @giangnguyen2412
@johnhewtt Thanks for the great write-up! One question: if not linearity, what does predict when steering methods work? Maybe the curvature of the MLP activations in the vicinity of the intervention?
1 reply · 0 reposts · 0 likes · 498 views
John Hewitt @johnhewtt
In a pub trivia night, if you don't know the answer immediately, you "reason" through your memories -- is it X? No... was Y related? In LMs, we find that code/math-RLVR'd models' reasoning for this parametric knowledge access can be easily improved, say, by TriviaQA RLVR.

Quoting Melody Ma @MelodyHorsee:
(1/8) Reasoning language models are great at math and code – but what about remembering facts stored in their parameters? Excited to share work with @johnhewtt exploring this! TL;DR: we don't usually think of RLVR as useful for knowledge recall from parameters, but it helps a lot.

0 replies · 5 reposts · 46 likes · 6.9K views
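RLVR ("RL with verifiable rewards") hinges on a reward that can be checked programmatically rather than judged by a model. For trivia-style recall, that check can be as simple as normalized exact match against gold answers. A minimal sketch — the normalization and examples here are illustrative, not the actual setup from the work being quoted.

```python
def verifiable_reward(model_answer, gold_answers):
    # Reward 1.0 iff the model's answer matches any gold answer
    # after light normalization (case, whitespace); otherwise 0.0.
    norm = lambda s: " ".join(s.lower().strip().split())
    golds = {norm(g) for g in gold_answers}
    return 1.0 if norm(model_answer) in golds else 0.0

print(verifiable_reward("Paris ", ["paris", "Paris, France"]))  # 1.0
print(verifiable_reward("Lyon", ["paris", "Paris, France"]))    # 0.0
```

The point of the thread is that optimizing against rewards like this on a QA dataset can improve how models reason their way to facts already stored in their parameters, not just math/code skills.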
John Hewitt @johnhewtt
Hey folks, just in case it was unclear, I talked to Been and her account has been hacked, so please disregard.
3 replies · 2 reposts · 51 likes · 17.1K views
John Hewitt retweeted
Pratyusha Sharma ✈️ NeurIPS @pratyusha_PS
📢 Some big (& slightly belated) life updates!
1. I defended my PhD at MIT this summer! 🎓
2. I'm joining NYU as an Assistant Professor starting Fall 2026, with a joint appointment in Courant CS and the Center for Data Science. 🎉
🔬 My lab will focus on empirically studying the science of deep learning and applying deep learning to accelerate the natural sciences. Very broadly interested in questions at the intersection of language, reasoning and sequential decision making. (Plus any other fun problems that catch our eye along the way!)
🚀 I am recruiting 2 PhD students for this cycle! If you're interested in joining, please apply here: cs.nyu.edu/dynamic/phd/ad… cds.nyu.edu/phd-admissions…
101 replies · 96 reposts · 1.8K likes · 243.7K views
John Hewitt @johnhewtt
New work! Gemma3 can explain in English what it learned from data – when we distill that data into a new word (embedding) and query it for a description of the word. Gemma explained a word trained on incorrect answers as: “a lack of complete, coherent, or meaningful answers...”
4 replies · 28 reposts · 191 likes · 36.5K views
John Hewitt @johnhewtt
We see this as a step towards developing new language tools for learning about how language models store, process, and reason about potentially complex concepts—differently from how we do. Work with Oyvind Tafjord, Robert Geirhos, @_beenkim. Blog here: cs.columbia.edu/~johnhew//neol…
1 reply · 0 reposts · 13 likes · 1.6K views
John Hewitt @johnhewtt
In one example, we taught Gemma a neologism that causes single-sentence answers. When asked for synonyms of this new word, it suggested “lack,” as in, “Give me a lack answer.” This didn’t look right, but indeed causes very curt answers. We call this a machine-only synonym.
1 reply · 0 reposts · 9 likes · 1.9K views
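The "distill data into a new word" setup in this thread amounts to optimizing a single new embedding while every other parameter stays frozen. A toy analogue of that one-vector optimization — gradient descent on one embedding against a frozen quadratic stand-in for the model; the data, dimensions, and objective are all illustrative, not the paper's actual training setup.

```python
# Toy stand-in for neologism learning: fit one new embedding `e` so a
# frozen "model" (here, mean squared distance to some target
# representations) is minimized, touching no other parameters.
targets = [[1.0, 2.0], [3.0, 0.0], [2.0, 4.0]]  # toy data representations
e = [0.0, 0.0]                                   # the single new embedding
lr = 0.1

for _ in range(200):
    # Gradient of mean squared distance to the targets w.r.t. e.
    grad = [sum(2 * (e[d] - t[d]) for t in targets) / len(targets)
            for d in range(len(e))]
    e = [e[d] - lr * grad[d] for d in range(len(e))]

# For this objective, the optimum is the mean of the targets.
mean = [sum(t[d] for t in targets) / len(targets) for d in range(2)]
print(all(abs(e[d] - mean[d]) < 1e-3 for d in range(2)))  # True: e -> mean
```

In the real setup, the frozen "model" is the full language model and the loss comes from its behavior on the distilled data; the querying step then asks the model, in English, what the learned embedding means.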
John Hewitt @johnhewtt
Excited to give a talk at the interplay workshop tomorrow! Come say hi! Alas, it’s my only day at COLM. Catch me at the coffee breaks or the roundtable.

Quoting INTERPLAY Workshop @interplaywrkshp:
✨ The schedule for our INTERPLAY workshop at COLM is live! ✨
🗓️ October 10th, Room 518C
🔹 Invited talks from @sarahwiegreffe @johnhewtt @amuuueller @kmahowald
🔹 Paper presentations and posters
🔹 Closing roundtable discussion
Join us in Montréal! @COLM_conf

0 replies · 2 reposts · 40 likes · 9.1K views