Nick Jiang
@nickhjiang
280 posts

probing machines @stanford

Joined July 2019
347 Following · 1.1K Followers
Pinned Tweet
Nick Jiang @nickhjiang:
New work! What if we used sparse autoencoders to analyze data, not models—where SAE latents act as a large set of data labels 🏷️? We find that SAEs beat baselines on 4 data analysis tasks and uncover surprising, qualitative insights about models (e.g. Grok-4, OpenAI) from data.
[image] · 13 replies · 36 reposts · 248 likes · 75.8K views
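The pinned tweet's idea (SAE latents acting as a large set of data labels) can be sketched roughly like this. Everything below is a toy stand-in, not the paper's actual code: the encoder weights are random rather than trained, and the dimensions are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: 768-dim document embeddings, 8192 SAE latents.
D_MODEL, D_SAE = 768, 8192
W_enc = rng.standard_normal((D_MODEL, D_SAE)) / np.sqrt(D_MODEL)
b_enc = np.zeros(D_SAE)

def sae_labels(doc_embedding, k=5):
    """Encode one embedding and return the top-k active latent indices.

    Each returned index acts as a 'label' for the document; with a
    trained SAE these indices would correspond to interpretable
    properties of the text.
    """
    acts = np.maximum(doc_embedding @ W_enc + b_enc, 0.0)  # ReLU encoder
    top = np.argsort(acts)[::-1][:k]
    return [int(i) for i in top if acts[i] > 0]

doc = rng.standard_normal(D_MODEL)  # stand-in for a real embedding
labels = sae_labels(doc)
print(len(labels))  # at most k latent indices, i.e. data labels
```

With a trained SAE, each active latent index would map to a human-readable description, so the returned indices behave like automatically discovered labels rather than a predefined taxonomy.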
Etash Guha @etash_guha:
Career Update: I’m joining Anthropic on the pretraining team! Excited to learn from all the brilliant and creative people there. Let’s go train some models!
[image] · 69 replies · 7 reposts · 734 likes · 34.5K views
Atticus Wang @atticuswzf:
Is "a response formatted like this" sometimes better than "a response formatted like this"? To a reward model, yes! RMs are instrumental in shaping model behaviors and alignment. Our paper makes progress uncovering their unexpected preferences. 🧵(1/9)
[image] · 8 replies · 12 reposts · 92 likes · 14K views
Nick Jiang reposted
Neil Rathi @neil_rathi:
New paper, w/ @AlecRad: Models acquire a lot of capabilities during pretraining. We show that we can precisely shape what they learn simply by filtering their training data at the token level.
[image] · 27 replies · 98 reposts · 1.1K likes · 105K views
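The token-level filtering Neil describes can be illustrated with a toy sketch. The predicate and token ids below are invented for illustration; the paper's actual filtering criterion is not shown in the tweet.

```python
def filter_tokens(token_ids, is_blocked):
    """Remove individual tokens from a training sequence.

    token_ids: list of int token ids
    is_blocked: predicate deciding which tokens to drop
    Unlike document-level filtering, the rest of the sequence survives.
    """
    return [t for t in token_ids if not is_blocked(t)]

# Hypothetical: pretend even ids carry the capability we want to remove.
seq = [3, 8, 1, 4, 7, 2, 9]
filtered = filter_tokens(seq, lambda t: t % 2 == 0)
print(filtered)  # [3, 1, 7, 9]
```

The point of the token granularity is that a document with a few unwanted tokens is mostly kept, rather than discarded wholesale.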
Y Combinator @ycombinator:
🌕 @gru_space is building durable space habitats so humans can one day live on the Moon and Mars. Its first missions will mine lunar regolith to construct a long-term pressurized habitat on the Moon for commercial space tourism — a hotel on the Moon. Congrats on the launch @skyler_chan_! ycombinator.com/launches/P9g-g…
112 replies · 97 reposts · 619 likes · 131.8K views
Nick Jiang reposted
Neel Nanda @NeelNanda5:
I'm really excited about this paper! It's an example of data-centric interpretability, which IMO is a really impactful new area: models have tons of relevant data, what can we learn by analysing it? Turns out there's a lot you can do if you're creative! eg SAEs on closed models
[Quoting Nick Jiang @nickhjiang's pinned tweet above.]
7 replies · 13 reposts · 178 likes · 21.6K views
Nick Jiang @nickhjiang:
@TheGrizztronic No, the embeddings are reusable. You can view the reader model + SAE as just a bigger embedding model.
0 replies · 0 reposts · 1 like · 21 views
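Nick's point that the embeddings are reusable can be sketched as follows. The shapes and random weights here are hypothetical stand-ins for a real reader model + SAE: documents are encoded once into cached sparse activations, and only the query is encoded at search time.

```python
import numpy as np

rng = np.random.default_rng(1)
D_MODEL, D_SAE = 64, 512
W_enc = rng.standard_normal((D_MODEL, D_SAE)) / np.sqrt(D_MODEL)

def embed(vec):
    """'Reader model + SAE' viewed as one bigger embedding model:
    dense vector in, SAE activation vector out."""
    return np.maximum(vec @ W_enc, 0.0)

# One-time cost: encode every document once and cache the result.
docs = rng.standard_normal((100, D_MODEL))
doc_cache = np.stack([embed(d) for d in docs])

def search(query_vec, top_n=3):
    """Queries only encode themselves; cached doc activations are reused."""
    q = embed(query_vec)
    scores = doc_cache @ q
    return np.argsort(scores)[::-1][:top_n]

hits = search(rng.standard_normal(D_MODEL))
print(hits.shape)  # (3,)
```

Because `doc_cache` never depends on the query, the documents do not need to be re-encoded per query, which is the reuse being described.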
Josh Cason @TheGrizztronic:
@nickhjiang Does this mean the docs need to be passed back through the reader for each query?
1 reply · 0 reposts · 0 likes · 16 views
Xianjun Yang @xianjun_agi:
Cool! "What if we used sparse autoencoders to analyze data, not models?" We also have a paper using SAEs to analyze data earlier this year: arxiv.org/abs/2502.14050 This shows interpretability is useful for downstream tasks.
[Quoting Nick Jiang @nickhjiang's pinned tweet above.]
2 replies · 2 reposts · 30 likes · 5.3K views
Nick Jiang @nickhjiang:
Yup! An easy extension could be finding which qualities have been decreasing across models, for example. We also chose frequency across documents as our metric for the diffing experiments, but it wouldn't be too hard to pick something else (e.g., frequency within each doc, if the docs are long).
0 replies · 0 reposts · 1 like · 432 views
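The diffing metric mentioned here (the fraction of documents in which each latent fires) might look roughly like this on toy data; the activations below are random stand-ins, not real SAE outputs:

```python
import numpy as np

rng = np.random.default_rng(2)
D_SAE = 100

def latent_frequencies(acts):
    """Fraction of documents in which each latent is active.

    acts: (n_docs, d_sae) SAE activations, one row per document.
    """
    return (acts > 0).mean(axis=0)

# Toy activations for responses from two models (hypothetical data).
acts_model_a = np.maximum(rng.standard_normal((500, D_SAE)), 0)
acts_model_b = np.maximum(rng.standard_normal((500, D_SAE)) - 0.5, 0)

# Diff: latents that fire much more often for model A than for model B.
diff = latent_frequencies(acts_model_a) - latent_frequencies(acts_model_b)
top_latents = np.argsort(diff)[::-1][:5]
print(top_latents)  # candidate properties distinguishing the two models
```

Swapping the metric as Nick suggests would just mean replacing `latent_frequencies` with, say, a per-document firing rate averaged over tokens.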
Theodore Galanos @TheodoreGalanos:
@nickhjiang This is beautiful! Could a variation of this be used to assess and understand task performance across models?
1 reply · 0 reposts · 0 likes · 523 views
Nick Jiang @nickhjiang:
@floringham We sampled 1,000 prompts from Chatbot Arena when generating the responses, so it probably wouldn't change the results much. I think the larger concern is that Chatbot Arena isn't representative of real user prompts (unfortunately, we don't have access to those).
1 reply · 0 reposts · 0 likes · 237 views
Inaya @floringham:
@nickhjiang Interesting work! In Case Study 1, I wonder: if you try slightly different wordings for the prompt, does it change the model's behaviour much?
1 reply · 0 reposts · 0 likes · 348 views
Nick Jiang @nickhjiang:
@dosdesvios You could, but LDA and topic modeling tend to give broad semantic topics. SAE latents tend to be more granular and property-like (there are also more of them). We compared SAEs with CTMs in our correlations task and also found that CTMs were noisier.
0 replies · 0 reposts · 1 like · 57 views
Dos desvíos @dosdesvios:
@nickhjiang Thanks for your answer! For that purpose, couldn't I use LDA or any other topic modeling technique?
1 reply · 0 reposts · 1 like · 79 views
Nick Jiang @nickhjiang:
@dosdesvios Great question! The advantage of these labels is that you don't need to pre-define them, meaning that you can find insights about your data without any priors.
1 reply · 0 reposts · 2 likes · 606 views
Dos desvíos @dosdesvios:
@nickhjiang Cool work! One question: why would SAE labels be more interesting than any other type of label that I could come up with?
1 reply · 0 reposts · 1 like · 679 views
Nick Jiang reposted
Lisa Dunlap @lisabdunlap:
🧵Tired of scrolling through your horribly long model traces in VSCode to figure out why your model failed? We made StringSight to fix this: an automated pipeline for analyzing your model outputs at scale. ➡️Demo: stringsight.com ➡️Blog: blog.stringsight.com
3 replies · 37 reposts · 91 likes · 27.5K views