Gabriel Franco
@gvsfranco

49 posts

CS PhD student @BUCompSci. Interested in interpretability.

Boston, MA · Joined May 2023
166 Following · 57 Followers

Pinned Tweet
Gabriel Franco @gvsfranco
Why do attention heads attend where they do? We can now pinpoint the EXACT features causing attention—without counterfactuals, patching, or SAEs. New @NeurIPSConf 2025 paper with @mcrovella: "Pinpointing Attention-Causal Communication in Language Models"
Gabriel Franco retweeted
Jessica Hullman @JessicaHullman
“In my lab, you will have the luxury to think things through” is turning out to be pretty effective as a recruitment line for AI/ML PhD candidates. I guess permission to be careful is a scarce resource.
Andrew Lee @a_jy_l
😻New preprint! As an interp researcher, I often ask “why did the model attend to this token?” We study this by decomposing the query-key (QK) space into interpretable low-rank subspaces. When these subspaces of Qs and Ks align, the model produces high attention scores. 1/N
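The thread above describes decomposing a head's query-key interaction into low-rank subspaces whose alignment produces high attention scores. Below is a minimal, self-contained sketch of that general idea, not the paper's code: the projection matrices are random stand-ins, and the rank and scaling are arbitrary illustrative choices.

```python
# Hedged sketch: approximating one head's attention logits with a low-rank
# decomposition of its QK interaction matrix. Shapes and names (d_model,
# d_head, W_Q, W_K, resid) are placeholders, not the paper's API.
import torch

d_model, d_head, seq = 512, 64, 10
W_Q = torch.randn(d_model, d_head) / d_model**0.5   # stand-in query projection
W_K = torch.randn(d_model, d_head) / d_model**0.5   # stand-in key projection
resid = torch.randn(seq, d_model)                    # stand-in residual stream

# Full bilinear form: logits[i, j] = (resid[i] W_Q) . (resid[j] W_K)
logits_full = (resid @ W_Q) @ (resid @ W_K).T / d_head**0.5

# Low-rank view: SVD of the interaction matrix W_Q W_K^T. Its top singular
# directions define paired query-side and key-side subspaces; when a query
# projection and a key projection are both large (the subspaces "align"),
# the attention logit is large.
M = W_Q @ W_K.T                                      # (d_model, d_model)
U, S, Vh = torch.linalg.svd(M)
r = 8                                                # keep a rank-8 subspace
q_proj = resid @ U[:, :r]                            # query-side coordinates
k_proj = resid @ Vh[:r, :].T                         # key-side coordinates
logits_lowrank = (q_proj * S[:r]) @ k_proj.T / d_head**0.5

# Relative error of the rank-r reconstruction of the attention logits.
print(torch.norm(logits_full - logits_lowrank) / torch.norm(logits_full))
```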
Gabriel Franco retweeted
Zhengyang Shan @ZhengyangShan
Can steering remove LLM shortcuts without breaking legitimate LLM capabilities? In our @eaclmeeting paper, we show that conceptual bias is separable from concept detection; this means inference-time debiasing is possible with minimal capability loss.
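The result above hinges on a shortcut (bias) direction being separable from the concept-detection signal, so it can be removed at inference time. Here is a minimal sketch of that general steering recipe on a toy network; the model, the hooked layer, and the difference-of-means estimate of the direction are illustrative assumptions, not the paper's method.

```python
# Hedged sketch of inference-time steering: project a "bias" direction out of
# a hidden layer's activations via a forward hook, leaving the rest of the
# representation intact. Toy model and layer choice are assumptions.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
hidden_layer = model[1]   # steer the post-ReLU hidden activations

# Estimate the bias direction as a difference of mean activations between
# inputs that trigger the shortcut and matched inputs that do not.
with torch.no_grad():
    shortcut_acts = model[1](model[0](torch.randn(64, 16) + 1.0))
    control_acts = model[1](model[0](torch.randn(64, 16)))
bias_dir = shortcut_acts.mean(0) - control_acts.mean(0)
bias_dir = bias_dir / bias_dir.norm()

def remove_bias(module, inputs, output):
    # Subtract each activation's component along the bias direction.
    coeff = output @ bias_dir
    return output - coeff.unsqueeze(-1) * bias_dir

handle = hidden_layer.register_forward_hook(remove_bias)
debiased_logits = model(torch.randn(8, 16))   # forward pass with steering on
handle.remove()
```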
Gabriel Franco @gvsfranco
Happening today at #NeurIPS2025: How do you find circuits in LLMs without counterfactuals, patching, or SAEs? I’m presenting our new method for doing this by understanding the causes of attention. Exhibit Hall C,D,E, Poster #1111 4:30 PM - 7:30 PM
Gabriel Franco @gvsfranco

Why do attention heads attend where they do? We can now pinpoint the EXACT features causing attention—without counterfactuals, patching, or SAEs. New @NeurIPSConf 2025 paper with @mcrovella: "Pinpointing Attention-Causal Communication in Language Models"

Gabriel Franco @gvsfranco
@HeMuyu0327 This is great work! We've been studying the exact same phenomenon from a causal perspective of attention. We found low dimensional signals that are causal to attention sinks. This is our NeurIPS paper if you want to check it out: openreview.net/pdf?id=wUoK24u… x.com/gvsfranco/stat…
Gabriel Franco @gvsfranco

We also discovered "control signals", which are data-independent signals that coordinate attention across layers. They implement attention sinks at the signal level. Most models rely on a few distinct control signals, which they use to organize heads hierarchically!

Muyu He @HeMuyu0327
We found that the K vectors of attention sinks shrink to a low-variance subspace and expose a "bias direction" for pick-up, but where did they get this bias direction from? We now find that they inherit it directly from the previous layer's residual output.

If we trace one step back to the layer-normed input before the attention layer for blocks > 6, we find that the input for the sink token 0 is again extremely low-variance, while the inputs for tokens 1 and 8 are high-variance as usual (p1). Since the inputs to the attention block for token 0 are low-variance, it is no wonder that the K vectors for token 0 are low-variance. And if the inputs again have a bias direction in the higher-dimensional hidden space, then the K vectors can easily have a bias direction in the lower-dimensional head space. And they do.

We trace one step further back, to the output of the previous layer before the layernorm, and find that the variance of token 0's outputs explodes (p2). This is actually good: it suggests that the previous layer's outputs share the same bias direction but have different magnitudes, which layernorm removes in the next layer. So, in short: the bias direction is created directly in the residual output, which neatly translates into the next layer's K, which forms the sink.

Since the output of the previous layer is a residual addition of the attention output and the MLP / MoE output, it's now time to see what each part brings to the formation of attention sinks. Something cool is going to happen...
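A rough sketch of how one might reproduce this kind of variance tracing on a small open model; the choice of gpt2, the layer index, and the reference position are arbitrary assumptions, not the thread's exact setup.

```python
# Hedged sketch: compare cross-prompt variance of the sink position's
# residual-stream state against a normal position, before and after removing
# magnitude (roughly what LayerNorm does).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompts = [
    "The quick brown fox jumps over the lazy dog.",
    "Attention sinks concentrate probability mass on the first token.",
    "Boston is cold in January but pleasant in the early fall.",
    "Interpretability research asks why models compute what they compute.",
]

layer = 8              # a middle block; any block past the early layers
sink_pos, ref_pos = 0, 3

states = []
with torch.no_grad():
    for p in prompts:
        ids = tok(p, return_tensors="pt")
        out = model(**ids, output_hidden_states=True)
        states.append(out.hidden_states[layer][0])    # (seq, d_model)

sink = torch.stack([s[sink_pos] for s in states])     # (n_prompts, d_model)
ref = torch.stack([s[ref_pos] for s in states])

def total_var(x):
    # Sum of per-dimension variances across prompts.
    return x.var(dim=0).sum().item()

print("raw variance       sink:", total_var(sink), " ref:", total_var(ref))

# If sink states share one bias direction but differ only in scale, their
# variance should collapse after unit-normalization, unlike the reference.
print("unit-norm variance sink:",
      total_var(sink / sink.norm(dim=-1, keepdim=True)),
      " ref:", total_var(ref / ref.norm(dim=-1, keepdim=True)))
```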
Gabriel Franco retweeted
Najoung Kim 🫠 @najoungkim
My lab at BU is recruiting PhD students and possibly a postdoc this year! We study humans & machines, centered around topics like meaning, generalization, evaluation methods and design, and the nature of computation and representation that underlie language and cognition. 🫴🫴
Gabriel Franco retweeted
Yukyung Lee @yukyunglee_
How reliable is your LLM-as-a-Judge?⚖️ Existing methods suffer from (a) rating inconsistencies and (b) low stability of correlation with human judgments across evaluator models. Excited to share CheckEval (@emnlpmeeting), a framework that improves reliability for LLM-as-a-Judge.
Gabriel Franco @gvsfranco
We also discovered "control signals", which are data-independent signals that coordinate attention across layers. They implement attention sinks at the signal level. Most models rely on a few distinct control signals, which they use to organize heads hierarchically!
Gabriel Franco retweeted
Zilu Tang (Peter) @Zilu_Tang_Peter
When aligning LMs to personal preferences, is it at all beneficial to have the model infer user preferences ("persona") over just using few-shot historical contexts? In our paper we found that inferred "personas" improve generalization💪 and contextual faithfulness🙏, and reduce bias⚖️!
Gabriel Franco retweeted
David Bau @davidbau
Thanks to all for making NEMI 2025 a wonderful event. Fascinating talks, inspiring posters, important discussions. You surfaced the questions animating our growing field. I learned many things and hope you did too! Looking forward to what the next year will bring.
Gabriel Franco retweeted
Yulu Qin @yulu_qin
Does vision training change how language is represented and used in meaningful ways?🤔 The answer is a nuanced yes! Comparing VLM-LM minimal pairs, we find that while the taxonomic organization of the lexicon is similar, VLMs are better at _deploying_ this knowledge. [1/9]
Gabriel Franco retweeted
METR @METR_Evals
We ran a randomized controlled trial to see how much AI coding tools speed up experienced open-source developers. The results surprised us: Developers thought they were 20% faster with AI tools, but they were actually 19% slower when they had access to AI than when they didn't.
Gabriel Franco retweeted
Yukyung Lee @yukyunglee_
Can coding agents autonomously implement AI research extensions? We introduce RExBench, a benchmark that tests if a coding agent can implement a novel experiment based on existing research and code. Finding: Most agents we tested had a low success rate, but there is promise!