Gabriel Franco
@gvsfranco

49 posts

CS PhD student @BUCompSci. Interested in interpretability.

Boston, MA · Joined May 2023
166 Following · 57 Followers

Pinned Tweet
Gabriel Franco @gvsfranco
Why do attention heads attend where they do? We can now pinpoint the EXACT features causing attention—without counterfactuals, patching, or SAEs. New @NeurIPSConf 2025 paper with @mcrovella: "Pinpointing Attention-Causal Communication in Language Models"
Gabriel Franco retweeted
Jessica Hullman @JessicaHullman
“In my lab, you will have the luxury to think things through” is turning out to be pretty effective as a recruitment line for AI/ML PhD candidates. I guess permission to be careful is a scarce resource.
Andrew Lee @a_jy_l
😻New preprint! As an interp researcher, I often ask “why did the model attend to this token?” We study this by decomposing the query-key (QK) space into interpretable low-rank subspaces. When these subspaces of Qs and Ks align, the model produces high attention scores. 1/N
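The thread above describes decomposing a head's query-key interaction into low-rank subspaces whose alignment produces high attention scores. Below is a minimal, self-contained sketch of that general idea, not the paper's code: the projection matrices are random stand-ins, and the rank and scaling are arbitrary illustrative choices.

```python
# Hedged sketch: approximating one head's attention logits with a low-rank
# decomposition of its QK interaction matrix. Shapes and names (d_model,
# d_head, W_Q, W_K, resid) are placeholders, not the paper's API.
import torch

d_model, d_head, seq = 512, 64, 10
W_Q = torch.randn(d_model, d_head) / d_model**0.5   # stand-in query projection
W_K = torch.randn(d_model, d_head) / d_model**0.5   # stand-in key projection
resid = torch.randn(seq, d_model)                    # stand-in residual stream

# Full bilinear form: logits[i, j] = (resid[i] W_Q) . (resid[j] W_K)
logits_full = (resid @ W_Q) @ (resid @ W_K).T / d_head**0.5

# Low-rank view: SVD of the interaction matrix W_Q W_K^T. Its top singular
# directions define paired query-side and key-side subspaces; when a query
# projection and a key projection are both large (the subspaces "align"),
# the attention logit is large.
M = W_Q @ W_K.T                                      # (d_model, d_model)
U, S, Vh = torch.linalg.svd(M)
r = 8                                                # keep a rank-8 subspace
q_proj = resid @ U[:, :r]                            # query-side coordinates
k_proj = resid @ Vh[:r, :].T                         # key-side coordinates
logits_lowrank = (q_proj * S[:r]) @ k_proj.T / d_head**0.5

# Relative error of the rank-r reconstruction of the attention logits.
print(torch.norm(logits_full - logits_lowrank) / torch.norm(logits_full))
```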
Gabriel Franco retweeted
Zhengyang Shan @ZhengyangShan
Can steering remove LLM shortcuts without breaking legitimate LLM capabilities? In our @eaclmeeting paper, we show that conceptual bias is separable from concept detection; this means inference-time debiasing is possible with minimal capability loss.
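The result above hinges on a shortcut (bias) direction being separable from the concept-detection signal, so it can be removed at inference time. Here is a minimal sketch of that general steering recipe on a toy network; the model, the hooked layer, and the difference-of-means estimate of the direction are illustrative assumptions, not the paper's method.

```python
# Hedged sketch of inference-time steering: project a "bias" direction out of
# a hidden layer's activations via a forward hook, leaving the rest of the
# representation intact. Toy model and layer choice are assumptions.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
hidden_layer = model[1]   # steer the post-ReLU hidden activations

# Estimate the bias direction as a difference of mean activations between
# inputs that trigger the shortcut and matched inputs that do not.
with torch.no_grad():
    shortcut_acts = model[1](model[0](torch.randn(64, 16) + 1.0))
    control_acts = model[1](model[0](torch.randn(64, 16)))
bias_dir = shortcut_acts.mean(0) - control_acts.mean(0)
bias_dir = bias_dir / bias_dir.norm()

def remove_bias(module, inputs, output):
    # Subtract each activation's component along the bias direction.
    coeff = output @ bias_dir
    return output - coeff.unsqueeze(-1) * bias_dir

handle = hidden_layer.register_forward_hook(remove_bias)
debiased_logits = model(torch.randn(8, 16))   # forward pass with steering on
handle.remove()
```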
Gabriel Franco @gvsfranco
Happening today at #NeurIPS2025: How do you find circuits in LLMs without counterfactuals, patching, or SAEs? I’m presenting our new method for doing this by understanding the causes of attention. Exhibit Hall C,D,E, Poster #1111 4:30 PM - 7:30 PM
Gabriel Franco @gvsfranco

Why do attention heads attend where they do? We can now pinpoint the EXACT features causing attention—without counterfactuals, patching, or SAEs. New @NeurIPSConf 2025 paper with @mcrovella: "Pinpointing Attention-Causal Communication in Language Models"

Gabriel Franco @gvsfranco
@HeMuyu0327 This is great work! We've been studying the exact same phenomenon from a causal perspective of attention. We found low dimensional signals that are causal to attention sinks. This is our NeurIPS paper if you want to check it out: openreview.net/pdf?id=wUoK24u… x.com/gvsfranco/stat…
Gabriel Franco @gvsfranco

We also discovered "control signals", which are data-independent signals that coordinate attention across layers. They implement attention sinks at the signal level. Most models rely on a few distinct control signals, which they use to organize heads hierarchically!

Muyu He @HeMuyu0327
We found that the K vectors of attention sinks shrink to a low-variance subspace and expose a "bias direction" for pick-up, but where did they get this bias direction from? We now find that they inherit it directly from the previous layer's residual output.

If we trace one step back to the layer-normed input before the attention layer for blocks > 6, we find that the input for the sink token 0 is again extremely low-variance, while the inputs for tokens 1 and 8 are high-variance as usual (p1). Since the inputs to the attention block for token 0 are low-variance, it is no wonder that the K vectors for token 0 are low-variance. And if the inputs again have a bias direction in the higher-dimensional hidden space, then the K vectors can easily have a bias direction in the lower-dimensional head space. And they do.

We trace one step further back, to the output of the previous layer before the layernorm, and find that the variance of token 0's outputs explodes (p2). This is actually good: it suggests that the previous layer's outputs share the same bias direction but have different magnitudes, which layernorm removes in the next layer. So, in short: the bias direction is created directly in the residual output, which neatly translates into the next layer's K, which forms the sink.

Since the output of the previous layer is a residual addition of the attention output and the MLP / MoE output, it's now time to see what each part brings to the formation of attention sinks. Something cool is going to happen...
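A rough sketch of how one might reproduce this kind of variance tracing on a small open model; the choice of gpt2, the layer index, and the reference position are arbitrary assumptions, not the thread's exact setup.

```python
# Hedged sketch: compare cross-prompt variance of the sink position's
# residual-stream state against a normal position, before and after removing
# magnitude (roughly what LayerNorm does).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompts = [
    "The quick brown fox jumps over the lazy dog.",
    "Attention sinks concentrate probability mass on the first token.",
    "Boston is cold in January but pleasant in the early fall.",
    "Interpretability research asks why models compute what they compute.",
]

layer = 8              # a middle block; any block past the early layers
sink_pos, ref_pos = 0, 3

states = []
with torch.no_grad():
    for p in prompts:
        ids = tok(p, return_tensors="pt")
        out = model(**ids, output_hidden_states=True)
        states.append(out.hidden_states[layer][0])    # (seq, d_model)

sink = torch.stack([s[sink_pos] for s in states])     # (n_prompts, d_model)
ref = torch.stack([s[ref_pos] for s in states])

def total_var(x):
    # Sum of per-dimension variances across prompts.
    return x.var(dim=0).sum().item()

print("raw variance       sink:", total_var(sink), " ref:", total_var(ref))

# If sink states share one bias direction but differ only in scale, their
# variance should collapse after unit-normalization, unlike the reference.
print("unit-norm variance sink:",
      total_var(sink / sink.norm(dim=-1, keepdim=True)),
      " ref:", total_var(ref / ref.norm(dim=-1, keepdim=True)))
```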
Gabriel Franco retweeted
Najoung Kim 🫠 @najoungkim
My lab at BU is recruiting PhD students and possibly a postdoc this year! We study humans & machines, centered around topics like meaning, generalization, evaluation methods and design, and the nature of computation and representation that underlie language and cognition. 🫴🫴
Gabriel Franco retweeted
Yukyung Lee @yukyunglee_
How reliable is your LLM-as-a-Judge?⚖️ Existing methods suffer from (a) rating inconsistencies and (b) low stability of correlation with human judgments across evaluator models. Excited to share CheckEval (@emnlpmeeting), a framework that improves reliability for LLM-as-a-Judge.
Gabriel Franco @gvsfranco
We also discovered "control signals", which are data-independent signals that coordinate attention across layers. They implement attention sinks at the signal level. Most models rely on a few distinct control signals, which they use to organize heads hierarchically!
Gabriel Franco retweeted
Zilu Tang (Peter) @Zilu_Tang_Peter
When aligning LMs to personal preferences, is it at all beneficial to have the model infer user preferences ("persona") over just using few-shot historical contexts? In our paper we found that inferred "personas" improve generalization💪 and contextual faithfulness🙏, and reduce bias⚖️!
Gabriel Franco retweeted
David Bau @davidbau
Thanks to all for making NEMI 2025 a wonderful event. Fascinating talks, inspiring posters, important discussions. You surfaced the questions animating our growing field. I learned many things and hope you did too! Looking forward to what the next year will bring.
Gabriel Franco retweeted
Yulu Qin @yulu_qin
Does vision training change how language is represented and used in meaningful ways?🤔 The answer is a nuanced yes! Comparing VLM-LM minimal pairs, we find that while the taxonomic organization of the lexicon is similar, VLMs are better at _deploying_ this knowledge. [1/9]
Gabriel Franco retweeted
METR @METR_Evals
We ran a randomized controlled trial to see how much AI coding tools speed up experienced open-source developers. The results surprised us: Developers thought they were 20% faster with AI tools, but they were actually 19% slower when they had access to AI than when they didn't.
Gabriel Franco retweeted
Yukyung Lee @yukyunglee_
Can coding agents autonomously implement AI research extensions? We introduce RExBench, a benchmark that tests if a coding agent can implement a novel experiment based on existing research and code. Finding: Most agents we tested had a low success rate, but there is promise!