Gabriel Franco (@gvsfranco) - Twitter Profile

Gabriel Franco Retweeted

Ruochen Zhang@ruochenz_·6d

🤗 Super excited to have this work out! Turns out by calculating the angles 📐 between representations, you can pick out difficult data samples! This can be very useful for assembling hard test sets or more efficient training sets. See more cool results and visuals in the 🧵

Naomi Saphra@nsaphra

We don’t always know what problems are hard for LLMs. So devs evaluate on tasks HUMANS find hard or on broad benchmarks. What if we could instead anticipate which scenarios a model will fail on—all without evaluating specific input examples? 🧵NEW PAPER by @jenniferlumeng &al

Gabriel Franco@gvsfranco·10 Jun

See the website for more info: nemiconf.github.io/summer26/ Registration: forms.gle/qUNq84pB6AyU4v… Submission: forms.gle/PEfMyL4J3PL9zt…

Gabriel Franco@gvsfranco·10 Jun

🧠🤖 The 2026 New England Mechanistic Interpretability (NEMI) Workshop will be Aug. 14 at Boston University! Help spread the word and join the New England mech interp community! Registration and submission info in thread:👇

Gabriel Franco Retweeted

Naomi Saphra@nsaphra·9 Jun

✨ it's coming ✨

Gabriel Franco Retweeted

Zilu Tang (Peter)@Zilu_Tang_Peter·3 Jun

How do language models track entities across state changes? When tracking objects in different boxes, do they cumulatively build up a global state of what’s in every box? How do they add objects or remove objects (i.e. Entity Unbinding)? Find out in our ICML paper! 🧵

Gabriel Franco@gvsfranco·28 May

Paper: arxiv.org/abs/2602.13524 Code: github.com/gaabrielfranco… Happy to chat and hear your feedback! #ICML2026 #Interpretability #Transformers #Attention

Gabriel Franco@gvsfranco·28 May

We find that sparse decomposition occurs in Pythia and GPT-2 and emerges during training in a manner consistent with the toy model. Main takeaway: If you are trying to find features used by attention heads, look at the singular vectors of the QK matrix.
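
A toy illustration of the takeaway above, not the paper's method or code: the sketch below builds a small QK matrix from two made-up orthonormal query-side and key-side feature directions, then recovers the top singular pair with plain power iteration. Every name here (`q1`, `k1`, `W`, `top_singular_pair`) and the weights 3.0 and 1.0 are assumptions chosen for the example.

```python
import math

def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def normalize(x):
    n = math.sqrt(dot(x, x))
    return [xi / n for xi in x]

def weighted_outer(scale, u, v):
    """Return scale * u v^T as a list of rows."""
    return [[scale * ui * vj for vj in v] for ui in u]

def top_singular_pair(M, start, iters=100):
    """Power iteration on M^T M; returns (u, sigma, v) for the largest singular value."""
    Mt = [list(col) for col in zip(*M)]
    v = normalize(start)
    for _ in range(iters):
        v = normalize(matvec(Mt, matvec(M, v)))
    Mv = matvec(M, v)
    sigma = math.sqrt(dot(Mv, Mv))
    u = [x / sigma for x in Mv]
    return u, sigma, v

# Hypothetical orthonormal feature directions (assumed for this toy example).
q1, q2 = [1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]      # query side
k1, k2 = [0.5, 0.5, 0.5, 0.5], [0.5, -0.5, 0.5, -0.5]    # key side

# Toy QK matrix: the attention score for (x_q, x_k) would be x_q^T W x_k.
A = weighted_outer(3.0, q1, k1)
B = weighted_outer(1.0, q2, k2)
W = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

u, sigma, v = top_singular_pair(W, start=[1.0, 2.0, 3.0, 4.0])
query_alignment = abs(dot(u, q1))  # |cosine| between left singular vector and q1
key_alignment = abs(dot(v, k1))    # |cosine| between right singular vector and k1
print(sigma, query_alignment, key_alignment)
```

In this construction the top singular value comes out at about 3.0 and both alignments at about 1.0 (up to floating-point error), because the matrix was built directly from orthonormal features. Real attention heads are not guaranteed to be this clean; the sketch only shows what "singular vectors align with features" means in the simplest case.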

Gabriel Franco@gvsfranco·28 May

Singular vectors of the attention QK matrix align with features! This has been found empirically in other works, like Talking Heads from @jack_merullo_. We show theoretically and empirically how and why. New @icmlconf 2026 paper with Carson Loughridge and @mcrovella.

Gabriel Franco@gvsfranco·15 May

Great paper! Happy to see other papers advocating for data-driven approaches in interpretability.

Daking Rai@DakingRai

🚨 New paper: Data-driven Circuit Discovery for Interpretability of Language Models 🚨 Do circuits actually explain how language models (LMs) implement a task? In mechanistic interpretability, the goal of circuit study is to discover a “circuit” that is responsible for implementing a “task”. But we find that existing methods often discover circuits that are: ❌ not general task circuits: they do not capture the full range of mechanisms LMs use across the task. Instead, they find: ✅ dataset-specific circuits: they explain how the model processes the examples used for circuit discovery. ✅ mixed-mechanism circuits: consisting of multiple independent mechanisms mixed in a single circuit. 1/🧵

Gabriel Franco Retweeted

Ziyu Yao@ZiyuYao·15 May

Check out our new preprint reflecting on what "circuits" actually explain: arxiv.org/pdf/2605.09129… (w/ @DakingRai and @megamor2) While the current practice follows "hypothesis-driven circuit discovery", we found that circuits discovered using existing approaches do not describe the general task but rather are dataset-specific. When the dataset includes multiple distinct mechanisms, the current approaches cannot distinguish them. We propose Data-driven Circuit Discovery (DCD) and advocate the principle of letting the data pattern reveal what mechanisms are there (as opposed to humans hypothesizing their existence or how they exist). Details in the thread🧵 Concurrent to our work, we are seeing more similar concerns raised by the community, e.g., - "Finding Interpretable Prompt-Specific Circuits in Language Models" by @gvsfranco arxiv.org/pdf/2602.13483 - "All Circuits Lead to Rome: Rethinking Functional Anisotropy in Circuit and Sheaf Discovery for LLMs" by @fnruji316625 arxiv.org/pdf/2605.12671 More reflections and new methodologies are still needed in this space. #MechanisticInterpretability #LLM

Gabriel Franco@gvsfranco·13 May

@najoungkim Baklavastory has the best baklava I've had in my life (this was a recommendation from one of my Turkish friends)

Najoung Kim 🫠@najoungkim·13 May

or send me coffee/dessert/anything recs

Najoung Kim 🫠@najoungkim·13 May

i'll be hanging out in SF all day this friday, have a couple engagements but still have some free time so DM me or email me if you want to catch up!

Gabriel Franco Retweeted

John Seon Keun Yi@john_sk_yi·8 May

[1] Multi-agent debate improves reasoning and factuality of LLMs, but is compute-intensive. How can we reap the benefits of debate while avoiding the inference costs of multi-agent communication? In our #ACL2026 paper, we present a method to internalize debate into a single LLM.

John Seon Keun Yi@john_sk_yi

Our paper "Latent Agents" was accepted to #ACL2026 Main! We distill multi-agent debate into a single LLM, matching debate performance at a fraction of the cost. We also show that internalized agents are discoverable and controllable. Huge thanks to @amuuueller and Dokyun Lee!

Gabriel Franco Retweeted

Jessica Hullman@JessicaHullman·10 Apr

“In my lab, you will have the luxury to think things through” is turning out to be pretty effective as a recruitment line for AI/ML PhD candidates. I guess permission to be careful is a scarce resource.

Gabriel Franco@gvsfranco·9 Feb

@a_jy_l @boknilev @viegasf @wattenberg Very interesting work!

Andrew Lee@a_jy_l·9 Feb

With amazing collaborators @boknilev @viegasf @wattenberg Paper: arxiv.org/pdf/2602.04752… Code: github.com/ajyl/QK Hope you enjoy! 6/N

Andrew Lee@a_jy_l·9 Feb

😻New preprint! As an interp researcher, I often ask “why did the model attend to this token?” We study this by decomposing the query-key (QK) space into interpretable low-rank subspaces. When these subspaces of Qs and Ks align, the model produces high attention scores. 1/N

Gabriel Franco Retweeted

Zhengyang Shan@ZhengyangShan·20 Jan

Can steering remove LLM shortcuts without breaking legitimate LLM capabilities? In our @eaclmeeting paper, we show that conceptual bias is separable from concept detection; this means inference-time debiasing is possible with minimal capability loss.
