Gabriel Franco

66 posts

@gvsfranco

CS PhD student @BUCompSci. Interested in interpretability.

Boston, MA · Joined May 2023
202 Following · 110 Followers
Pinned Tweet
Gabriel Franco @gvsfranco
Singular vectors of the attention QK matrix align with features! This has been found empirically in other works, like Talking Heads from @jack_merullo_. We show theoretically and empirically how and why. New @icmlconf 2026 paper with Carson Loughridge and @mcrovella.
Replies: 1 · Reposts: 9 · Likes: 27 · Views: 3.2K
Gabriel Franco retweeted
Ruochen Zhang @ruochenz_
🤗 Super excited to have this work out! Turns out by calculating the angles 📐 between representations, you can pick out difficult data samples! This can be very useful for assembling hard test sets or more efficient training sets. See more cool results and visuals in the 🧵
Quoted post from Naomi Saphra @nsaphra:

We don’t always know what problems are hard for LLMs. So devs evaluate on tasks HUMANS find hard or on broad benchmarks. What if we could instead anticipate which scenarios a model will fail on—all without evaluating specific input examples? 🧵NEW PAPER by @jenniferlumeng &al

Replies: 0 · Reposts: 1 · Likes: 36 · Views: 6.6K
Gabriel Franco @gvsfranco
🧠🤖 The 2026 New England Mechanistic Interpretability (NEMI) Workshop will be Aug. 14 at Boston University! Help spread the word and join the New England mech interp community! Registration and submission info in thread:👇
[image]
Replies: 2 · Reposts: 30 · Likes: 118 · Views: 22.4K
Gabriel Franco retweeted
Naomi Saphra @nsaphra
✨ it's coming ✨
[image]
Replies: 1 · Reposts: 1 · Likes: 97 · Views: 3.5K
Gabriel Franco retweeted
Zilu Tang (Peter) @Zilu_Tang_Peter
How do language models track entities across state changes? When tracking objects in different boxes, do they cumulatively build up a global state of what’s in every box? How do they add objects or remove objects (i.e. Entity Unbinding)? Find out in our ICML paper! 🧵
[image]
Replies: 3 · Reposts: 17 · Likes: 41 · Views: 4.6K
Gabriel Franco @gvsfranco
We find that sparse decomposition occurs in Pythia and GPT-2 and emerges during training in a manner consistent with the toy model. Main takeaway: If you are trying to find features used by attention heads, look at the singular vectors of the QK matrix.
Replies: 1 · Reposts: 0 · Likes: 0 · Views: 203
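The takeaway in the post above (look at the singular vectors of the QK matrix) can be illustrated with a minimal NumPy sketch. This is not the paper's code. It assumes the convention score(x, y) = x · W_Q · W_Kᵀ · y, and it builds a hypothetical toy head whose query-side and key-side feature directions (`F_q`, `F_k`) are orthonormal and whose feature strengths are distinct. Under those assumptions the SVD of W_Q · W_Kᵀ recovers the planted directions up to sign. All names and shapes here are illustrative.

```python
import numpy as np


def qk_singular_vectors(W_Q, W_K, top_k):
    """Return the top-k singular triplets of the QK bilinear form.

    W_Q and W_K have shape (d_model, d_head); the attention score between
    residual vectors x and y is assumed to be x @ W_Q @ W_K.T @ y.
    """
    W_QK = W_Q @ W_K.T                      # (d_model, d_model), rank <= d_head
    U, S, Vt = np.linalg.svd(W_QK)          # singular values sorted descending
    return U[:, :top_k], S[:top_k], Vt[:top_k].T


rng = np.random.default_rng(0)
d_model, d_head = 16, 4

# Hypothetical toy head: orthonormal feature directions planted on each side.
F_q, _ = np.linalg.qr(rng.normal(size=(d_model, d_head)))
F_k, _ = np.linalg.qr(rng.normal(size=(d_model, d_head)))
strengths = np.array([4.0, 3.0, 2.0, 1.0])  # distinct, so the SVD is unique up to sign

W_Q = F_q * strengths                       # scale each query feature column
W_K = F_k

U, S, V = qk_singular_vectors(W_Q, W_K, top_k=d_head)

# Absolute cosine between each singular vector and the matching planted feature.
align_q = np.abs(np.sum(U * F_q, axis=0))
align_k = np.abs(np.sum(V * F_k, axis=0))
print("singular values:", S)
print("query-side alignment:", align_q)
print("key-side alignment:", align_k)
```

In this toy setup the alignments should come out close to 1 because the planted factorization is itself a valid SVD. In a trained model the features are generally not orthogonal, so the alignment would be approximate at best and would need to be checked empirically.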
Gabriel Franco retweeted
Ziyu Yao @ZiyuYao
Check out our new preprint reflecting on what "circuits" actually explain: arxiv.org/pdf/2605.09129… (w/ @DakingRai and @megamor2)

While current practice follows "hypothesis-driven circuit discovery", we found that circuits discovered with existing approaches do not describe the general task but are instead dataset-specific. When the dataset includes multiple distinct mechanisms, current approaches cannot distinguish them. We propose Data-driven Circuit Discovery (DCD) and advocate the principle of letting the data pattern reveal what mechanisms are there (as opposed to humans hypothesizing their existence or how they exist). Details in the thread🧵

Concurrent with our work, we are seeing similar concerns raised by the community, e.g.,
- "Finding Interpretable Prompt-Specific Circuits in Language Models" by @gvsfranco arxiv.org/pdf/2602.13483
- "All Circuits Lead to Rome: Rethinking Functional Anisotropy in Circuit and Sheaf Discovery for LLMs" by @fnruji316625 arxiv.org/pdf/2605.12671

More reflections and new methodologies are still needed in this space. #MechanisticInterpretability #LLM
Quoted post from Daking Rai @DakingRai:

🚨 New paper: Data-driven Circuit Discovery for Interpretability of Language Models 🚨

Do circuits actually explain how language models (LMs) implement a task? In mechanistic interpretability, the goal of circuit study is to discover a “circuit” that is responsible for implementing a “task”. But we find that existing methods often discover circuits that are:
❌ not general task circuits: they do not capture the full range of mechanisms LMs use across the task.
Instead, they find:
✅ dataset-specific circuits: they explain how the model processes the examples used for circuit discovery.
✅ mixed-mechanism circuits: they consist of multiple independent mechanisms mixed in a single circuit.
1/🧵

Replies: 0 · Reposts: 4 · Likes: 33 · Views: 4.5K
Gabriel Franco @gvsfranco
@najoungkim Baklavastory has the best baklava I have had in my life (this was a recommendation from one of my Turkish friends)
Replies: 0 · Reposts: 0 · Likes: 1 · Views: 22
Najoung Kim 🫠 @najoungkim
or send me coffee/dessert/anything recs
[image]
Replies: 3 · Reposts: 0 · Likes: 4 · Views: 249
Najoung Kim 🫠 @najoungkim
i'll be hanging out in SF all day this friday, have a couple engagements but still have some free time so DM me or email me if you want to catch up!
Replies: 2 · Reposts: 0 · Likes: 16 · Views: 1.5K
Gabriel Franco retweeted
John Seon Keun Yi @john_sk_yi
[1] Multi-agent debate improves reasoning and factuality of LLMs, but is compute-intensive. How can we reap the benefits of debate while avoiding the inference costs of multi-agent communication? In our #ACL2026 paper, we present a method to internalize debate into a single LLM.
Quoted post from John Seon Keun Yi @john_sk_yi:

Our paper "Latent Agents" was accepted to #ACL2026 Main! We distill multi-agent debate into a single LLM, matching debate performance at a fraction of the cost. We also show that internalized agents are discoverable and controllable. Huge thanks to @amuuueller and Dokyun Lee!

Replies: 7 · Reposts: 2 · Likes: 3 · Views: 488
Gabriel Franco retweeted
Jessica Hullman @JessicaHullman
“In my lab, you will have the luxury to think things through” is turning out to be pretty effective as a recruitment line for AI/ML PhD candidates. I guess permission to be careful is a scarce resource.
Replies: 6 · Reposts: 23 · Likes: 260 · Views: 60.3K
Andrew Lee @a_jy_l
😻New preprint! As an interp researcher, I often ask “why did the model attend to this token?” We study this by decomposing the query-key (QK) space into interpretable low-rank subspaces. When these subspaces of Qs and Ks align, the model produces high attention scores. 1/N
[image]
Replies: 4 · Reposts: 19 · Likes: 131 · Views: 7.1K
Gabriel Franco retweeted
Zhengyang Shan @ZhengyangShan
Can steering remove LLM shortcuts without breaking legitimate LLM capabilities? In our @eaclmeeting paper, we show that conceptual bias is separable from concept detection; this means inference-time debiasing is possible with minimal capability loss.
[image]
Replies: 6 · Reposts: 3 · Likes: 19 · Views: 2.2K