Gabriel Franco

67 posts

@gvsfranco

CS PhD student @BUCompSci. Interested in interpretability.

Boston, MA · Joined May 2023
202 Following · 110 Followers
Pinned Tweet
Gabriel Franco @gvsfranco
Singular vectors of the attention QK matrix align with features! This has been found empirically in other works, like Talking Heads from @jack_merullo_. We show theoretically and empirically how and why. New @icmlconf 2026 paper with Carson Loughridge and @mcrovella.
1
9
27
3.2K
Gabriel Franco retweeted
Najoung Kim 🫠 @najoungkim
🐸 Please help circulate! Our group is conducting a super short, anonymous survey for people who incorporate AI into their research workflow. Let us know what you like using and what we've missed! 🙏 forms.gle/WEDFEkK3nSmZ59…
1
13
24
3.5K
Gabriel Franco retweeted
Ruochen Zhang @ruochenz_
🤗 Super excited to have this work out! Turns out by calculating the angles 📐 between representations, you can pick out difficult data samples! This can be very useful for assembling hard test sets or more efficient training sets. See more cool results and visuals in the 🧵
Naomi Saphra @nsaphra

We don’t always know what problems are hard for LLMs. So devs evaluate on tasks HUMANS find hard or on broad benchmarks. What if we could instead anticipate which scenarios a model will fail on—all without evaluating specific input examples? 🧵NEW PAPER by @jenniferlumeng &al

0
1
36
6.6K
Gabriel Franco @gvsfranco
🧠🤖 The 2026 New England Mechanistic Interpretability (NEMI) Workshop will be Aug. 14 at Boston University! Help spread the word and join the New England mech interp community! Registration and submission info in thread:👇
2
30
118
22.5K
Gabriel Franco retweeted
Naomi Saphra @nsaphra
✨ it's coming ✨
1
1
97
3.5K
Gabriel Franco retweeted
Zilu Tang (Peter) @Zilu_Tang_Peter
How do language models track entities across state changes? When tracking objects in different boxes, do they cumulatively build up a global state of what’s in every box? How do they add objects or remove objects (i.e. Entity Unbinding)? Find out in our ICML paper! 🧵
3
17
41
4.7K
Gabriel Franco @gvsfranco
We find that sparse decomposition occurs in Pythia and GPT-2 and emerges during training in a manner consistent with the toy model. Main takeaway: If you are trying to find features used by attention heads, look at the singular vectors of the QK matrix.
1
0
0
210
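A minimal sketch of that takeaway on a synthetic head, not on Pythia or GPT-2 and not the paper's code. The weight shapes, the planted "feature" directions, and the helper name `qk_singular_vectors` are all illustrative assumptions. It builds a head whose QK matrix is made from known query-side and key-side directions, then checks that the singular vectors of the QK matrix recover them.

```python
import numpy as np


def qk_singular_vectors(W_Q, W_K):
    """SVD of the QK bilinear form: score(x_q, x_k) = x_q @ W_Q @ W_K.T @ x_k.

    W_Q and W_K are assumed to have shape (d_model, d_head).
    """
    W_QK = W_Q @ W_K.T  # (d_model, d_model), rank at most d_head
    U, S, Vt = np.linalg.svd(W_QK)
    return U, S, Vt


rng = np.random.default_rng(0)
d_model, d_head = 16, 4

# Hypothetical planted features: orthonormal directions on the query side
# and on the key side (columns of feat_q and feat_k).
feat_q = np.linalg.qr(rng.normal(size=(d_model, d_head)))[0]
feat_k = np.linalg.qr(rng.normal(size=(d_model, d_head)))[0]
strengths = np.array([4.0, 3.0, 2.0, 1.0])  # distinct, so the SVD is unique up to sign

# Toy head: W_Q @ W_K.T == feat_q @ diag(strengths) @ feat_k.T
W_Q = feat_q * strengths  # scales column i of feat_q by strengths[i]
W_K = feat_k

U, S, Vt = qk_singular_vectors(W_Q, W_K)

# Absolute cosine between each top singular vector and its planted feature.
align_q = np.abs(np.sum(U[:, :d_head] * feat_q, axis=0))
align_k = np.abs(np.sum(Vt[:d_head].T * feat_k, axis=0))

print("singular values:", np.round(S[:d_head], 3))
print("query-side alignment:", np.round(align_q, 3))
print("key-side alignment:", np.round(align_k, 3))
```

For a real model the same helper would be applied per head to that head's query and key projection weights. How those weights are stored and sliced depends on the library, so that step is left out here.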
Gabriel Franco retweeted
Ziyu Yao @ZiyuYao
Check out our new preprint reflecting on what "circuits" actually explain: arxiv.org/pdf/2605.09129… (w/ @DakingRai and @megamor2)
While the current practice follows "hypothesis-driven circuit discovery", we found that circuits discovered using existing approaches do not describe the general task but rather are dataset-specific. When the dataset includes multiple distinct mechanisms, the current approaches cannot distinguish them. We propose Data-driven Circuit Discovery (DCD) and advocate the principle of letting the data pattern reveal what mechanisms are there (as opposed to humans hypothesizing their existence or how they exist). Details in the thread🧵
Concurrent to our work, we are seeing more similar concerns raised by the community, e.g.,
- "Finding Interpretable Prompt-Specific Circuits in Language Models" by @gvsfranco arxiv.org/pdf/2602.13483
- "All Circuits Lead to Rome: Rethinking Functional Anisotropy in Circuit and Sheaf Discovery for LLMs" by @fnruji316625 arxiv.org/pdf/2605.12671
More reflections and new methodologies are still needed in this space. #MechanisticInterpretability #LLM
Daking Rai @DakingRai

🚨 New paper: Data-driven Circuit Discovery for Interpretability of Language Models 🚨
Do circuits actually explain how language models (LMs) implement a task? In mechanistic interpretability, the goal of circuit study is to discover a “circuit” that is responsible for implementing a “task”. But we find that existing methods often discover circuits that are:
❌ not general task circuits: they do not capture the full range of mechanisms LMs use across the task.
Instead, they find:
✅ dataset-specific circuits: they explain how the model processes the examples used for circuit discovery.
✅ mixed-mechanism circuits: consisting of multiple independent mechanisms mixed in a single circuit.
1/🧵

0
4
33
4.5K
Gabriel Franco @gvsfranco
@najoungkim Baklavastory has the best baklava that I have had in my life (this was a recommendation from one of my Turkish friends)
0
0
1
22
Najoung Kim 🫠 @najoungkim
or send me coffee/dessert/anything recs
3
0
4
253
Najoung Kim 🫠 @najoungkim
i'll be hanging out in SF all day this friday, have a couple engagements but still have some free time so DM me or email me if you want to catch up!
2
0
16
1.5K
Gabriel Franco retweeted
John Seon Keun Yi @john_sk_yi
[1] Multi-agent debate improves reasoning and factuality of LLMs, but is compute-intensive. How can we reap the benefits of debate while avoiding the inference costs of multi-agent communication? In our #ACL2026 paper, we present a method to internalize debate into a single LLM.
John Seon Keun Yi @john_sk_yi

Our paper "Latent Agents" was accepted to #ACL2026 Main! We distill multi-agent debate into a single LLM, matching debate performance at a fraction of the cost. We also show that internalized agents are discoverable and controllable. Huge thanks to @amuuueller and Dokyun Lee!

7
2
3
489
Gabriel Franco retweeted
Jessica Hullman @JessicaHullman
“In my lab, you will have the luxury to think things through” is turning out to be pretty effective as a recruitment line for AI/ML PhD candidates. I guess permission to be careful is a scarce resource.
6
23
260
60.3K
Andrew Lee @a_jy_l
😻New preprint! As an interp researcher, I often ask “why did the model attend to this token?” We study this by decomposing the query-key (QK) space into interpretable low-rank subspaces. When these subspaces of Qs and Ks align, the model produces high attention scores. 1/N
4
19
131
7.1K