Aaron Mueller

314 posts


@amuuueller

Asst. Prof. in CS at @BU_Tweets ≡ {Mechanistic, causal} {interpretability, NLP}

Boston · Joined September 2015
730 Following · 1.9K Followers
Aaron Mueller retweeted
BlackboxNLP @BlackboxNLP
BlackboxNLP will be co-located with EMNLP 2026 in 🇭🇺 Budapest 🇭🇺 this October! This edition will feature a special reproducibility track, investigating generalization and robustness of established results from interpretability research 👷‍♂️ Stay tuned for more details!
Aaron Mueller retweeted
Aruna S @arunasank
Interpretability methods usually study single-token behavior. But real model behaviors, like sycophancy or writing style, are diffuse across many tokens. Can these diffuse behaviors be localized and controlled from long-form responses? YES!
Aaron Mueller retweeted
Michael Hu @michahu8
fyi, @babyLMchallenge has been doing this for 4 years now. some interesting ideas from our past competitions for folks to consider: 1. mixing causal and masked LM objectives (GPT-BERT) 2. mixture of experts as a way to better model human cognition
Samip @industriaalist

1/ Introducing NanoGPT Slowrun 🐢: an open repo for state-of-the-art data-efficient learning algorithms. It's built for the crazy ideas that speedruns filter out -- expensive optimizers, heavy regularization, SGD replacements like evolutionary search.

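A minimal sketch of the mixed-objective idea from the GPT-BERT point in Michael Hu's tweet above: train one transformer that sometimes gets a causal next-token loss and sometimes a masked-token loss. Everything here (the TinyLM module, mixed_lm_loss, p_mlm, the 15% masking rate) is an illustrative assumption, not the actual BabyLM submissions' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyLM(nn.Module):
    """Minimal transformer LM (no positional encoding, to keep the sketch short)."""
    def __init__(self, vocab=1000, d=64, n_heads=4, n_layers=2):
        super().__init__()
        self.emb = nn.Embedding(vocab, d)
        layer = nn.TransformerEncoderLayer(d, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d, vocab)

    def forward(self, ids, causal):
        mask = None
        if causal:  # additive upper-triangular mask blocks attention to the future
            s = ids.size(1)
            mask = torch.triu(torch.full((s, s), float("-inf")), diagonal=1)
        return self.head(self.encoder(self.emb(ids), mask=mask))

def mixed_lm_loss(model, ids, mask_id, p_mlm=0.5, mlm_rate=0.15):
    """Sample one objective per batch: masked LM with prob p_mlm, else causal LM."""
    if torch.rand(()) < p_mlm:
        labels = ids.clone()
        corrupt = torch.rand(ids.shape) < mlm_rate   # choose ~15% of positions
        labels[~corrupt] = -100                      # score only masked positions
        logits = model(ids.masked_fill(corrupt, mask_id), causal=False)
        return F.cross_entropy(logits.flatten(0, 1), labels.flatten(),
                               ignore_index=-100)
    logits = model(ids[:, :-1], causal=True)         # next-token prediction
    return F.cross_entropy(logits.flatten(0, 1), ids[:, 1:].flatten())

model = TinyLM()
ids = torch.randint(5, 1000, (8, 32))                # toy batch; id 4 = [MASK]
mixed_lm_loss(model, ids, mask_id=4).backward()
```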
Aaron Mueller retweeted
Kiho Park @KihoPark_
Interpreting and controlling internal representations should be based on how the model actually uses them! Turns out: information geometry makes this precise. We show how, and use it to derive a (provably & empirically) robust strategy for steering. arxiv.org/abs/2602.15293
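For context, a minimal sketch of the generic activation-steering mechanism this builds on: add a vector to a layer's output at inference via a forward hook. The paper's contribution is which direction and update rule the information geometry prescribes; this sketch shows only the plumbing, and the direction, alpha, and module path are stand-in assumptions.

```python
import torch

def add_steering_hook(module, direction, alpha=4.0):
    """Shift `module`'s output along a unit `direction` on every forward pass."""
    direction = direction / direction.norm()
    def hook(mod, inputs, output):
        # Transformer blocks often return tuples; steer the hidden-state entry.
        hidden = output[0] if isinstance(output, tuple) else output
        steered = hidden + alpha * direction.to(hidden.dtype)
        return (steered,) + output[1:] if isinstance(output, tuple) else steered
    return module.register_forward_hook(hook)

# Hypothetical usage (attribute path is model-specific; GPT-2-style shown):
# handle = add_steering_hook(model.transformer.h[8], v)  # v: (d_model,) direction
# ... generate ...
# handle.remove()
```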
Aaron Mueller retweeted
Kerem Şahin @keremsahin2210
Are induction heads necessary for the emergence of in-context learning (ICL)? Their emergence coincides with a sharp ICL improvement, raising the hypothesis they may underlie much of ICL. However, we find that ICL beyond copying can emerge even when we suppress induction heads!
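One way to implement the head suppression described above is weight surgery on the attention output projection, sketched below. The layout assumption (GPT-2-style heads concatenated along the input side of a d_model x d_model nn.Linear) and the module path are mine, not the paper's code.

```python
import torch

@torch.no_grad()
def suppress_heads(out_proj, head_dim, heads):
    """Zero each listed head's input slice of the attention output projection,
    so that head's contribution never reaches the residual stream."""
    for h in heads:
        out_proj.weight[:, h * head_dim:(h + 1) * head_dim] = 0.0

# Hypothetical usage, given (layer, head) pairs flagged as induction heads
# (e.g., by prefix-matching score), then re-measure ICL under the ablation:
# for layer, head in induction_heads:
#     suppress_heads(model.blocks[layer].attn.out_proj, head_dim=64, heads=[head])
```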
Aaron Mueller retweeted
Kevin Lu @kevinlu4588
How do protein folding models turn sequence into structure? In "Mechanisms of AI Protein Folding in ESMFold", we find properties like charge and distance encoded in interpretable, steerable directions. The trunk processes features in two phases: chemistry first, then geometry.
Aaron Mueller retweeted
Andrew Lampinen @AndrewLampinen
New paper studying how language models' representations of things like factuality evolve over a conversation. We find that in edge-case conversations, e.g. about model consciousness or delusional content, model representations can change dramatically! 1/
Aaron Mueller retweeted
Joe Stacey @_joestacey_
Super excited to have my last PhD paper about NLI robustness published at EACL Findings😍 We investigate how to make closed-source LLMs more robust after fine-tuning. Here are the paper highlights 🧵
Aaron Mueller @amuuueller
Representation steering is now a common way to mitigate LLM shortcuts. How much legitimate knowledge does this tend to remove? Turns out that these methods can be surprisingly precise! But also: no single steering operation will fix all shortcuts. Led by @ZhengyangShan!
Zhengyang Shan @ZhengyangShan

Can steering remove LLM shortcuts without breaking legitimate LLM capabilities? In our @eaclmeeting paper, we show that conceptual bias is separable from concept detection; this means inference-time debiasing is possible with minimal capability loss.

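A minimal sketch of one common form of the steering operation discussed here: inference-time debiasing that projects a shortcut direction out of hidden states, h' = h - (h . v_hat) * v_hat. Finding the direction v (e.g., as a difference of class means over shortcut-consistent vs. inconsistent examples) is assumed to happen separately and is not necessarily the paper's method.

```python
import torch

def project_out(hidden, v):
    """Remove the component of each hidden state along direction v."""
    v_hat = v / v.norm()
    coeff = hidden @ v_hat                       # (batch, seq): projection onto v_hat
    return hidden - coeff.unsqueeze(-1) * v_hat

# Toy usage: a (batch, seq, d_model) activation tensor and a candidate
# shortcut direction; after projection, hidden @ v_hat is ~0 everywhere.
hidden = torch.randn(2, 16, 768)
v = torch.randn(768)
debiased = project_out(hidden, v)
```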
Aaron Mueller retweeted
Najoung Kim 🫠 @najoungkim
I also want to mention that the lang x computation research community at BU is growing in an exciting direction, especially with new faculty like @amuuueller, Anthony Yacovone, @nsaphra, & Sophie Hao! Also, Boston is quite nice :)
Aaron Mueller retweeted
Tom McCoy @RTomMcCoy
🤖🧠I'll be considering applications for PhD students & postdocs to start at Yale in Fall 2026! If you are interested in the intersection of linguistics, cognitive science, & AI, I encourage you to apply! PhD link: rtmccoy.com/prospective_st… Postdoc link: rtmccoy.com/prospective_po…
Aaron Mueller retweeted
Can Rager @can_rager
Humans and LLMs think fast and slow. Do SAEs recover slow concepts in LLMs? Not really. Our *Temporal Feature Analyzer* discovers contextual features in LLMs that detect event boundaries, parse complex grammar, and represent ICL patterns.
Aaron Mueller @amuuueller
I'm glossing over our deeper motivations from neuroscience (predictive coding) and linguistics here, but we believe there's significant cross-field appeal for those interested in intersections of cog sci, neuroscience, and machine learning!
Aaron Mueller @amuuueller
In LLMs, concepts aren’t static: they evolve through time and have rich temporal dependencies. We introduce Temporal Feature Analysis (TFA) to separate what is inferred from context vs. the features each token adds. A herculean effort led by @EkdeepL, @can_rager, @sumedh_hrs!
Ekdeep Singh Lubana @EkdeepL

New paper! Language has rich, multiscale temporal structure, but sparse autoencoders assume features are *static* directions in activations. To address this, we propose Temporal Feature Analysis: a predictive coding protocol that models dynamics in LLM activations! (1/14)

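A minimal sketch of the predictive-coding intuition behind Temporal Feature Analysis: split each activation into a part predictable from past activations (inferred from context) and a residual (what the token newly adds). The linear next-step predictor below is a simplification assumed here, not the paper's exact protocol.

```python
import torch
import torch.nn as nn

class NextActPredictor(nn.Module):
    """Linear next-step predictor over cached LLM activations."""
    def __init__(self, d_model):
        super().__init__()
        self.pred = nn.Linear(d_model, d_model)

    def forward(self, acts):              # acts: (batch, seq, d_model)
        return self.pred(acts[:, :-1])    # predict x_t from x_{t-1}

def split_activations(predictor, acts):
    """Decompose x_1..x_T into a context-predicted part and a per-token residual."""
    predicted = predictor(acts)           # what could be inferred from context
    residual = acts[:, 1:] - predicted    # what each token newly contributes
    return predicted, residual

# Assumed training objective: minimize ||x_t - predictor(x_{t-1})||^2 over
# activations cached from an LLM layer, then analyze the two parts separately.
acts = torch.randn(2, 128, 768)
predictor = NextActPredictor(768)
predicted, residual = split_activations(predictor, acts)
```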
Aaron Mueller retweeted
BlackboxNLP @BlackboxNLP
Excited to announce this year's best paper award: 🏆 "Language Dominance in Multilingual Large Language Models" by Nadav Shani and Ali Basirat 🏆 This paper challenges a common conception that multilingual models perform computation via a dominant language. Congratulations!
Aviv Slobodkin @NeurIPS @lovodkin93
I’m excited to share that I’ve started a full-time position as a Research Scientist at Google! 🚀 I’ve also moved to the Bay Area 🌉, so if you are around please text me and we can meet for coffee! To new beginnings!
Aaron Mueller retweeted
Kanishka Misra 🌊 @kanishkamisra
"Although I hate leafy vegetables, I prefer daxes to blickets." Can you tell if daxes are leafy vegetables? LM's can't seem to! 📷 We investigate if LMs capture these inferences from connectives when they cannot rely on world knowledge. New paper w/ Daniel, Will, @jessyjli!