Aaron Mueller

314 posts

@amuuueller

Asst. Prof. in CS at @BU_Tweets ≡ {Mechanistic, causal} {interpretability, NLP}

Boston · Joined September 2015

730 Following · 1.9K Followers

Aaron Mueller reposted

BlackboxNLP @BlackboxNLP · 3d

BlackboxNLP will be co-located with EMNLP 2026 in 🇭🇺 Budapest 🇭🇺 this October! This edition will feature a special reproducibility track, investigating generalization and robustness of established results from interpretability research 👷‍♂️ Stay tuned for more details!

Aaron Mueller reposted

Aruna S @arunasank · 3d

Interpretability methods usually study single-token behavior. But real model behaviors, like sycophancy or writing style, are diffuse across many tokens. Can these diffuse behaviors be localized and controlled from long-form responses? YES!

Aaron Mueller reposted

Michael Hu @michahu8 · Feb 27

fyi, @babyLMchallenge has been doing this for 4 years now. some interesting ideas from our past competitions for folks to consider: 1. mixing causal and masked LM objectives (GPT-BERT) 2. mixture of experts as a way to better model human cognition

Samip @industriaalist

1/ Introducing NanoGPT Slowrun 🐢: an open repo for state-of-the-art data-efficient learning algorithms. It's built for the crazy ideas that speedruns filter out -- expensive optimizers, heavy regularization, SGD replacements like evolutionary search.

Aaron Mueller reposted

Kiho Park @KihoPark_ · Feb 19

Interpreting and controlling internal representations should be based on how the model actually uses them! Turns out: information geometry makes this precise. We show how, and use it to derive a (provably & empirically) robust strategy for steering. arxiv.org/abs/2602.15293

Aaron Mueller reposted

Kerem Şahin @keremsahin2210 · Feb 13

Are induction heads necessary for the emergence of in-context learning (ICL)? Their emergence coincides with a sharp ICL improvement, raising the hypothesis they may underlie much of ICL. However, we find that ICL beyond copying can emerge even when we suppress induction heads!

Aaron Mueller reposted

Kevin Lu @kevinlu4588 · Feb 10

How do protein folding models turn sequence into structure? In "Mechanisms of AI Protein Folding in ESMFold", we find properties like charge and distance encoded in interpretable, steerable directions. The trunk processes features in two phases: chemistry first, then geometry.

Aaron Mueller reposted

Andrew Lampinen @AndrewLampinen · Jan 29

New paper studying how language models' representations of things like factuality evolve over a conversation. We find that in edge-case conversations, e.g. about model consciousness or delusional content, model representations can change dramatically! 1/

Aaron Mueller reposted

Joe Stacey @_joestacey_ · Jan 21

Super excited to have my last PhD paper about NLI robustness published at EACL Findings😍 We investigate how to make closed-source LLMs more robust after fine-tuning. Here are the paper highlights 🧵

Aaron Mueller @amuuueller · Jan 20

Representation steering is now a common way to mitigate LLM shortcuts. How much legitimate knowledge does this tend to remove? Turns out that these methods can be surprisingly precise! But also: no single steering operation will fix all shortcuts. Led by @ZhengyangShan!

Zhengyang Shan @ZhengyangShan

Can steering remove LLM shortcuts without breaking legitimate LLM capabilities? In our @eaclmeeting paper, we show that conceptual bias is separable from concept detection; this means inference-time debiasing is possible with minimal capability loss.

Aaron Mueller reposted

Najoung Kim 🫠 @najoungkim · Nov 19

I also want to mention that the lang x computation research community at BU is growing in an exciting direction, especially with new faculty like @amuuueller, Anthony Yacovone, @nsaphra, & Sophie Hao! Also, Boston is quite nice :)

Aaron Mueller reposted

Tom McCoy @RTomMcCoy · Nov 14

🤖🧠I'll be considering applications for PhD students & postdocs to start at Yale in Fall 2026! If you are interested in the intersection of linguistics, cognitive science, & AI, I encourage you to apply! PhD link: rtmccoy.com/prospective_st… Postdoc link: rtmccoy.com/prospective_po…

Aaron Mueller reposted

Can Rager @can_rager · Nov 13

Humans and LLMs think fast and slow. Do SAEs recover slow concepts in LLMs? Not really. Our *Temporal Feature Analyzer* discovers contextual features in LLMs that detect event boundaries, parse complex grammar, and represent ICL patterns.

Aaron Mueller @amuuueller · Nov 13

Check out Ekdeep's thread for a more detailed (and very visually appealing) overview! 📜 Preprint: arxiv.org/abs/2511.01836 🧠 Play with temporal feature analysis on Neuronpedia: neuronpedia.org/gemma-2-2b/12-…

Aaron Mueller @amuuueller · Nov 13

I'm glossing over our deeper motivations from neuroscience (predictive coding) and linguistics here, but we believe there's significant cross-field appeal for those interested in intersections of cog sci, neuroscience, and machine learning!

Aaron Mueller @amuuueller · Nov 13

In LLMs, concepts aren’t static: they evolve through time and have rich temporal dependencies. We introduce Temporal Feature Analysis (TFA) to separate what is inferred from context vs. the features each token adds. A herculean effort led by @EkdeepL, @can_rager, @sumedh_hrs!

Ekdeep Singh Lubana @EkdeepL

New paper! Language has rich, multiscale temporal structure, but sparse autoencoders assume features are *static* directions in activations. To address this, we propose Temporal Feature Analysis: a predictive coding protocol that models dynamics in LLM activations! (1/14)

Aaron Mueller reposted

BlackboxNLP @BlackboxNLP · Nov 9

Excited to announce this year's best paper award: 🏆 "Language Dominance in Multilingual Large Language Models" by Nadav Shani and Ali Basirat 🏆 This paper challenges a common conception that multilingual models perform computation via a dominant language. Congratulations!

Aaron Mueller @amuuueller · Oct 18

@lovodkin93 Congrats Aviv!!

Aviv Slobodkin @NeurIPS @lovodkin93 · Oct 16

I’m excited to share that I’ve started a full-time position as a Research Scientist at Google! 🚀 I’ve also moved to the Bay Area 🌉, so if you are around please text me and we can meet for coffee! To new beginnings!

Aaron Mueller reposted

Kanishka Misra 🌊 @kanishkamisra · Oct 16

"Although I hate leafy vegetables, I prefer daxes to blickets." Can you tell if daxes are leafy vegetables? LMs can't seem to! 📷 We investigate if LMs capture these inferences from connectives when they cannot rely on world knowledge. New paper w/ Daniel, Will, @jessyjli!
