Julian Minder
@jkminder
221 posts

PhD at EPFL with Robert West and Ryan Cotterell, MATS 7 Scholar with Neel Nanda
Lausanne/Zürich · Joined November 2011
556 Following · 648 Followers

Pinned Tweet
Julian Minder @jkminder
New paper: Finetuning on narrow domains leaves traces behind. By looking at the difference in activations before and after finetuning, we can interpret what it was finetuned for. And so can our interpretability agent! 🧵
2 replies · 27 retweets · 159 likes · 31K views
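The activation-diffing idea in the pinned tweet can be sketched in a few lines: collect activations from the base and finetuned model on the same prompts, average the difference, and inspect its dominant direction. Everything below is synthetic and hypothetical (real activations would be hooked out of a transformer layer; the injected `shift_direction` stands in for whatever the finetuning actually changed):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: activations of a base and a finetuned model on the
# same prompts, shape (n_prompts, hidden_dim). Real activations would come
# from hooking a transformer layer; here we simulate them.
n_prompts, hidden_dim = 128, 64
base_acts = rng.normal(size=(n_prompts, hidden_dim))

# Simulate a finetuning-induced shift along one fixed hidden dimension.
shift_direction = np.zeros(hidden_dim)
shift_direction[3] = 1.0
finetuned_acts = (
    base_acts
    + 2.0 * shift_direction
    + 0.1 * rng.normal(size=(n_prompts, hidden_dim))
)

# The "diff" is the mean activation difference across prompts; its dominant
# direction summarizes what finetuning changed.
diff = (finetuned_acts - base_acts).mean(axis=0)
top_dim = int(np.argmax(np.abs(diff)))
print(top_dim)  # recovers dimension 3, the direction we injected
```

In practice the diff direction would be interpreted by projecting it onto known features or feeding it to an interpretability pipeline, not by eyeballing indices.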
Julian Minder retweeted
David Bau @davidbau
In 1982, high school students in Sudbury, Mass. wrote a dungeon game called Hack. They had Atari 800s and Logo and an obsession with a Unix game called Rogue that most of them had never seen. I grew up one town over with the same computers and the same obsession.
1 reply · 6 retweets · 17 likes · 1.2K views
Julian Minder retweeted
Mark Rofin @broccolitwit
1/ Did you ever look at some feature represented by a Transformer and wonder: "Why would it even learn that? 🤔 " We did! Announcing the ICLR'26 paper "Understanding the Emergence of Seemingly Useless Features in Next-Token Predictors"
5 replies · 30 retweets · 259 likes · 16.2K views
Julian Minder retweeted
Ivan Zakazov @IvanZakazov
We built a proxy that compresses Claude Code/OpenClaw tool outputs before they hit the model. We're now live on Product Hunt! Fighting context bloat since 2026 👉👈 producthunt.com/products/conte…
4 replies · 3 retweets · 13 likes · 559 views
Julian Minder retweeted
Jake Ward @_jake_ward
Circuit tracing is cool, but can it be used for model diffing? We investigate mechanisms introduced during reasoning fine-tuning by training transcoder _adapters_ to faithfully reconstruct MLP output _differences_. Check it out!
Quoting Nathan Hu @NathanHu12:
What does reasoning fine-tuning actually change inside a model? In our new paper, we introduce transcoder adapters to learn sparse, interpretable approximations of how reasoning fine-tuning changes MLP computation. 🧵
0 replies · 5 retweets · 38 likes · 4K views
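The shape of the transcoder-adapter idea can be sketched as a forward pass: a sparse ReLU bottleneck that maps an MLP input to the *difference* between finetuned and base MLP outputs. This is only an illustration of the architecture's shapes with random weights (`W_enc`, `W_dec`, and `b_enc` are made up, not the paper's trained adapters):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative shapes: a sparse bottleneck mapping an MLP input (d_model)
# through d_hidden latents to a predicted MLP-output difference (d_model).
d_model, d_hidden = 16, 64
W_enc = rng.normal(scale=0.1, size=(d_model, d_hidden))
W_dec = rng.normal(scale=0.1, size=(d_hidden, d_model))
b_enc = -0.05 * np.ones(d_hidden)  # negative bias pushes latents toward zero

def adapter(x):
    # ReLU bottleneck: many latents are exactly zero, giving a sparse code
    # whose active latents can be inspected individually.
    latents = np.maximum(0.0, x @ W_enc + b_enc)
    return latents, latents @ W_dec  # predicted MLP-output difference

x = rng.normal(size=d_model)
latents, pred_diff = adapter(x)
sparsity = float((latents == 0).mean())
print(pred_diff.shape, sparsity)
```

In the actual method the weights would be trained so that `pred_diff` faithfully reconstructs observed MLP output differences; the sparsity of `latents` is what makes the learned changes interpretable.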
Julian Minder retweeted
Adam Shai @adamimos
A longstanding dream of interp is to decompose activations into distinct, interpretable parts. But when should we expect that to work, and what even are such parts? New from Simplex: transformers factor their world into orthogonal subspaces, even when it costs accuracy.🧵👇
12 replies · 84 retweets · 530 likes · 48.5K views
Julian Minder retweeted
Kiho Park @KihoPark_
Interpreting and controlling internal representations should be based on how the model actually uses them! Turns out: information geometry makes this precise. We show how, and use it to derive a (provably & empirically) robust strategy for steering. arxiv.org/abs/2602.15293
10 replies · 91 retweets · 714 likes · 77.9K views
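For readers unfamiliar with steering: the baseline being improved on is linear activation steering, where a direction vector is added to a hidden state. A minimal generic sketch (this is the naive linear approach, not the paper's information-geometric strategy; `direction` and `alpha` are hypothetical):

```python
import numpy as np

def steer(hidden, direction, alpha=4.0):
    """Shift `hidden` along the unit-normalized `direction` by strength `alpha`."""
    unit = direction / np.linalg.norm(direction)
    return hidden + alpha * unit

# Toy example: steer an 8-dim hidden state along the first basis direction.
hidden = np.ones(8)
direction = np.zeros(8)
direction[0] = 2.0  # any scale works; it is normalized inside steer()
steered = steer(hidden, direction)
print(steered[0])  # 1.0 + 4.0 = 5.0; other coordinates unchanged
```

The paper's point is that the choice of `direction` and `alpha` should respect how the model actually reads the representation, rather than being a fixed Euclidean shift like this one.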
Julian Minder retweeted
Chris Wendler @wendlerch
It is not enough to mechanistically analyse the final checkpoint. Deep learning is about neural networks optimised using gradient descent on large datasets. Thanks to initiatives like Pythia and OLMo we can study how circuits are formed during training and how they interact!
Quoting Kerem Şahin @keremsahin2210:
Are induction heads necessary for the emergence of in-context learning (ICL)? Their emergence coincides with a sharp ICL improvement, raising the hypothesis they may underlie much of ICL. However, we find that ICL beyond copying can emerge even when we suppress induction heads!
0 replies · 3 retweets · 43 likes · 2.9K views
Julian Minder retweeted
Chris Wendler @wendlerch
Data is plentiful, knowledge is scarce. We have begun to close this gap thanks to deep learning <3 Neural networks can learn “programs” that often achieve superhuman performance from data alone. What insights are encoded in their weights? Here we took a first step on AI protein folding.
Quoting Kevin Lu @kevinlu4588:
How do protein folding models turn sequence into structure? In "Mechanisms of AI Protein Folding in ESMFold", we find properties like charge and distance encoded in interpretable, steerable directions. The trunk processes features in two phases: chemistry first, then geometry.
2 replies · 10 retweets · 28 likes · 1.8K views
Julian Minder @jkminder
This vision is really awesome and very close to a lot of things I have been thinking about. "We currently attempt to design these systems by an expensive process of guess-and-check: first train, then evaluate, then tweak our training setup in ways we hope will work."
Quoting Tom McGrath @banburismus_:
We’re putting more computation (in the form of intelligence) into the most general object in neural network training: backprop. This essay describes how I think we can do this, why interp is key, the relevance to alignment, and how we should do it right.
0 replies · 0 retweets · 18 likes · 1.7K views
Julian Minder retweeted
Jack Merullo @jack_merullo_
"rather than guessing what a model might learn and hoping we've accounted for it, we use interpretability tools to directly observe what the model is actually learning from each datapoint, then intervene to ensure only the intended lessons are absorbed." I recommend reading this!
Quoting Tom McGrath @banburismus_:
We’re putting more computation (in the form of intelligence) into the most general object in neural network training: backprop. This essay describes how I think we can do this, why interp is key, the relevance to alignment, and how we should do it right.
0 replies · 2 retweets · 55 likes · 4.5K views
Julian Minder retweeted
Neil Zeghidour @neilzegh
Me defending my O(n^3) solution to the coding interviewer.
423 replies · 5K retweets · 49.7K likes · 4M views
Julian Minder retweeted
Eric J. Michaud @ericjmichaud_
How does scaling up neural networks change what they learn? Despite its importance, our understanding of this question remains nascent. I've written a long post reflecting on my model of neural scaling and its relationship to interpretability, etc.: ericjmichaud.com/quanta
38 replies · 161 retweets · 1.4K likes · 333.5K views
Julian Minder retweeted
Andrew Gordon Wilson @andrewgwils
We introduce epiplexity, a new measure of information that provides a foundation for how to select, generate, or transform data for learning systems. We have been working on this for almost 2 years, and I cannot contain my excitement! 1/7
Quoting Marc Finzi @m_finzi:
1/🧵 We are very excited to release our new paper! From Entropy to Epiplexity: Rethinking Information for Computationally Bounded Intelligence arxiv.org/abs/2601.03220 with amazing team @ShikaiQiu @yidingjiang @Pavel_Izmailov @zicokolter @andrewgwils
35 replies · 187 retweets · 1.3K likes · 161.5K views
Julian Minder retweeted
Adam Karvonen @a_karvonen
Interested in using Activation Oracles for your project? I trained AOs across 12 models from the Gemma-2, Gemma-3, Qwen3, and Llama-3 families. Sizes range from 1B-70B. HuggingFace and notebook links below.
4 replies · 12 retweets · 111 likes · 25.3K views