Julian Minder
@jkminder
221 posts

PhD at EPFL with Robert West and Ryan Cotterell, MATS 7 Scholar with Neel Nanda
Lausanne/Zürich · Joined November 2011
556 Following · 648 Followers

Pinned Tweet
Julian Minder @jkminder
New paper: Finetuning on narrow domains leaves traces behind. By looking at the difference in activations before and after finetuning, we can interpret what it was finetuned for. And so can our interpretability agent! 🧵
2 replies · 27 retweets · 159 likes · 31K views
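The activation-diffing idea in the pinned tweet can be sketched in a few lines: collect activations from the base and finetuned model on the same prompts, average the difference, and inspect its dominant direction. Everything below is synthetic and hypothetical (real activations would be hooked out of a transformer layer; the injected `shift_direction` stands in for whatever the finetuning actually changed):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: activations of a base and a finetuned model on the
# same prompts, shape (n_prompts, hidden_dim). Real activations would come
# from hooking a transformer layer; here we simulate them.
n_prompts, hidden_dim = 128, 64
base_acts = rng.normal(size=(n_prompts, hidden_dim))

# Simulate a finetuning-induced shift along one fixed hidden dimension.
shift_direction = np.zeros(hidden_dim)
shift_direction[3] = 1.0
finetuned_acts = (
    base_acts
    + 2.0 * shift_direction
    + 0.1 * rng.normal(size=(n_prompts, hidden_dim))
)

# The "diff" is the mean activation difference across prompts; its dominant
# direction summarizes what finetuning changed.
diff = (finetuned_acts - base_acts).mean(axis=0)
top_dim = int(np.argmax(np.abs(diff)))
print(top_dim)  # recovers dimension 3, the direction we injected
```

In practice the diff direction would be interpreted by projecting it onto known features or feeding it to an interpretability pipeline, not by eyeballing indices.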
Julian Minder retweeted
David Bau @davidbau
In 1982, high school students in Sudbury, Mass. wrote a dungeon game called Hack. They had Atari 800s and Logo and an obsession with a Unix game called Rogue that most of them had never seen. I grew up one town over with the same computers and the same obsession.
1 reply · 6 retweets · 17 likes · 1.2K views
Julian Minder retweeted
Mark Rofin @broccolitwit
1/ Did you ever look at some feature represented by a Transformer and wonder: "Why would it even learn that? 🤔 " We did! Announcing the ICLR'26 paper "Understanding the Emergence of Seemingly Useless Features in Next-Token Predictors"
5 replies · 30 retweets · 259 likes · 16.2K views
Julian Minder retweeted
Ivan Zakazov @IvanZakazov
We built a proxy that compresses Claude Code/OpenClaw tool outputs before they hit the model. We're now live on Product Hunt! Fighting context bloat since 2026 👉👈 producthunt.com/products/conte…
4 replies · 3 retweets · 13 likes · 559 views
Julian Minder retweeted
Jake Ward @_jake_ward
Circuit tracing is cool, but can it be used for model diffing? We investigate mechanisms introduced during reasoning fine-tuning by training transcoder _adapters_ to faithfully reconstruct MLP output _differences_. Check it out!
Quoting Nathan Hu @NathanHu12:
What does reasoning fine-tuning actually change inside a model? In our new paper, we introduce transcoder adapters to learn sparse, interpretable approximations of how reasoning fine-tuning changes MLP computation. 🧵
0 replies · 5 retweets · 38 likes · 4K views
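The shape of the transcoder-adapter idea can be sketched as a forward pass: a sparse ReLU bottleneck that maps an MLP input to the *difference* between finetuned and base MLP outputs. This is only an illustration of the architecture's shapes with random weights (`W_enc`, `W_dec`, and `b_enc` are made up, not the paper's trained adapters):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative shapes: a sparse bottleneck mapping an MLP input (d_model)
# through d_hidden latents to a predicted MLP-output difference (d_model).
d_model, d_hidden = 16, 64
W_enc = rng.normal(scale=0.1, size=(d_model, d_hidden))
W_dec = rng.normal(scale=0.1, size=(d_hidden, d_model))
b_enc = -0.05 * np.ones(d_hidden)  # negative bias pushes latents toward zero

def adapter(x):
    # ReLU bottleneck: many latents are exactly zero, giving a sparse code
    # whose active latents can be inspected individually.
    latents = np.maximum(0.0, x @ W_enc + b_enc)
    return latents, latents @ W_dec  # predicted MLP-output difference

x = rng.normal(size=d_model)
latents, pred_diff = adapter(x)
sparsity = float((latents == 0).mean())
print(pred_diff.shape, sparsity)
```

In the actual method the weights would be trained so that `pred_diff` faithfully reconstructs observed MLP output differences; the sparsity of `latents` is what makes the learned changes interpretable.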
Julian Minder retweeted
Adam Shai @adamimos
A longstanding dream of interp is to decompose activations into distinct, interpretable parts. But when should we expect that to work, and what even are such parts? New from Simplex: transformers factor their world into orthogonal subspaces, even when it costs accuracy.🧵👇
12 replies · 84 retweets · 530 likes · 48.5K views
Julian Minder retweeted
Kiho Park @KihoPark_
Interpreting and controlling internal representations should be based on how the model actually uses them! Turns out: information geometry makes this precise. We show how, and use it to derive a (provably & empirically) robust strategy for steering. arxiv.org/abs/2602.15293
10 replies · 91 retweets · 714 likes · 77.9K views
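For readers unfamiliar with steering: the baseline being improved on is linear activation steering, where a direction vector is added to a hidden state. A minimal generic sketch (this is the naive linear approach, not the paper's information-geometric strategy; `direction` and `alpha` are hypothetical):

```python
import numpy as np

def steer(hidden, direction, alpha=4.0):
    """Shift `hidden` along the unit-normalized `direction` by strength `alpha`."""
    unit = direction / np.linalg.norm(direction)
    return hidden + alpha * unit

# Toy example: steer an 8-dim hidden state along the first basis direction.
hidden = np.ones(8)
direction = np.zeros(8)
direction[0] = 2.0  # any scale works; it is normalized inside steer()
steered = steer(hidden, direction)
print(steered[0])  # 1.0 + 4.0 = 5.0; other coordinates unchanged
```

The paper's point is that the choice of `direction` and `alpha` should respect how the model actually reads the representation, rather than being a fixed Euclidean shift like this one.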
Julian Minder retweeted
Chris Wendler @wendlerch
It is not enough to mechanistically analyse the final checkpoint. Deep learning is about neural networks optimised using gradient descent on large datasets. Thanks to initiatives like Pythia and OLMo we can study how circuits are formed during training and how they interact!
Quoting Kerem Şahin @keremsahin2210:
Are induction heads necessary for the emergence of in-context learning (ICL)? Their emergence coincides with a sharp ICL improvement, raising the hypothesis they may underlie much of ICL. However, we find that ICL beyond copying can emerge even when we suppress induction heads!
0 replies · 3 retweets · 43 likes · 2.9K views
Julian Minder retweeted
Chris Wendler @wendlerch
Data is plentiful, knowledge is scarce. We have begun to close this gap thanks to deep learning <3 Neural networks can learn “programs” that often achieve superhuman performance from data alone. What insights are encoded in their weights? Here we took a first step on AI protein folding.
Quoting Kevin Lu @kevinlu4588:
How do protein folding models turn sequence into structure? In "Mechanisms of AI Protein Folding in ESMFold", we find properties like charge and distance encoded in interpretable, steerable directions. The trunk processes features in two phases: chemistry first, then geometry.
2 replies · 10 retweets · 28 likes · 1.8K views
Julian Minder @jkminder
This vision is really awesome and very close to a lot of things I have been thinking about. "We currently attempt to design these systems by an expensive process of guess-and-check: first train, then evaluate, then tweak our training setup in ways we hope will work."
Quoting Tom McGrath @banburismus_:
We’re putting more computation (in the form of intelligence) into the most general object in neural network training: backprop. This essay describes how I think we can do this, why interp is key, the relevance to alignment, and how we should do it right.
0 replies · 0 retweets · 18 likes · 1.7K views
Julian Minder retweeted
Jack Merullo @jack_merullo_
"rather than guessing what a model might learn and hoping we've accounted for it, we use interpretability tools to directly observe what the model is actually learning from each datapoint, then intervene to ensure only the intended lessons are absorbed." I recommend reading this!
Quoting Tom McGrath @banburismus_:
We’re putting more computation (in the form of intelligence) into the most general object in neural network training: backprop. This essay describes how I think we can do this, why interp is key, the relevance to alignment, and how we should do it right.
0 replies · 2 retweets · 55 likes · 4.5K views
Julian Minder retweeted
Neil Zeghidour @neilzegh
Me defending my O(n^3) solution to the coding interviewer.
423 replies · 5K retweets · 49.7K likes · 4M views
Julian Minder retweeted
Eric J. Michaud @ericjmichaud_
How does scaling up neural networks change what they learn? Despite its importance, our understanding of this question remains nascent. I've written a long post reflecting on my model of neural scaling and its relationship to interpretability, etc.: ericjmichaud.com/quanta
38 replies · 161 retweets · 1.4K likes · 333.5K views
Julian Minder retweeted
Andrew Gordon Wilson @andrewgwils
We introduce epiplexity, a new measure of information that provides a foundation for how to select, generate, or transform data for learning systems. We have been working on this for almost 2 years, and I cannot contain my excitement! 1/7
Quoting Marc Finzi @m_finzi:
1/🧵 We are very excited to release our new paper! From Entropy to Epiplexity: Rethinking Information for Computationally Bounded Intelligence arxiv.org/abs/2601.03220 with amazing team @ShikaiQiu @yidingjiang @Pavel_Izmailov @zicokolter @andrewgwils
35 replies · 187 retweets · 1.3K likes · 161.5K views
Julian Minder retweeted
Adam Karvonen @a_karvonen
Interested in using Activation Oracles for your project? I trained AOs across 12 models from the Gemma-2, Gemma-3, Qwen3, and Llama-3 families. Sizes range from 1B-70B. HuggingFace and notebook links below.
4 replies · 12 retweets · 111 likes · 25.3K views