Raphaël Millière
@raphaelmilliere

AI & Cognitive Science @UniofOxford @EthicsInAI Fellow @JesusOxford @raphaelmilliere.com on 🦋 Blog: https://t.co/2hJjfShFfr

Oxford, UK · Joined May 2016
2.9K Following · 10.9K Followers

Pinned Tweet
Raphaël Millière@raphaelmilliere·
Transformer-based neural networks achieve impressive performance on coding, math & reasoning tasks that require keeping track of variables and their values. But how can they do that without explicit memory? 📄 Our new ICML paper investigates this in a synthetic setting! 🧵 1/13
François Fleuret@francoisfleuret·
Hot take: machine learning and AI did more to understand the nature of knowledge, and our relation to reality than 20 centuries of philosophy. I am ready to kind of defend this hill.
Karina Vold@karinavold·
Big thanks to @raphaelmilliere for his talk @TorontoSRI about why AI needs philosophy and cognitive science to make credible evaluations 🤖 vs 👦🏻 vs 🐵
Raphaël Millière retweeted
Kanishka Misra 🌊@kanishkamisra·
New opinion piece on the interface between research on concepts and categories in minds vs. in neural network LMs! I take the position that there is much to be learned from this interface (e.g., learning about concepts from language alone) and outline some directions for future.
Raphaël Millière retweeted
Aryaman Arora@aryaman2020·
all mech interp people are bought into causality, this criticism is very lazy as of ~2 years ago. since this is a subtweet of NLAs, it is worth pointing out that their steering experiments on the poetry and eval awareness tasks *do* test for (in those cases) causality!
Subbarao Kambhampati (కంభంపాటి సుబ్బారావు)@rao2z

Guys, stop pestering Mech Interp researchers about causality please! It's this inexplicable obsession with causality that made us lose beautiful sciences like Astrology, Palmistry and Phrenology! 😡

Daniel Litt@littmath·
guy who understands things without thinking about them
Raphaël Millière retweeted
Aryaman Arora@aryaman2020·
pov: you are a natural language autoencoder and you are aware you are being subject to evals by Redwood Research. do you fake writing out a coherent cot or truthfully say "the math problem is giving me 92ish vibes"?
Ryan Greenblatt@RyanPGreenblatt

How well does this work? One quick independent test is to see if it can recover an "internal CoT" in cases where AIs can solve math problems in a single forward pass. TLDR: it doesn't. (TBC, this might require the NLA to see activations at multiple positions/location to work.)

Raphaël Millière@raphaelmilliere·
@elyasbuilds I like activation steering as much as the next guy, but this isn't what I was referring to: x.com/raphaelmillier…
Raphaël Millière@raphaelmilliere

@jatin_n0 Mostly a joke, it's a cool paper! yes the planning result is causal but only looking at total effect (i.e. an NLA-derived resid stream edit changes the output). I was referring to causal effect on the model's downstream computations, not anything inside/after the autoencoder. 1/2

Raphaël Millière@raphaelmilliere·
@jatin_n0 An additive AR-difference vector can change the output while acting as a broad steering perturbation without showing that the described content actually maps onto the operative feature in the model's putative "rhyme-planning" circuit 3/3
Raphaël Millière@raphaelmilliere·
@jatin_n0 What's missing is evidence about causal mediation: whether the NLA-described "rabbit plan" is the variable later components read, whether the edit produces a coherent "mouse plan" in later layers/tokens, whether ablating/patching intermediate states blocks or restores the effect 2/
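The causal-mediation point in the thread above can be illustrated with a toy sketch (a purely hypothetical two-layer numpy model, no relation to any real NLA setup): an interchange intervention asks whether an intermediate state is the variable downstream components actually read, not merely whether some edit changes the output.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer "model": h = relu(W1 x); y = W2 h. Weights are random
# stand-ins; nothing here corresponds to a real trained network.
W1 = rng.normal(size=(8, 4))
W2 = rng.normal(size=(2, 8))

def forward(x, patch_h=None):
    """Run the toy model, optionally overwriting the hidden state."""
    h = np.maximum(W1 @ x, 0.0)
    if patch_h is not None:
        h = patch_h  # interchange intervention on the hidden layer
    return W2 @ h, h

x_clean = rng.normal(size=4)
x_corrupt = rng.normal(size=4)

y_clean, h_clean = forward(x_clean)
y_corrupt, _ = forward(x_corrupt)

# Patch the clean hidden state into the corrupted run: if the hidden
# layer mediates the behaviour, the patched output should recover the
# clean output, not just move somewhere new.
y_patched, _ = forward(x_corrupt, patch_h=h_clean)

print(np.linalg.norm(y_patched - y_clean))  # 0.0: output fully mediated by h
```

In this toy case mediation is total by construction (the readout depends only on h); the interesting real-model question is precisely when patching a described feature fails to recover the clean behaviour.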
Atoosa Kasirzadeh@Dr_Atoosa·
🌟 Big personal news: I’m joining @GoogleDeepMind full-time in London starting this week. I’ll be working on the implications of AGI for human life, science, and society; on what it means to live, connect, and discover in a world where cognitive agency is no longer uniquely ours. The way we answer these questions will define what it means to be human. I can’t think of a better place to do it.
Raphaël Millière@raphaelmilliere·
@glnmario Also, behavioral sensitivity to input properties doesn't entail that the model routes through the probe-site representation, so if path patching from those sites finds no downstream consumer whose output depends on them, it's possible the info is decodable but not causally used? 3/3
Raphaël Millière@raphaelmilliere·
@glnmario Or maybe the information is consumed via attention routing, and patching residual values at the probe site is the wrong intervention. E.g. you could probe attention patterns directly and freeze them during your existing patches to check whether self-repair was masking the effect 2/3
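The freeze-attention suggestion above can be sketched with a single toy attention head (all weights and dimensions are hypothetical numpy stand-ins): reusing the attention pattern from the original run while patching the inputs confines the intervention to the value pathway, so rerouted attention cannot self-repair around it.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(1)
T, d = 4, 6  # toy sequence length and head dimension
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

def head(x, frozen_pattern=None):
    """One attention head; optionally reuse a frozen attention pattern."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    if frozen_pattern is None:
        pattern = softmax(q @ k.T / np.sqrt(d))
    else:
        pattern = frozen_pattern  # attention routing held fixed
    return pattern @ v, pattern

x = rng.normal(size=(T, d))
out_clean, pat_clean = head(x)

# Patch one position's input but freeze the clean attention pattern:
# any output change now flows only through the value vectors, which
# controls for self-repair via rerouted attention.
x_patched = x.copy()
x_patched[2] = rng.normal(size=d)
out_frozen, _ = head(x_patched, frozen_pattern=pat_clean)
out_free, _ = head(x_patched)  # comparison: attention recomputed freely
```

Comparing `out_frozen` against `out_free` separates the value-mediated effect from the routing-mediated one, which is the diagnostic the tweet gestures at.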
Mario Giulianelli@glnmario·
I keep seeing the same pattern that probes can decode the relevant information, often at multiple layers/sites, but activation patching doesn't change behaviour. Even patching across multiple layers had no effect. Is this a known failure mode? What should I be doing differently?
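The decodable-but-not-causal failure mode in the question above can be reproduced in a few lines (a toy numpy setup, all quantities hypothetical): a feature is perfectly linearly decodable at a site, yet the downstream readout never consumes that dimension, so patching it is inert.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: the "residual stream" carries a label feature in dim 0
# that the downstream readout never reads. A probe decodes it; a
# patch on it does nothing.
n = 200
labels = rng.integers(0, 2, n)
feature = labels * 2.0 - 1.0            # label encoded as +/-1 in dim 0
other = rng.normal(size=n)
resid = np.stack([feature, other], 1)   # 2-dim "residual stream"

W_out = np.array([0.0, 1.0])            # readout ignores dim 0 entirely
logits = resid @ W_out

# A linear probe on dim 0 decodes the label perfectly...
probe_acc = np.mean((resid[:, 0] > 0) == labels)

# ...but flipping the decoded feature leaves the logits unchanged.
patched = resid.copy()
patched[:, 0] *= -1
patch_effect = np.max(np.abs(patched @ W_out - logits))

print(probe_acc, patch_effect)  # 1.0 0.0
```

This is of course the degenerate extreme; in a real model the analogous diagnosis would require checking whether any downstream component reads the probed direction, e.g. via path patching.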
Mario Giulianelli@glnmario·
@TransluceAI Last night I was trying to use Distributed Alignment Search (original, low-rank, boundless) but the runtime makes it effectively unusable
Raphaël Millière retweeted
Michael C. Frank@mcxfrank·
For a year and a half, @CaroRowland, @leher_singh, Marisa Casillas, Shanley Allen, and I have been meeting to discuss whether innateness is still a useful concept to think about in studying language acquisition.
Raphaël Millière@raphaelmilliere·
@Dr_Atoosa In case this was partly prompted by my tweet: I was just responding to a specific historical claim, but I agree that the analytic/continental divide has partly outlived its usefulness in contemporary philosophy and that we should all strive for clarity, depth, and significance!
Atoosa Kasirzadeh@Dr_Atoosa·
Why are philosophers still clinging to the analytic–continental divide? It is 2026! We should be past it! The split now often closes doors to creative thinking and protects academic tribes. Philosophers of science and technology over the past century largely moved on by engaging history of science and STS, and their field became richer for it. The rest of philosophy should do the same. There are obscure continental philosophers, yes, but also plenty of analytic philosophers who manufacture tiny problems and solve them for an audience of twelve and claim pride for that achievement. What matters in our world of many open problems is whether a thinker is clear, interesting, and actually helps us understand or solve something important.