Eric Todd

158 posts

@ericwtodd

Computer Science PhD Student at Northeastern University

Boston, MA · Joined December 2014
473 Following · 490 Followers
Pinned Tweet
Eric Todd @ericwtodd
Can you solve this algebra puzzle? 🧩 cb=c, ac=b, ab=? A small transformer can learn to solve problems like this! And since the letters don't have inherent meaning, this lets us study how context alone imparts meaning. Here's what we found:🧵⬇️
8
49
321
55.8K
Eric Todd retweeted
Goodfire @GoodfireAI
The most popular way to interpret AI is missing the bigger picture. Models think in curved shapes. But sparse autoencoders (SAEs) work with straight lines. Can they still capture models’ curved neural geometry? Yes, but not how you might think! (1/7)
Goodfire @GoodfireAI

Neural networks might speak English, but they think in shapes. Understanding their rich *neural geometry* is key to understanding how they work – and to debugging and controlling them with precision. Starting today, we’re releasing a series of posts on this research agenda. 🧵

22
144
981
149.5K
Eric Todd retweeted
Sheridan Feucht @sheridan_feucht
Neural networks have beautiful feature geometry, but do they have mechanisms that actually interface with those structures? At @GoodfireAI this spring, we discovered one: a re-usable addition mechanism that reads/writes to Fourier features from prior work. 🧵
Goodfire @GoodfireAI

Neural networks do math by rotating shapes. We found a shape-rotating calculator hidden inside an LLM – and it’s used for more than just math! (1/6)

7
41
247
61.9K
Eric Todd retweeted
Zihao (Gavin) Yang @ZihaoGavinYang
1/ (New paper!) If swapping the gender in an input prompt makes the AI model give a different answer it means that it has to have a gender bias, right? Wrong. 🧵on counterfactual prompting for LLM evals: Paper: arxiv.org/abs/2605.01048
3
25
289
306.5K
Eric Todd retweeted
David Bau @davidbau
The Teleport Contest is open. Port NetHack 5.0 from C to JavaScript, bit-exactly. Same screen, every keystroke. Any approach: LLM agents, hand-coded, transpiler, hybrid. Live leaderboard, two phases through December. mazesofmenace.ai/announcement
3
13
44
6K
Eric Todd retweeted
David Bau @davidbau
NetHack is one of the most complex and longest-lived open source programs ever written, and after 46 years, v5.0 shipped today. nethack.org/common/index.h… And ... it is a VERY cool large codebase to work with in the LLM era.
19
201
1.1K
121.5K
Eric Todd @ericwtodd
I'll be attending #ICLR2026 next week to present my work on In-Context Algebra! My poster will be on Fri, April 24 at 3:15-5:45PM at Pavilion 4 P4-#4011. If you're around, stop by and say hello! My DMs are open if you want to connect or meet up in Rio!
Eric Todd @ericwtodd

Can you solve this algebra puzzle? 🧩 cb=c, ac=b, ab=? A small transformer can learn to solve problems like this! And since the letters don't have inherent meaning, this lets us study how context alone imparts meaning. Here's what we found:🧵⬇️

0
2
14
530
Eric Todd retweeted
David Bau @davidbau
2026 is a whirlwind year for AI. Underlying it all: the greatest scientific mystery of our age. How does a neural network think? I talked w @oliver_whang22 in NYTimes Magazine, on how AI interpretability is a tangle of structure waiting to be unraveled: nytimes.com/2026/04/15/mag…
1
5
53
3.2K
Eric Todd retweeted
Nikhil Prakash @nikhil07prakash
Excited to be attending #ICLR in person this year! I’ll be presenting 3 works across the main conference and workshops. If you’re around, please stop by, say hi, and feel free to reach out if you’d like to connect!
3
1
15
1.3K
Michał Podlewski @trajektoriePL
Terence Tao proposes what he calls a "Copernican view of intelligence". Instead of buying into the common, one-dimensional narrative that artificial intelligence will simply evolve from "subhuman" to "superhuman" and ultimately make humanity entirely redundant, Tao urges us to look at the bigger picture. Much like the Copernican revolution proved the Earth is not the center of the universe, Tao suggests we need to realize that human intelligence isn't the only, or necessarily the highest, form of intellect. Historically, we have treated other forms of storing or creating knowledge—like animals, books, and computers—as secondary. However, we actually exist within a much richer universe of intelligence. Both human intelligence and computer intelligence possess their own distinct strengths and weaknesses. The true potential lies not in viewing them as direct competitors, but rather in focusing on collaboration. By working together, humans and computers can achieve additional things that neither could accomplish on their own, requiring us to think in much wider terms than just what humans or computers can do alone.
139
606
4.1K
603.2K
Eric Todd retweeted
Hadas Orgad @OrgadHadas
New paper: LLMs encode harmful content generation in a distinct, unified mechanism. Using weight pruning, we find that harmful generation depends on a tiny subset of the weights that are shared across harm types and separate from benign capabilities. 🧵
7
47
251
38.7K
Eric Todd retweeted
Hye Sun Yun @hyesunyun
Patients ask LLMs medical questions, but how they phrase it matters more than it should. Our new preprint explores how different phrasings of patient health questions can lead to inconsistent conclusions, even with the same evidence. [1/6] Full Paper: arxiv.org/abs/2604.05051
1
6
21
2.7K
Eric Todd retweeted
Andrew Lee @a_jy_l
If you enjoyed Anthropic's recent emotions paper, check out our pre-print! We find many many similarities:
1) Circular geometry of emotion representations that resembles the "Circumplex Model of Affects" from psychology
2) Steering effects on affective properties of LM outputs -- unlike Anthropic, we steer along the circular manifold (at 0°, 30°, 60°, etc.)
3) Steering effects on other downstream behavior (refusal, sycophancy) -- steering emotion representations can affect refusal/sycophancy rates.
The last one was a bit unexpected - we provide a mechanistic account for why this might happen. See Lihao's thread below for details!👇
Lihao Sun @1e0sun

💡New paper! Woke up to @AnthropicAI's emotion paper and realized - “wait, that's our finding too.” So we ArXiv'd immediately. We concurrently uncovered a circular geometry of emotions organized by valence and arousal (VA), as well as steering effects on downstream behaviors like refusal and sycophancy. We further provide a mechanistic account for why: refusal and compliance tokens occupy distinct regions in this space. 1/

3
14
113
11.1K
Eric Todd retweeted
NDIF @ndif_team
📣 Launching monthly interp puzzles 🧩 Each month: a model trained on a toy task. Your job: reverse-engineer the algorithm it learned. First puzzle: how does a 1-2L attn-only transformer find the max of a list? Starter Colab included. Deadline: April 30 puzzles.baulab.info
4
33
237
38.8K
Eric Todd retweeted
David Bau @davidbau
Calling attention to an exciting "deception detection" hackathon we're planning this summer! w @NDIF and @CadenzaLabs. Recruiting red teams now, blue teams later. Red teams, time is short: proposals due Mar 31. $10K stipend + compute, $15K finals prize. nnsight.net/blog/2026/03/2…
2
18
59
6K
Eric Todd retweeted
David Bau @davidbau
In 1982, high school students in Sudbury, Mass. wrote a dungeon game called Hack. They had Atari 800s and Logo and an obsession with a Unix game called Rogue that most of them had never seen. I grew up one town over with the same computers and the same obsession.
1
6
18
1.8K
Eric Todd retweeted
Rohit Gandikota @rohitgandikota
I’ll be presenting our work, “Distilling Diversity and Control in Diffusion Models,” at @wacv_official this Sunday at 11 AM local time. 🔍We uncover the “secret to unlocking diversity” in diffusion models - using **interpretability**!! DM me if you’d like to connect in Tucson.
Rohit Gandikota @rohitgandikota

Why do distilled diffusion models generate similar-looking images? 🤔 Our Diffusion Target (DT) visualization reveals the secret to diversity. It is the very first time-step! And—there is a simple, training-free way to make them more diverse! Here is how: 🧵👇

0
4
23
2.1K
Eric Todd retweeted
Jaden Fiotto-Kaufman @jadenfk23
NNsight 0.6 is out now! We directly address your feedback in our biggest release yet. Pain points included cryptic errors, slow traces, no remote execution of custom code, and limited vLLM support. We tackle all of these and more in this new release. 🧵 Here's what changed:
1
10
38
7.9K