Sonya Huang 🐥@sonyatweetybird
Can we map the mind of an LLM? Our first mechanistic interpretability episode on Training Data featuring @GoodfireAI founder @ericho_goodfire (and our first cameo from @roelofbotha!)
Goodfire is building an independent mech interp lab, led by some heavyweight researchers from the field (e.g. @leedsharkey, who has led a lot of important work on sparse autoencoders to "unscramble" LLMs and resolve superposition, and @nickcammarata, who has been a key pioneer behind auto interpretability).
On this episode, Eric gives us a flyover of the technical results so far from this nascent field (universality, superposition), what's ahead in the research (going from circuits to weights, going from understanding to increasingly surgical editing), a preview of the real-world work they're already doing with @arcinstitute, and the impact he expects Goodfire and the broader field to have on steering, safety, editing, and more.