Roy Mayan retweeted


How can we interpret LLM features at scale? 🤔
Current pipelines rely on activating inputs, which is costly and ignores how features causally affect model outputs!
We propose efficient output-centric methods that better predict how steering a feature will affect model outputs.
New preprint led by my student @GurYoav with dream team @Roym4498, Chen Agassy, and Atticus Geiger 🧵1/


