
Great work by @ghandeharioun and others in PAIR on a new interpretability framework: 🩺Patchscopes
Asma Ghandeharioun@ghandeharioun
🧵Can we “ask” an LLM to “translate” its own hidden representations into natural language? We propose 🩺Patchscopes, a new framework for decoding specific information from a representation by “patching” it into a separate inference pass, independently of its original context. 1/9
English

















