Ali Bahrainian retweeted

1/ Controlling LLMs with steering vectors is unreliable, but why? Our paper, "Understanding (Un)Reliability of Steering Vectors in Language Models," at the #ICLR2025 @FM_in_Wild Workshop investigates this! What did we find?