
Overview of our approach: the behavior stays fixed while the searched representation changes. The base pass applies attribution, mask selection, and causal recovery to the original model. Local-interface analysis finds compact pieces that do not compose on their own. The conditioning approach instead applies a constrained low-rank update, then reruns the same recovery test to ask whether the existing capability becomes extractable.
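
As a rough illustration of the two passes, the sketch below runs them on a toy linear model. Everything in it is an assumption made for the example rather than the method itself: the function names are hypothetical, attribution is taken to be activation times gradient, mask selection is top-k over hidden units, causal recovery is zero-ablation of the unselected units, and the constraint on the low-rank update is taken to be exact preservation of the output. The rank-1 directions `u` and `v` are picked by hand here, whereas a real procedure would presumably have to search for or learn them.

```python
import numpy as np


def forward(W1, w2, X, mask=None):
    """Toy model f(x) = w2 . (mask * (W1 x)); X holds one input per row."""
    H = X @ W1.T                          # hidden activations, (n_samples, n_hidden)
    if mask is not None:
        H = H * mask                      # zero-ablate units outside the mask
    return H @ w2


def attribution(W1, w2, X):
    """Assumed attribution: mean |activation * gradient| per hidden unit."""
    return np.abs((X @ W1.T) * w2).mean(axis=0)


def select_mask(scores, k):
    """Assumed mask selection: keep the k highest-scoring hidden units."""
    mask = np.zeros_like(scores)
    mask[np.argsort(scores)[-k:]] = 1.0
    return mask


def recovery(W1, w2, X, k):
    """Share of the full output's energy that the masked model reproduces."""
    full = forward(W1, w2, X)
    mask = select_mask(attribution(W1, w2, X), k)
    kept = forward(W1, w2, X, mask)
    return 1.0 - np.sum((full - kept) ** 2) / np.sum(full ** 2)


def condition(W1, w2, u, v):
    """Rank-1 update that changes the hidden basis but not the output.

    With R = I + u v^T, set W1 <- R W1 and w2 <- R^{-T} w2, using the
    Sherman-Morrison inverse. Requires 1 + v.u != 0.
    """
    W1_new = W1 + np.outer(u, v @ W1)
    w2_new = w2 - v * (u @ w2) / (1.0 + v @ u)
    return W1_new, w2_new


# Toy capability f(x) = x0 + x1, initially spread over two hidden units.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
W1 = np.eye(3)
w2 = np.array([1.0, 1.0, 0.0])

# Hand-picked directions (an assumption) that fold unit 1 into unit 0.
u = np.array([1.0, 0.0, 0.0])
v = np.array([0.0, 1.0, 0.0])
W1_c, w2_c = condition(W1, w2, u, v)

behavior_fixed = np.allclose(forward(W1, w2, X), forward(W1_c, w2_c, X))
base = recovery(W1, w2, X, k=1)           # base pass on the original model
conditioned = recovery(W1_c, w2_c, X, k=1)  # same test after the update
print(behavior_fixed, base, conditioned)
```

In this toy setting the one-unit mask leaves a large part of the output unrecovered on the original weights, while the same test on the conditioned weights recovers it up to floating-point error, even though both models compute the same function. That is the sense in which the behavior stays fixed while the searched representation changes; it says nothing about how well the idea transfers to nonlinear models.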









