Athulith Paraselli

1 post

@pathulith22

Master’s student @BrownUniversity researching mechanistic interpretability to build safer, controllable AI. Previously studied Math-CS @UCSanDiego.

Joined September 2025
79 Following · 3 Followers
Athulith Paraselli @pathulith22 ·
@johnhewtt Thanks for the informative post! I was curious about how you navigate this lack of linearity in practice. When utilizing tools like logit lens or steering vectors, are there specific 'anomalies' or failure modes you look for that suggest the approximation fails?
1
0
0
196
John Hewitt @johnhewtt ·
Lots of interp thought discusses the linearity of the residual stream! This blog post: the residual stream isn't linear in a way that provides formal leverage, and interp methods based on linearity should not be preferred beyond empirical utility. cs.columbia.edu/~johnhew/resid…
5
17
232
11.9K