tom white
4.2K posts

@dribnet
creations with code and networks
Wellington, New Zealand · Joined June 2011
4.2K Following · 11K Followers

Lots of interp thought discusses the linearity of the residual stream! This blog post: the residual stream isn't linear in a way that provides formal leverage, and interp methods based on linearity should not be preferred beyond empirical utility.
cs.columbia.edu/~johnhew/resid…

@RT_Artwork Got this too. 5 minutes before the call they ask you to use the Riverside client and forward you to a clone website (riverside dot name - BEWARE) with their malware installer (I stopped there).
(would be up for a themed exhibit on scams showcasing all artists with this invite! 😂)

@NeelNanda5 Understandable - though the distribution shift implies not all Gemma 3 concepts will be represented in these SAEs. Were there any others such as language filtering?
Might be worth updating the technical paper which seems to be misleading on this point.


I'm excited to release Gemma Scope 2: a comprehensive set of interpretability tools on Gemma 3. SAEs & transcoders on every layer of every model!
Gemma 3 27B shows lots of rich safety-relevant behaviour I want to enable deep dives into what's really going on
Check out our demo!

Callum McDougall@calsmcdougall
Google DeepMind is releasing Gemma Scope 2: SAEs and transcoders on every layer of every Gemma 3 model, 270M-27B, base & chat. We hope this enables deep dives into complex model behavior, for more ambitious open source safety & interpretability work!

@GaryMarcus @Ted_Underwood always felt you were wrong about this, but ChatGPT in fact completely agrees with you and helped me to better understand and appreciate your point here 👍



@dribnet @Ted_Underwood one of the key examples in 2001 was object permanence; we gave examples of an object permanence failure today, two decades later, even though GPT has a trillion times more data and compute. grammaticality has improved, via pastiche, but conceptual understanding has not.