Connor Dilgren

8 posts

@ConnorDilgren

CS MS student @umdcs

Joined February 2014
140 Following · 66 Followers
Connor Dilgren @ConnorDilgren
Overall, these results are somewhat encouraging for latent reasoning model interpretability. But I suspect models with weaker natural language priors, such as those trained to do latent reasoning during pretraining or through RL, will be much less interpretable.
1 reply · 0 reposts · 6 likes · 552 views
Connor Dilgren @ConnorDilgren
Excited to announce my first preprint in LM interpretability! Latent reasoning models are not monitorable by default, since they don't reason in human-readable, natural language text. But can we make progress in understanding their intermediate reasoning steps using mech interp?
7 replies · 29 reposts · 192 likes · 11.1K views