Connor Dilgren

8 posts

Connor Dilgren

@ConnorDilgren

CS MS student @umdcs

Joined February 2014
140 Following · 72 Followers
Connor Dilgren @ConnorDilgren
Overall, these results are somewhat encouraging for latent reasoning model interpretability. But I suspect models with weaker natural language priors, such as those trained to do latent reasoning during pretraining or through RL, will be much less interpretable.
1 reply · 0 reposts · 6 likes · 571 views
Connor Dilgren @ConnorDilgren
Excited to announce my first preprint in LM interpretability! Latent reasoning models are not monitorable by default, since they don't reason in human-readable, natural language text. But can we make progress in understanding their intermediate reasoning steps using mech interp?
7 replies · 29 reposts · 200 likes · 12.5K views