Connor Dilgren

@ConnorDilgren

CS MS student @umdcs

Joined February 2014

140 Following · 72 Followers

Connor Dilgren @ConnorDilgren · 1d

Thanks to @sarahwiegreffe for advising this project! Read the preprint here: arxiv.org/pdf/2604.04902

Connor Dilgren @ConnorDilgren · 1d

Overall, these results are somewhat encouraging for latent reasoning model interpretability. But I suspect models with weaker natural language priors, such as those trained to do latent reasoning during pretraining or through RL, will be much less interpretable.

Connor Dilgren @ConnorDilgren · 1d

Excited to announce my first preprint in LM interpretability! Latent reasoning models are not monitorable by default, since they don't reason in human-readable, natural language text. But can we make progress in understanding their intermediate reasoning steps using mech interp?
