Excited to announce my first preprint in LM interpretability!

Latent reasoning models are not monitorable by default, since they don't reason in human-readable natural language. But can we make progress in understanding their intermediate reasoning steps using mech interp?

Overall, these results are somewhat encouraging for the interpretability of latent reasoning models. But I suspect models with weaker natural language priors, such as those trained to do latent reasoning during pretraining or through RL, will be much less interpretable.