CogInterp Workshop @ NeurIPS 2025

40 posts


@CogInterp

At Upper Level Room 5AB of the conference venue!

San Diego, CA · Joined July 2025
0 Following · 207 Followers
Pinned Tweet
CogInterp Workshop @ NeurIPS 2025
We’re excited to announce the first workshop on CogInterp: Interpreting Cognition in Deep Learning Models @ NeurIPS 2025! 📣 How can we interpret the algorithms and representations underlying complex behavior in deep learning models? 🌐 coginterp.github.io/neurips2025/ 1/
CogInterp Workshop @ NeurIPS 2025 retweeted
NYU Center for Data Science
NYU Center for Data Science@NYUDataScience·
Can LLMs evolve human-like semantic categories? CDS-affiliated @NogaZaslavsky and PhD student Nathaniel Imel show that, via simulated cultural transmission, LLMs reorganize color categories toward efficient compression. 🔗arxiv.org/abs/2509.08093
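The simulated cultural transmission in the paper above can be sketched as an iterated-learning loop: each generation learns color-label mappings from the previous generation's output, and a learning bias toward reusing existing labels compresses the lexicon over generations. This is a minimal illustrative sketch only; the function names, the reuse-bias mechanism, and all parameters are placeholders, not the paper's actual method, and it ignores the informativeness pressure that a real efficiency analysis would include.

```python
import random

def learn(examples, reuse_bias=0.9):
    """Learn a color->label map, preferring labels already seen (compression bias)."""
    mapping, seen = {}, []
    for color, label in examples:
        if color not in mapping:
            # with probability reuse_bias, reuse the most frequent label so far
            if seen and random.random() < reuse_bias:
                mapping[color] = max(set(seen), key=seen.count)
            else:
                mapping[color] = label
        seen.append(mapping[color])
    return mapping

def transmit(mapping, colors):
    """Produce labeled data for the next generation to learn from."""
    return [(c, mapping[c]) for c in colors]

random.seed(0)
colors = list(range(8))
# generation 0: one unique label per color (no compression yet)
data = [(c, f"w{c}") for c in colors]
for _ in range(10):  # ten generations of transmission
    data = transmit(learn(data), colors)

n_labels = len(set(label for _, label in data))
print(n_labels)  # fewer distinct labels than the 8 we started with
```

Even this crude chain collapses the lexicon within a few generations, which is why a realistic model needs a countervailing pressure for informativeness.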
CogInterp Workshop @ NeurIPS 2025 retweeted
Ari Holtzman
Ari Holtzman@universeinanegg·
this slide is solid gold
Goodfire@GoodfireAI

Our last Stanford guest lecture - @EkdeepL on what counts as an explanation & a neuro-inspired "model systems approach" to interp. Plus, how in-context learning and many-shot jailbreaking are explained by LLM representations changing in-context (as a case study for that approach).

00:33 - What counts as an explanation?
04:47 - Levels of analysis & standard interpretability approaches
18:19 - The "model systems" approach to interp [Case study on in-context learning]
23:36 - How LLM representations change in-context
44:10 - Modeling ICL with rational analysis
1:10:54 - Conclusion & questions

Thanks again to @SuryaGanguli for having us in his class!

CogInterp Workshop @ NeurIPS 2025 retweeted
Goodfire
Goodfire@GoodfireAI·
Our last Stanford guest lecture - @EkdeepL on what counts as an explanation & a neuro-inspired "model systems approach" to interp. Plus, how in-context learning and many-shot jailbreaking are explained by LLM representations changing in-context (as a case study for that approach).

00:33 - What counts as an explanation?
04:47 - Levels of analysis & standard interpretability approaches
18:19 - The "model systems" approach to interp [Case study on in-context learning]
23:36 - How LLM representations change in-context
44:10 - Modeling ICL with rational analysis
1:10:54 - Conclusion & questions

Thanks again to @SuryaGanguli for having us in his class!
CogInterp Workshop @ NeurIPS 2025 retweeted
Christopher Potts
Christopher Potts@ChrisGPotts·
Safety-oriented interpretability researchers should be focused on AI systems, not individual model artifacts. A snippet from the NeurIPS CogInterp workshop panel on Sunday:
CogInterp Workshop @ NeurIPS 2025 retweeted
Noga Zaslavsky
Noga Zaslavsky@NogaZaslavsky·
Honored and thrilled that our work received the @CogInterp best paper award! 💫 📄 Extended paper: arxiv.org/pdf/2509.08093 🧵 Highlights: x.com/NogaZaslavsky/… @NeurIPSConf #NeurIPS2025
CogInterp Workshop @ NeurIPS 2025@CogInterp

Our Best Paper Award goes to Nathaniel Imel and Noga Zaslavsky @NogaZaslavsky for their excellent paper “Culturally transmitted color categories in LLMs reflect a learning bias toward efficient compression”!

CogInterp Workshop @ NeurIPS 2025
Our Best Paper Award goes to Nathaniel Imel and Noga Zaslavsky @NogaZaslavsky for their excellent paper “Culturally transmitted color categories in LLMs reflect a learning bias toward efficient compression”!
CogInterp Workshop @ NeurIPS 2025
We are about to start our panel discussion. Join us for some hot takes on what cognitive interpretability should be about.
CogInterp Workshop @ NeurIPS 2025
Jay proposes shifting from representing context as a sequence of tokens to a sequence of thoughts. The model learns a latent 'thought gestalt' from previous sentences to guide downstream prediction.
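The "thought gestalt" idea above can be sketched as follows: instead of conditioning on every previous token, pool each finished sentence into one latent vector and condition prediction on those sentence-level states. Everything here is a placeholder stand-in (random embeddings, mean pooling), not Jay's actual proposal; it only shows how the context shrinks from tokens to thoughts.

```python
import numpy as np

rng = np.random.default_rng(0)
# toy word embeddings (8-dimensional), purely illustrative
vocab = {w: rng.standard_normal(8) for w in
         "the cat sat on a mat dog ran".split()}

def gestalt(sentence):
    """Compress a sentence into one latent vector (mean pooling as a stand-in)."""
    return np.mean([vocab[w] for w in sentence], axis=0)

context_sentences = [["the", "cat", "sat"], ["a", "dog", "ran"]]
# the model's context is now 2 thought vectors, not 6 token vectors
thoughts = np.stack([gestalt(s) for s in context_sentences])
print(thoughts.shape)  # (2, 8)
```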
CogInterp Workshop @ NeurIPS 2025
Visualizing how LLMs handle object-property binding, Jay argues that even with scale, transformers might not be forming the kind of 'integrated representations' that human cognition relies on.
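A toy illustration of the object-property binding problem discussed above (my example, not from the talk): if a scene is represented as an unordered sum of feature vectors, "red square, blue circle" and "blue square, red circle" become indistinguishable, whereas a tensor-product binding of property and object vectors keeps them apart.

```python
import numpy as np

rng = np.random.default_rng(0)
red, blue, square, circle = (rng.standard_normal(16) for _ in range(4))

# Additive (unbound) scene representations collapse the two scenes:
scene_a = red + square + blue + circle   # red square, blue circle
scene_b = blue + square + red + circle   # blue square, red circle
print(np.allclose(scene_a, scene_b))     # True: bindings are lost

# Tensor-product binding (outer product of property and object vectors):
bound_a = np.outer(red, square) + np.outer(blue, circle)
bound_b = np.outer(blue, square) + np.outer(red, circle)
print(np.allclose(bound_a, bound_b))     # False: bindings preserved
```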
CogInterp Workshop @ NeurIPS 2025
Jay McClelland opens with a question: "Do LMs have thoughts?" Are LMs stochastic parrots, or is there some understanding?
CogInterp Workshop @ NeurIPS 2025
In our fourth spotlight talk, neural network legend Paul Smolensky uses symbolic programs such as production systems to understand how neural networks process symbols
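For readers unfamiliar with production systems, here is a minimal interpreter in the style the talk above refers to: rules of the form (condition, action) fire against a working memory of symbols until nothing new can be derived. This is an illustrative sketch of the general formalism, not Smolensky's analysis.

```python
def run(rules, memory):
    """Fire (condition, action) rules against working memory until quiescence."""
    memory = set(memory)
    changed = True
    while changed:
        changed = False
        for condition, action in rules:
            # a rule fires if its condition holds and it adds something new
            if condition <= memory and not action <= memory:
                memory |= action
                changed = True
    return memory

# Toy rule base: derive new symbols from working memory.
rules = [
    ({"A"}, {"B"}),        # if A is present, assert B
    ({"B", "C"}, {"D"}),   # if B and C are both present, assert D
]
print(sorted(run(rules, {"A", "C"})))  # ['A', 'B', 'C', 'D']
```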
CogInterp Workshop @ NeurIPS 2025
For our third spotlight talk, Sonia Murthy @soniakmurthy uses probabilistic cognitive models to understand value trade-offs in LLMs that enable pragmatic reasoning about politeness in speech acts
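The probabilistic cognitive models mentioned above are in the Rational Speech Acts (RSA) tradition; a polite speaker can be modeled as trading off being informative against making the listener feel good. The states, utterances, literal semantics, and the weight `phi` below are all illustrative placeholders, not the paper's actual model.

```python
import math

states = [0, 1, 2]  # how good the listener's cake actually was
utterances = {"terrible": [1.0, 0.1, 0.0],   # literal fit to each state
              "okay":     [0.1, 1.0, 0.1],
              "amazing":  [0.0, 0.1, 1.0]}

def speaker_probs(true_state, phi):
    """P(utterance | state): mix epistemic and social utility with weight phi."""
    scores = {}
    for u, lit in utterances.items():
        epistemic = math.log(lit[true_state] + 1e-9)       # informativeness
        social = sum(s * p for s, p in zip(states, lit))   # flatter the listener
        scores[u] = math.exp(phi * social + (1 - phi) * epistemic)
    z = sum(scores.values())
    return {u: v / z for u, v in scores.items()}

honest = speaker_probs(0, phi=0.0)    # purely informative speaker
polite = speaker_probs(0, phi=0.99)   # mostly face-saving speaker
print(max(honest, key=honest.get))    # 'terrible'
print(max(polite, key=polite.get))    # 'amazing'
```

The value trade-off is the single parameter `phi`: the interpretability question is whether an LLM's politeness behavior is best explained by some implicit setting of such a weight.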
CogInterp Workshop @ NeurIPS 2025
Erin Grant @ermgrant discusses dissociations between function and representation, and asks whether representational alignment is enough for understanding deep neural networks
CogInterp Workshop @ NeurIPS 2025 retweeted
Sonia Murthy
Sonia Murthy@soniakmurthy·
Excited to be presenting our work on using cognitive models to interpret pluralistic values in LLMs once again as a spotlight talk 🌟 at the NeurIPS CogInterp workshop! Come by upper level room 5AB today and check out the paper here: arxiv.org/abs/2506.20666
CogInterp Workshop @ NeurIPS 2025@CogInterp

The spotlight talks will cover all aspects of interpreting cognition in deep learning models: from behavior to algorithms to representations! Also check out the list of poster presentations at coginterp.github.io/neurips2025/ac… (3/3)
