Alexa R. Tartaglini

67 posts

@ARTartaglini

CS PhD student @Stanford @stanfordnlp // Interested in interpretability, cognition, & more esoteric things (prev. @NYUDataScience)

Stanford, CA · Joined May 2021
585 Following · 681 Followers
Pinned Tweet
Alexa R. Tartaglini @ARTartaglini
I’m super excited to announce that I’ll be starting a PhD in Computer Science at @Stanford this upcoming fall! Stay posted for (hopefully) cool work on visual abstraction, emergence of symbols in DNNs, & cognitively-inspired interpretability 🟦🔴
19 replies · 3 retweets · 228 likes · 28.6K views
Alexa R. Tartaglini retweeted
Qinan Yu @qinan_yu
1/8 RLVR improves accuracy but does not always lead to causal and verifiable CoTs. Surprisingly, this happens even on reasoning-intensive tasks! But we can fix this with reward shaping and SFT-before-RL.
3 replies · 38 retweets · 243 likes · 33.9K views
Alexa R. Tartaglini retweeted
Michael Lepori @Michael_Lepori
🚨New preprint! In-context learning underlies LLMs’ real-world utility, but what are its limits? Can LLMs learn completely novel representations in-context and flexibly deploy them to solve tasks? In other words, can LLMs construct an in-context world model? Let’s see! 👀
2 replies · 15 retweets · 142 likes · 12.8K views
Alexa R. Tartaglini retweeted
Zora Wang @ZhiruoW
‼️Position: AI coding agent research needs recalibration. We've heavily optimized for solo autonomy, and far less for designing agents that empower the humans using them. It’s time to build human-centered coding agents. 🧵
26 replies · 64 retweets · 318 likes · 50.1K views
Alexa R. Tartaglini retweeted
Griffiths Computational Cognitive Science Lab
Excited to announce a new book telling the story of mathematical approaches to studying the mind, from the origins of cognitive science to modern AI! The Laws of Thought will be published in February, and is available for pre-order now.
37 replies · 260 retweets · 1.7K likes · 95.6K views
Alexa R. Tartaglini @ARTartaglini
I’ll be at #NeurIPS2025 — excited to talk about (cognitive) interpretability and multimodality! Unrelatedly, I’ll be posting about a new paper of mine soon…
0 replies · 2 retweets · 15 likes · 1K views
Alexa R. Tartaglini retweeted
Satchel Grant @satchelgrant
1/8 New preprint! We show many interp methods (patching, SAEs, DAS) can push models off their natural manifold. This can be harmless or can activate hidden circuits. We provide a mitigating solution making interventions less divergent. If you care about reliable interp, read on!
5 replies · 61 retweets · 431 likes · 38.3K views
Alexa R. Tartaglini retweeted
Andrew Lampinen @AndrewLampinen
In neuroscience, we often try to understand systems by analyzing their representations — using tools like regression or RSA. But are these analyses biased towards discovering a subset of what a system represents? If you're interested, check out our new commentary! Thread:
5 replies · 65 retweets · 361 likes · 34K views
Rohan Pandey @khoomeik
note to self: don’t wear the @PrimeIntellect ai futures compass tshirt to a stanford party

people will force you to stand still so they can read every quadrant

also shoutout @simonguozirui goated film camera
18 replies · 8 retweets · 358 likes · 29.3K views
Adam Shai @adamimos
Looking forward to my first ever NeurIPS! If you're interested in mech. interp., theory of neural nets, OOD generalization, what abstraction and reasoning could mean in LLMs, neuro-AI, etc., or have seen my work and think it's cool, I'd love to meet up! DMs open.
1 reply · 0 retweets · 8 likes · 340 views
Guy Davidson @guyd33
Friends and frenemies! I'll be at #NeurIPS2024 all week! Let's grab coffee/food and talk about goal generation and representation, how to think about goals with/in/for LLMs, cognitive science in 2025+, or give me advice for getting into (mech.) interpretability research!
3 replies · 5 retweets · 50 likes · 6.2K views
Alexa R. Tartaglini @ARTartaglini
I'll be attending @NeurIPSConf from December 10-15 and presenting this work with @Michael_Lepori. Please reach out if you'd like to chat about interpretability, cognition, or anything really! #NeurIPS2024
Quoted tweet by Alexa R. Tartaglini @ARTartaglini:
🚨 New paper at @NeurIPSConf w/ @Michael_Lepori! Most work on interpreting vision models focuses on concrete visual features (edges, objects). But how do models represent abstract visual relations between objects? We adapt NLP interpretability techniques for ViTs to find out! 🔍

0 replies · 1 retweet · 31 likes · 3.5K views
Alexa R. Tartaglini @ARTartaglini
⭐ We find that these regularizers work, improving test accuracy of from-scratch ViTs on our same-different tasks by up to 44 points! We believe this technique may be useful for learning a wide range of abstract visual relations with limited training data.
1 reply · 0 retweets · 3 likes · 1.1K views
Alexa R. Tartaglini @ARTartaglini
🚨 New paper at @NeurIPSConf w/ @Michael_Lepori! Most work on interpreting vision models focuses on concrete visual features (edges, objects). But how do models represent abstract visual relations between objects? We adapt NLP interpretability techniques for ViTs to find out! 🔍
2 replies · 36 retweets · 257 likes · 31.5K views