Alexa R. Tartaglini

67 posts

@ARTartaglini

CS PhD student @Stanford @stanfordnlp // Interested in interpretability, cognition, & more esoteric things (prev. @NYUDataScience)

Stanford, CA · Joined May 2021

585 Following · 681 Followers

Pinned Tweet

Alexa R. Tartaglini@ARTartaglini·Apr 18

I’m super excited to announce that I’ll be starting a PhD in Computer Science at @Stanford this upcoming fall! Stay posted for (hopefully) cool work on visual abstraction, emergence of symbols in DNNs, & cognitively-inspired interpretability 🟦🔴

Alexa R. Tartaglini retweeted

Qinan Yu@qinan_yu·Apr 30

1/8 RLVR improves accuracy but does not always lead to causal and verifiable CoTs. Surprisingly, this happens even on reasoning-intensive tasks! But we can fix this with reward shaping and SFT-before-RL.

Alexa R. Tartaglini retweeted

Satchel Grant@satchelgrant·Mar 31

Causally intervening on your feed to share: this work got accepted to ICLR for oral presentation! 🎉 Thanks to my amazing coauthors @ChrisGPotts @ARTartaglini @sjeromehan and everyone who engaged with the preprint 🙏

Satchel Grant@satchelgrant

1/8 New preprint! We show many interp methods (patching, SAEs, DAS) can push models off their natural manifold. This can be harmless or can activate hidden circuits. We provide a mitigating solution making interventions less divergent. If you care about reliable interp, read on!

Alexa R. Tartaglini retweeted

Michael Lepori@Michael_Lepori·Feb 26

🚨New preprint! In-context learning underlies LLMs’ real-world utility, but what are its limits? Can LLMs learn completely novel representations in-context and flexibly deploy them to solve tasks? In other words, can LLMs construct an in-context world model? Let’s see! 👀

Alexa R. Tartaglini retweeted

Zora Wang@ZhiruoW·Feb 11

‼️Position: AI coding agent research needs recalibration. We've heavily optimized for solo autonomy, and far less for designing agents that empower the humans using them. It’s time to build human-centered coding agents. 🧵

Alexa R. Tartaglini retweeted

Kat ⊷ the Poet Engineer@poetengineer__·Dec 22

the encoder

Alexa R. Tartaglini retweeted

Griffiths Computational Cognitive Science Lab@cocosci_lab·Dec 18

Excited to announce a new book telling the story of mathematical approaches to studying the mind, from the origins of cognitive science to modern AI! The Laws of Thought will be published in February, and is available for pre-order now.

Alexa R. Tartaglini@ARTartaglini·Dec 3

I’ll be at #NeurIPS2025 — excited to talk about (cognitive) interpretability and multimodality! Unrelatedly, I’ll be posting about a new paper of mine soon…

Alexa R. Tartaglini retweeted

Satchel Grant@satchelgrant·Dec 2

Alexa R. Tartaglini retweeted

Christopher Potts@ChrisGPotts·Nov 9

I am delighted to be collaborating with both Alexa (@ARTartaglini) and Siri (@srihita_raju).

Alexa R. Tartaglini retweeted

Andrew Lampinen@AndrewLampinen·Aug 5

In neuroscience, we often try to understand systems by analyzing their representations — using tools like regression or RSA. But are these analyses biased towards discovering a subset of what a system represents? If you're interested, check out our new commentary! Thread:

Alexa R. Tartaglini@ARTartaglini·Jun 23

@vincentweisser @khoomeik @PrimeIntellect @simonguozirui @Muennighoff @boson2photon @madisenxtaylor Omg please do!!

Vincent Weisser@vincentweisser·Jun 23

@ARTartaglini @khoomeik @PrimeIntellect @simonguozirui @Muennighoff @boson2photon @madisenxtaylor sending you one 🫡

Rohan Pandey@khoomeik·Jun 23

note to self: don’t wear the @PrimeIntellect ai futures compass tshirt to a stanford party people will force you to stand still so they can read every quadrant also shoutout @simonguozirui goated film camera

Alexa R. Tartaglini@ARTartaglini·Jun 23

@khoomeik @PrimeIntellect @simonguozirui @Muennighoff @boson2photon @madisenxtaylor I would unironically wear this

Rohan Pandey@khoomeik·Jun 23

@ARTartaglini @PrimeIntellect @simonguozirui @Muennighoff @boson2photon omg get this woman some merch @madisenxtaylor

Alexa R. Tartaglini@ARTartaglini·Dec 10

@adamimos Sounds interesting!

Adam Shai@adamimos·Dec 9

Looking forward to my first ever NeurIPS! If you're interested in mech. interp., theory of neural nets, OOD generalization, what abstraction and reasoning could mean in LLMs, neuro-AI, etc. or have seen my work and think it's cool I'd love to meet up! DMs open.

Alexa R. Tartaglini@ARTartaglini·Dec 10

@guyd33 No way, I didn’t know you were into mechinterp! We should chat!

Guy Davidson@guyd33·Dec 9

Friends and frenemies! I'll be at #NeurIPS2024 all week! Let's grab coffee/food and talk about goal generation and representation, how to think about goals with/in/for LLMs, cognitive science in 2025+, or give me advice for getting into (mech.) interpretability research!

Alexa R. Tartaglini@ARTartaglini·Dec 5

@aryaman2020 I want to hear this (even though I'm not the target audience)

Aryaman Arora@aryaman2020·Dec 5

not sure if there is demand for this but I am down to pitch to Anthropic researchers why they should leave to join academia

Trenton Bricken@TrentonBricken

Going to be at NeurIPS next week! I'm particularly keen to chat with researchers who need a nudge to leave academia and join Anthropic, esp to work on safety

Alexa R. Tartaglini@ARTartaglini·Dec 5

I'll be attending @NeurIPSConf from December 10-15 and presenting this work with @Michael_Lepori. Please reach out if you'd like to chat about interpretability, cognition, or anything really! #NeurIPS2024

Alexa R. Tartaglini@ARTartaglini

🚨 New paper at @NeurIPSConf w/ @Michael_Lepori! Most work on interpreting vision models focuses on concrete visual features (edges, objects). But how do models represent abstract visual relations between objects? We adapt NLP interpretability techniques for ViTs to find out! 🔍

Alexa R. Tartaglini@ARTartaglini·Nov 23

@hannahrosekirk @NeurIPSConf @AISafetyInst ✋🏻

Hannah Rose Kirk@hannahrosekirk·Nov 21

Research Question 1: Who's going to @NeurIPSConf ? Research Question 2: Who wants to come to the inaugural @AISafetyInst party? 👀

Alexa R. Tartaglini@ARTartaglini·Nov 22

Want to learn more? Check out our full paper (below) and talk to my co-author @Michael_Lepori and me at our @NeurIPSConf poster on Friday 12/13 at 4:30pm PST! Shoutout to our all-star advising team: @wkvong, @tserre, @LakeBrenden, and @Brown_NLP. arxiv.org/abs/2406.15955

Alexa R. Tartaglini@ARTartaglini·Nov 22

⭐ We find that these regularizers work, improving test accuracy of from-scratch ViTs on our same-different tasks by up to 44 points! We believe this technique may be useful for learning a wide range of abstract visual relations with limited training data.

Alexa R. Tartaglini@ARTartaglini·Nov 22
