Arnab Sen Sharma

67 posts

@arnab_api

Ph.D. student @KhouryCollege, working to make LLMs interpretable

Boston, MA · Joined September 2022
174 Following · 227 Followers
Pinned Tweet
Arnab Sen Sharma @arnab_api
How can a language model find the veggies in a menu? New preprint where we investigate the internal mechanisms of LLMs when filtering a list of options. Spoiler: it turns out LLMs use strategies surprisingly similar to functional programming (think "filter" from Python)! 🧵
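For readers less familiar with the functional-programming reference, this is the "filter" idiom the thread is alluding to, in plain Python. The menu and the meat-keyword predicate below are invented for illustration; the preprint is about how LLMs implement this operation internally, not about this code.

```python
# The "filter" analogy from the thread: keep only the items of a list
# that satisfy a predicate. Menu and predicate are made-up examples.
menu = ["tofu stir-fry", "beef burger", "veggie wrap", "lentil soup"]
meats = ("beef", "chicken", "pork", "fish")

def is_veggie(item):
    """True if the dish name mentions no meat keyword."""
    return not any(meat in item for meat in meats)

print(list(filter(is_veggie, menu)))
# ['tofu stir-fry', 'veggie wrap', 'lentil soup']
```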
Arnab Sen Sharma reposted
Eric Todd @ericwtodd
Can you solve this algebra puzzle? 🧩 cb=c, ac=b, ab=? A small transformer can learn to solve problems like this! And since the letters don't have inherent meaning, this lets us study how context alone imparts meaning. Here's what we found: 🧵⬇️
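One way to read the puzzle, assuming the letters denote elements of a group: cb = c forces b to be the identity, ac = b then makes c the inverse of a, so ab = a. The short brute force below checks this reading over integers mod n (addition standing in for the written-multiplicatively group operation); the encoding is my illustration, not the paper's setup.

```python
# Sanity check of one reading of the puzzle, assuming a group structure
# (here: integers mod n under addition). cb = c forces b = identity;
# ac = b then forces c = -a; hence ab = a for every consistent assignment.
from itertools import product

def consistent_answers(n):
    answers = []
    for a, b, c in product(range(n), repeat=3):
        if (c + b) % n == c and (a + c) % n == b:  # cb = c and ac = b
            answers.append(((a + b) % n, a))       # (value of ab, value of a)
    return answers

assert all(ab == a for ab, a in consistent_answers(7))  # ab = a always
```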
Arnab Sen Sharma reposted
Koyena Pal @kpal_koyena
Can models understand each other's reasoning? 🤔 When Model A explains its Chain-of-Thought (CoT), do Models B, C, and D interpret it the same way? Our new preprint with @davidbau and @csinva explores CoT generalizability 🧵👇 (1/7)
Arnab Sen Sharma reposted
David Bau @davidbau
At the #Neurips2025 mechanistic interpretability workshop I gave a brief talk about Venetian glassmaking, since I think we face a similar moment in AI research today. Here is a blog post summarizing the talk: davidbau.com/archives/2025/…
Arnab Sen Sharma reposted
Chris Wendler @wendlerch
I am very excited to share that our paper, "One-Step is Enough: Sparse Autoencoders for Text-to-Image Diffusion Models", will be presented at #NeurIPS2025! @ViaSurkov is presenting it at #MexIPS2025:

📍 If you are attending NeurIPS in Mexico City, please stop by!
Date: Thursday, Dec 4, 2025
Time: 11:00 AM – 2:00 PM PST
Location: Foyer (Mexico City Poster Session)

Come visit @ViaSurkov. It's his first conference and he will be happy to explain his amazing work. Sadly, #NeurIPS2025 does not allow for parallel presentation in San Diego. However, I am in San Diego and happy to meet up / chat. Please don't hesitate to reach out here or via ch.wendler@northeastern.edu.

Once again, a big shout-out to our brilliant students Viacheslav Surkov and Antonio Mari, who did phenomenal work here and pushed this project (which started as a class project more than a year ago) all the way past the high threshold of #NeurIPS2025. I also want to thank manifund.org (@andyarditi and @ryan_kidd44 in particular) for helping us finance Viacheslav Surkov's conference trip.

Please find more information about our work below. We have so many amazing interactive materials (e.g., 3x Hugging Face demo spaces) for you to check out. Most of our implementations are open-sourced (RIEBench on FLUX, which we added to our appendix during the NeurIPS rebuttal, is currently missing, but we plan to add it ASAP). Me demoing the demo attached.
Chris Wendler @wendlerch

How do diffusion models create images, and can we control that process? We are excited to release an update to our SDXL Turbo sparse autoencoder paper. New title: One Step is Enough: Sparse Autoencoders for Text-to-Image Diffusion Models. Spoiler: We have FLUX SAEs now :)

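For context, a sparse autoencoder in this line of work decomposes model activations into a large dictionary of sparsely active features. The sketch below is a generic TopK-style SAE in PyTorch; the dimensions and the TopK sparsity choice are my assumptions for illustration, not the paper's implementation.

```python
# Generic TopK sparse autoencoder sketch (PyTorch). All dimensions and
# the TopK choice are illustrative assumptions, not the paper's config.
import torch
import torch.nn as nn

class TopKSAE(nn.Module):
    def __init__(self, d_model=1024, d_dict=16384, k=32):
        super().__init__()
        self.k = k
        self.encoder = nn.Linear(d_model, d_dict)
        self.decoder = nn.Linear(d_dict, d_model)

    def forward(self, x):
        acts = torch.relu(self.encoder(x))
        # Keep only the k largest feature activations per example.
        topk = torch.topk(acts, self.k, dim=-1)
        codes = torch.zeros_like(acts).scatter_(-1, topk.indices, topk.values)
        return self.decoder(codes), codes

sae = TopKSAE()
x = torch.randn(8, 1024)          # stand-in for one-step diffusion activations
recon, codes = sae(x)
loss = ((recon - x) ** 2).mean()  # reconstruction objective
```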
Arnab Sen Sharma reposted
Tamar Rott Shaham @TamarRottShaham
A key challenge for interpretability agents is knowing when they've understood enough to stop experimenting. Our @NeurIPSConf paper introduces a self-reflective agent that measures the reliability of its own explanations and stops once its understanding of models has converged.
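A rough sketch of the stopping idea as described in the tweet: iterate experiments, re-derive an explanation, and stop once successive explanations agree. Every name, the similarity measure, and the threshold below are hypothetical placeholders, not the paper's actual reliability criterion.

```python
# Hypothetical sketch of "stop once understanding has converged".
# experiment_fn, explain_fn, and similarity_fn are placeholder hooks;
# the paper defines its own reliability measure.
def interpret_until_converged(model, experiment_fn, explain_fn,
                              similarity_fn, tol=0.95, max_rounds=20):
    evidence, previous = [], None
    for _ in range(max_rounds):
        evidence.append(experiment_fn(model, evidence))  # run a new probe
        explanation = explain_fn(evidence)               # re-derive explanation
        if previous is not None and similarity_fn(previous, explanation) >= tol:
            return explanation                           # converged: stop early
        previous = explanation
    return previous                                      # budget exhausted
```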
Arnab Sen Sharma @arnab_api
The fact that the neural mechanisms implemented in the transformer architecture align with human-designed symbolic strategies suggests that certain computational patterns arise naturally from task demands rather than from specific architectural constraints.
Arnab Sen Sharma reposted
David Bau @davidbau
Who is going to be at #COLM2025? I want to draw your attention to a COLM paper by my student @sheridan_feucht that has totally changed the way I think and teach about LLM representations. The work is worth knowing. And you can meet Sheridan at COLM, Oct 7!
Sheridan Feucht @sheridan_feucht

[📄] Are LLMs mindless token-shifters, or do they build meaningful representations of language? We study how LLMs copy text in-context, and physically separate out two types of induction heads: token heads, which copy literal tokens, and concept heads, which copy word meanings.

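A toy rendering of the token-head vs. concept-head distinction: both implement the induction pattern "after seeing ... A B ... A, emit B", but they differ in how they match A. The synonym table below is a made-up stand-in for "word meaning"; real concept heads are learned, not hand-coded.

```python
# Toy illustration of token heads vs. concept heads. Both follow the
# induction pattern: find an earlier match for the query and copy the
# token that followed it. The synonym table is a made-up stand-in.
SYNONYMS = {"car": "vehicle", "automobile": "vehicle"}

def induction_copy(context, query, match):
    for i in range(len(context) - 1):
        if match(context[i], query):
            return context[i + 1]
    return None

context = ["the", "car", "sped", "away"]
# Token head: matches the literal token only.
print(induction_copy(context, "car", lambda t, q: t == q))              # sped
# Concept head: matches at the level of meaning.
print(induction_copy(context, "automobile",
                     lambda t, q: SYNONYMS.get(t) == SYNONYMS.get(q)))  # sped
```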
Arnab Sen Sharma reposted
Nikhil Prakash @nikhil07prakash
How do language models track the mental states of each character in a story, an ability often referred to as Theory of Mind? Our recent work takes a step toward demystifying it by reverse-engineering how Llama-3-70B-Instruct solves a simple belief-tracking task. Surprisingly, we found that it relies heavily on concepts similar to pointer variables in C programming!
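The pointer analogy, rendered as a toy Python sketch (my own illustration, not the paper's circuit): the model appears to bind each character to a reference into the story state and dereference that reference when asked about the character's belief.

```python
# Toy rendering of the pointer-variable analogy (illustration only).
# Each character is bound to a "pointer" naming the location they last
# observed; answering a belief question dereferences that pointer.
world = {"box": "apple", "basket": "empty"}   # true story state

beliefs = {"Sally": "box", "Anne": "basket"}  # character -> location pointer

def believed_contents(character):
    location = beliefs[character]  # bind: follow the character's pointer
    return world[location]         # dereference: read the pointed-to state

print(believed_contents("Sally"))  # 'apple'
```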