Daking Rai

56 posts

@DakingRai

CS PhD Student @GeorgeMasonU

Fairfax, Virginia · Joined September 2014
364 Following · 220 Followers
Daking Rai
Daking Rai@DakingRai·
I’m actively looking for Summer 2026 internships focused on language model interpretability and methods to improve model reasoning and controllability. I’m also attending @NeurIPSConf —would love to connect! Resume & details: dakingrai.github.io
0 replies · 1 repost · 6 likes · 5.6K views
Daking Rai reposted
XLLM-Reason-Plan
XLLM-Reason-Plan@XllmReasonPlan·
🚨XLLM-Reason-Plan Workshop is happening right now at @COLM_conf ! Join us at 520F🙌
8 replies · 5 reposts · 8 likes · 1.4K views
Daking Rai reposted
XLLM-Reason-Plan
XLLM-Reason-Plan@XllmReasonPlan·
⏰ Only 9 days away! Join us at @COLM_conf on October 10 for the first workshop on the application of LLM explainability to reasoning and planning. Featuring: 📑 20 poster presentations 🎤 9 distinguished speakers View our schedule at tinyurl.com/xllm-workshop.
0 replies · 4 reposts · 9 likes · 14.7K views
Daking Rai
Daking Rai@DakingRai·
(9/9) 🙏 Thanks for reading this far! If you found this interesting, be sure to check out the full paper, and feel free to contact me with any questions or clarifications. A huge thanks to my advisor @ziyuyao and collaborators Samuel Miller & @kevpmo — this work wouldn’t have been possible without them. Paper: arxiv.org/pdf/2507.00322
0 replies · 0 reposts · 1 like · 78 views
Daking Rai
Daking Rai@DakingRai·
(8/9) RaSTEER generalizes to arithmetic reasoning. We consider a two-operand arithmetic reasoning task (+, -, %, x) using three models: GPT-2 XL, Pythia-6.9b, and GPT-3 8b. RaSTEER yields performance improvements across most arithmetic operations for all three models, with the largest gain of 20.25% observed in Pythia-6.9b for multiplication.
1 reply · 0 reposts · 0 likes · 92 views
Daking Rai
Daking Rai@DakingRai·
🚨 New NeurIPS 2025 Paper 🚨 Does 0% accuracy mean a language model (LM) has no correct mechanism for the task? 🤔 We investigate this question on the balanced parentheses task and uncover surprising insights:
1️⃣ Even at 0%, models can contain mechanisms that solve the task with high accuracy, but they're overshadowed by faulty ones.
2️⃣ Building on this, we introduce RaSTEER, which improves model performance by amplifying the output of sound mechanisms, yielding dramatic performance boosts of up to 100%.
🧵(1/9) Paper link: arxiv.org/pdf/2507.00322
1 reply · 2 reposts · 10 likes · 1.2K views
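The steering idea in the thread above (amplify the output of a sound mechanism so it is no longer overshadowed by a faulty one) can be sketched with toy numbers. Everything here is my own illustrative assumption, not the paper's actual RaSTEER procedure: the identity unembedding, the two hand-picked component vectors, and the scale factor of 5 are all made up.

```python
import numpy as np

# Toy residual-stream picture: two "mechanisms" each write a vector into
# the residual stream; logits are the residual stream times an unembedding.
W_U = np.eye(4)                          # identity unembedding (assumption)
sound = np.array([0.3, 0.0, 0.0, 0.0])   # weak component writing the correct token 0
faulty = np.array([0.0, 1.0, 0.0, 0.0])  # strong component writing the wrong token 1

def predict(alpha):
    """Predicted token when the sound mechanism's output is scaled by alpha."""
    resid = alpha * sound + faulty       # steered residual stream
    return int((resid @ W_U).argmax())

assert predict(1.0) == 1                 # unsteered: faulty mechanism overshadows
assert predict(5.0) == 0                 # amplified: sound mechanism wins
```

The point of the toy: at α = 1 the model scores 0% on token 0 even though a correct mechanism exists inside it; scaling that mechanism's contribution flips the prediction.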
Daking Rai reposted
Ziyu Yao (Hiring Fall'26 PhDs)
🎉Check out our recent papers accepted to #NeurIPS and #EMNLP on #MechInterp of LLMs (I'm hiring Fall'26 PhDs on this topic).

#NeurIPS2025 Failure by Interference: Language Models Make Balanced Parentheses Errors When Faulty Mechanisms Overshadow Sound Ones (arxiv.org/pdf/2507.00322 w/ @DakingRai, Sam Miller, @kevpmo)
My recent favourite! We propose a new perspective of "top-down mechanism decomposition" to understand why LMs fail. Surprisingly, even when LMs fail, we can discover reliable internal mechanisms that successfully solve the task, and we found that the model fails mostly because the faulty mechanisms overshadow the sound ones! Steering the sound mechanisms solves the problem. We proved the idea on a code generation task (balanced parentheses) and found it generalizes to arithmetic reasoning.

#EMNLP2025 All for One: LLMs Solve Mental Math at the Last Token With Information Transferred From Other Tokens (arxiv.org/pdf/2509.09650 w/ @siddarthpm1 @DakingRai @YilunZhou)
We try to understand how LLMs calculate a + b - c. Humans solve it following a compositional formulation: a+b first, then -c. But LLMs do not necessarily do the same. We discover a highly faithful subgraph where LLMs transfer all information to the last token position and complete all calculations there. Such a sparse, non-compositional subgraph surprisingly generalizes to multiple LLMs.

#EMNLP2025 A Survey on Sparse Autoencoders: Interpreting the Internal Mechanisms of Large Language Models (arxiv.org/pdf/2503.05613 w/ @DuMNCH, Ninghao Liu, and their students Dong Shu, Xuansheng Wu, Haiyan Zhao, and my student @DakingRai)
If you've enjoyed our recent survey and #ICML tutorial on Mech Interp (icml.cc/virtual/2025/t…), this survey will give you more details specifically about SAEs! A must-read ;)

#EMNLP2025 Feature Extraction and Steering for Enhanced Chain-of-Thought Reasoning in Language Models (arxiv.org/pdf/2505.15634, led by @DuMNCH and his students Zihao Li and Xu Wang)
We extract SAE features representing the verbal process and symbolic process of an LLM's CoT reasoning and perform steering to enhance their effect.

Congrats and thanks to all collaborators!
3 replies · 13 reposts · 114 likes · 12.9K views
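The last paper in the thread steers SAE features. As a rough, hypothetical sketch of what feature steering amounts to (encode an activation into sparse features, scale one feature, decode back): the identity encoder/decoder weights below are my own stand-ins, since a real SAE is learned and overcomplete.

```python
import numpy as np

W_enc = np.eye(4)   # toy SAE encoder (a real SAE is trained and overcomplete)
W_dec = np.eye(4)   # toy SAE decoder

def steer(x, feature, scale):
    """Encode activation x into sparse features, amplify one, decode back."""
    f = np.maximum(W_enc @ x, 0.0)   # ReLU feature activations
    f[feature] *= scale              # amplify the chosen feature
    return W_dec @ f

x = np.array([1.0, 0.5, 0.0, 0.0])
steered = steer(x, feature=1, scale=3.0)   # -> array([1. , 1.5, 0. , 0. ])
```

In a real setting the steered vector replaces the original activation in the model's forward pass, so downstream layers see the amplified feature.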
Daking Rai
Daking Rai@DakingRai·
Had a great time collaborating on this paper led by @siddarthpm1, an undergrad (read: PhD applicant very soon) from UCSC. We study how LMs handle two- and three-operand arithmetic and discover a highly faithful AF1 circuit that shows:
1️⃣ Early layers don't do instance-specific computation.
2️⃣ A few mid-layer attention heads transfer information from other token positions to the last token position.
3️⃣ The final answer is computed only in later layers, at the last token.
Please refer to our paper for more: arxiv.org/abs/2509.09650
Siddarth Mamidanna@siddarthpm1

🚨New EMNLP 2025 Paper: When a human does mental math like 12+45-8, we tend to do it stepwise: first compute 12+45=57, then 57-8=49. Does an LLM do the same? Turns out it doesn’t. But how does it work? Our paper investigates exactly this! 🧵(1/10) Paper: arxiv.org/abs/2509.09650 Code: github.com/siddarth-pm/al…

0 replies · 0 reposts · 1 like · 191 views
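The three-step picture above (hold, transfer, compute at the last token) can be caricatured in a few lines. This is my own toy construction, not the paper's AF1 circuit: positions hold their operands, a single 0/1 "read matrix" models the transfer step, and ablating that transfer destroys the answer at the last position.

```python
import numpy as np

states = np.array([12.0, 45.0, -8.0])   # token positions holding 12 + 45 - 8

def read(allow, states):
    """Each position sums the states of the positions it is allowed to read
    (row i of `allow` marks which positions position i may read)."""
    return allow @ states

transfer = np.eye(3)
transfer[2] = 1.0                        # last position reads every position

assert read(transfer, states)[2] == 49.0   # transfer intact: correct answer
assert read(np.eye(3), states)[2] == -8.0  # transfer ablated: answer lost
```

The contrast between the two read matrices mirrors the intervention logic: if blocking cross-position reads kills the answer only at the last token, the computation must live there.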
Daking Rai reposted
Yilun Zhou
Yilun Zhou@YilunZhou·
Thanks @rohanpaul_ai for featuring our EMNLP 2025 paper! Super-proud of the work, led by @siddarthpm1, undergrad (read: PhD applicant very soon) from UCSC! In short, we uncovered a quite surprising mechanism of LLM solving arithmetic, but stay tuned for our own explainer thread!
Rohan Paul@rohanpaul_ai

When a language model solves a math problem in its head, where in the network is the real calculation happening? This paper finds that almost all the actual math gets done right at the very last token of the sequence, not spread out across all the tokens.

The earlier tokens spend a lot of layers just holding information and doing general setup. Then, in just 2 middle layers, they pass their information to the last token. After that, the last token finishes the calculation on its own and produces the answer.

They built two techniques to test this, called Context-Aware Mean Ablation (CAMA) and Attention-Based Peeking (ABP). These methods let them force the model to only work in certain ways, so they could see which parts were essential.

With these tools, they discovered a sparse circuit, which they call All-for-One (AF1). This circuit is surprisingly efficient: most of the network can wait, then only a couple of layers are needed to hand off information, and the final token does the job.

This works really well on plain arithmetic like "42 + 20 - 15". But the shortcut fails if the problem is written as a word problem or inside Python code, because then the model also needs to understand language or programming context.

In short, the big insight is that language models don't spread math work across the whole sequence. Instead, they rely heavily on the last token, with just a brief moment of information passing from the earlier ones.

Paper: arxiv.org/abs/2509.09650
Paper Title: "All for One: LLMs Solve Mental Math at the Last Token With Information Transferred From Other Tokens"

0 replies · 4 reposts · 9 likes · 1.2K views
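The mean-ablation idea summarized above can be sketched as follows. The function name and shapes are my own assumptions, not the paper's CAMA implementation (which computes the mean in a context-aware way): replacing a position's activation with its batch mean destroys instance-specific information there while preserving what the position carries on average.

```python
import numpy as np

def mean_ablate(acts, position):
    """acts: (batch, seq, d_model). Replace one position's activations with
    their mean over the batch, so that position can no longer convey
    instance-specific information downstream."""
    out = acts.copy()
    out[:, position, :] = acts[:, position, :].mean(axis=0)
    return out

acts = np.arange(12.0).reshape(2, 3, 2)        # toy batch of activations
ablated = mean_ablate(acts, position=1)

assert (ablated[0, 1] == ablated[1, 1]).all()  # instance info removed
assert (ablated[:, 0] == acts[:, 0]).all()     # other positions untouched
```

If the model's answer survives ablating a position this way, that position was only doing "general setup"; if the answer breaks, the position carried essential instance-specific information.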