Abhishek Shetty

116 posts

@AShettyV

Incoming Asst. Prof. at @gatech_scs; FODSI Postdoctoral Fellow @MIT; PhD from @Berkeley_EECS; Ex: Microsoft Research, Apple; Apple AI/ML Research Fellow 2023

Cambridge, MA · Joined February 2020
1.5K Following · 596 Followers
Abhishek Shetty reposted
Gavin Brown (@gavinrbrown1)
Gradient descent does not work. I will die on this hill.
Abhishek Shetty (@AShettyV)
This is a line of work I am super excited about. We bring a principled lens towards understanding LLMs, leading to both interesting theoretical abstractions and surprising experiments. Do check this out at ICLR and ask @GolowichNoah about all the cool consequences of this lens.
Quoting Noah Golowich (@GolowichNoah):

In this paper, we study the "extended logit matrix" corresponding to an LLM, a sort of multi-token variant of the well-studied "logit matrix": its rows and columns are indexed by sequences and its entries are determined by the LLM's log-probs on the corresponding sequences. 2/n

Abhishek Shetty reposted
Noah Golowich (@GolowichNoah)
We also have follow-up work showing that distributions whose extended logit matrices are well-approximated by low-rank ones according to power laws (as in the above plot) are provably efficiently learnable using queries: arxiv.org/pdf/2512.09892… 5/n
Abhishek Shetty reposted
Noah Golowich (@GolowichNoah)
We observe that such matrices for modern LLMs are 'close to low-rank' (see fig below showing approximation error for various ranks). This has numerous intriguing consequences, such as the ability to generate from a given prompt by only querying the LLM on *unrelated* prompts. 3/n
[Figure: approximation error of the extended logit matrix at various ranks]
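For readers who want a concrete sense of what "close to low-rank" means, here is a minimal sketch in Python with NumPy. It is not the paper's code or data: `toy_matrix` is a hypothetical stand-in that builds a random matrix whose singular values follow an assumed power law, and `low_rank_errors` is an illustrative helper name. The only fact it relies on is the standard Eckart-Young result that the best rank-k approximation error in Frobenius norm is given by the discarded singular values.

```python
import numpy as np


def low_rank_errors(matrix, ranks):
    """Relative Frobenius error of the best rank-k approximation for each k."""
    # By Eckart-Young, the singular values alone determine the optimal error.
    singular_values = np.linalg.svd(matrix, compute_uv=False)
    total = np.sum(singular_values ** 2)
    errors = {}
    for k in ranks:
        tail = np.sum(singular_values[k:] ** 2)
        errors[k] = float(np.sqrt(tail / total))
    return errors


def toy_matrix(n=64, decay=1.0, seed=0):
    """Hypothetical stand-in: random matrix with power-law singular values.

    This is NOT an extended logit matrix; the decay exponent is an assumption.
    """
    rng = np.random.default_rng(seed)
    # Random orthonormal bases from QR factorizations of Gaussian matrices.
    u, _ = np.linalg.qr(rng.standard_normal((n, n)))
    v, _ = np.linalg.qr(rng.standard_normal((n, n)))
    spectrum = np.arange(1, n + 1, dtype=float) ** (-decay)
    # Equivalent to U @ diag(spectrum) @ V^T.
    return (u * spectrum) @ v.T


if __name__ == "__main__":
    errors = low_rank_errors(toy_matrix(), ranks=[1, 2, 4, 8, 16, 32])
    for k, err in errors.items():
        print(f"rank {k:>2}: relative error {err:.3f}")
```

Plotting these errors against rank on log-log axes would give a curve of the same kind as the one the post describes, though for a synthetic matrix rather than a real model.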
Abhishek Shetty reposted
Noah Golowich (@GolowichNoah)
In this paper, we study the "extended logit matrix" corresponding to an LLM, a sort of multi-token variant of the well-studied "logit matrix": its rows and columns are indexed by sequences and its entries are determined by the LLM's log-probs on the corresponding sequences. 2/n
Abhishek Shetty reposted
Noah Golowich (@GolowichNoah)
Excited about a couple of papers of ours in ICLR this year (both in Poster Session 1 Pavilion 3 & Oral Session 2B tomorrow): (1) Sequences of Logits Reveal the Low-Rank Structure of Language Models (joint w/ @axliu42 & @AShettyV) arxiv.org/pdf/2510.24966. 1/n
Abhishek Shetty reposted
Owain Evans (@OwainEvans_UK)
We also ran replications on Gemma, increased sample sizes, and made a variety of improvements to the experiments, presentation, and writing. We’ve been excited to see research from other groups related to subliminal learning. Here’s a few highlights:
Abhishek Shetty reposted
Owain Evans (@OwainEvans_UK)
Our paper on Subliminal Learning was just published in Nature! Last July we released our preprint. It showed that LLMs can transmit traits (e.g. liking owls) through data that is unrelated to that trait (numbers that appear meaningless). What’s new?🧵
Abhishek Shetty reposted
Owain Evans (@OwainEvans_UK)
General misalignment can also be learned subliminally. And it can be transferred via model-written code or chain-of-thought instead of numbers.
Abhishek Shetty reposted
Owain Evans (@OwainEvans_UK)
Our preprint showed subliminal transfer between models with the same initialization. Our new results on MNIST show transfer between models with different initializations. This is a toy model but still expands the scope of the effect.
Abhishek Shetty reposted
Owain Evans (@OwainEvans_UK)
Aden-Ali et al (2026) showed that traits can be transferred via *standard* post-training datasets by filtering those datasets with the teacher. These traits include animal preferences and misalignment.
Abhishek Shetty reposted
Owain Evans (@OwainEvans_UK)
Draganov et al (2026) demonstrated “phantom transfer” as a data poisoning attack. With a setup similar to ours, they show transfer of traits between different model families. This transfer is difficult to stop — various defenses fail.
Abhishek Shetty reposted
Vaishnavh Nagarajan (@_vaishnavh)
if the low-rank logits really holds across settings, i expect it should have a lot of downstream corollaries & connections waiting to be discovered
Abhishek Shetty reposted
Vaishnavh Nagarajan (@_vaishnavh)
i also like the low-rank logits finding (arxiv.org/abs/2510.24966) bc it provides a novel abstraction to think about what function a trained LLM implements. (it actually took me a while to understand and appreciate and buy the exact result)
Abhishek Shetty reposted
Vaishnavh Nagarajan (@_vaishnavh)
incredibly, you can select these datapoints through a straightforward method: see whether the given preference is aligned with a model prompted with the target behavior. (i'd have expected that you'd need an exponential search over all possible data subsets to accomplish this)