Abhishek Shetty
@AShettyV
99 posts
Incoming Asst Prof at @gatech_scs; FODSI Postdoctoral Fellow @MIT; PhD Student @Berkeley_EECS; Apple AI/ML Research Fellow 2023
Cambridge, MA · Joined February 2020
1.4K Following · 563 Followers
Abhishek Shetty retweeted
Vaishnavh Nagarajan @_vaishnavh
If the low-rank logits result really holds across settings, I expect it should have a lot of downstream corollaries and connections waiting to be discovered.
Abhishek Shetty retweeted
Vaishnavh Nagarajan @_vaishnavh
I also like the low-rank logits finding (arxiv.org/abs/2510.24966) because it provides a novel abstraction for thinking about what function a trained LLM implements. (It actually took me a while to understand, appreciate, and buy the exact result.)
Abhishek Shetty retweeted
Vaishnavh Nagarajan @_vaishnavh
Incredibly, you can select these datapoints through a straightforward method: check whether the given preference is aligned with a model prompted with the target behavior. (I'd have expected that you'd need an exponential search over all possible data subsets to accomplish this.)
Abhishek Shetty retweeted
Vaishnavh Nagarajan @_vaishnavh
This paper discovers another spooky generalization effect: to trigger any target behavior in an LLM, you can carefully subselect from a *completely unrelated* preference dataset such that preference finetuning on the subselected dataset produces that behavior.
Abhishek Shetty retweeted
Lester Mackey @LesterMackey
@tobias_schrdr and I are excited to share WildCat: Near-Linear Attention in Theory and Practice (arxiv.org/abs/2602.10056). By attending over a spectrally accurate, optimally weighted coreset, WildCat approximates exact attention with super-polynomial error decay in near-linear time.
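The coreset idea can be sketched minimally. This is NOT WildCat's spectrally accurate construction — just the generic interface of attending over a small weighted subset of keys; the function names and weighting scheme here are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

# Minimal sketch of attending over a weighted coreset: replace the full
# key/value set by a small index set `idx` with importance weights `w`
# applied inside the softmax.

def exact_attention(q, K, V):
    # Standard softmax attention for a single query q over all n keys.
    s = np.exp(K @ q - np.max(K @ q))
    return (s / s.sum()) @ V

def coreset_attention(q, K, V, idx, w):
    # Attend only over the coreset rows K[idx], V[idx]; the weights w
    # compensate inside the softmax for the keys that were dropped.
    logits = K[idx] @ q
    s = w * np.exp(logits - np.max(logits))
    return (s / s.sum()) @ V[idx]
```

With `idx` covering all keys and unit weights, the coreset version reduces exactly to full attention; the substance of the paper is choosing a coreset of near-linear size whose weighted softmax provably tracks the full one.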
Abhishek Shetty retweeted
Lester Mackey @LesterMackey
In contrast, prior practical approximations either lack error guarantees or require quadratic runtime to guarantee such high fidelity. We couple the theory with benchmark experiments highlighting the benefits for image generation, image classification, and KV-cache compression.
Abhishek Shetty retweeted
Nika Haghtalab @nhaghtal
5/n Our mechanism: Logit-Linear Selection. Take a system prompt and any preference dataset (we used tulu2.5). For each training example (prompt, r+, r-), where r+ is preferred to r-, ask a teacher model: does adding the system prompt make you prefer r+ over r- even more? This effectively gives a score for how aligned an example is with the system prompt. We then use these scores to reweight or filter tulu2.5. If the linear embeddings are the same across models (see the post above), then no matter what the student model is, once trained on the reweighted data, it acts as if it were conditioned on the system prompt.
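The scoring step described above can be sketched as follows. This is a minimal sketch, not the paper's code: `logprob` stands in for a real teacher-model query returning log Pr(response | prompt), optionally conditioned on a system prompt, and all function names are illustrative.

```python
# Sketch of the Logit-Linear Selection scoring rule. `logprob` is any
# callable returning log Pr(response | prompt), optionally conditioned
# on a system prompt -- in practice, a teacher LLM query.

def alignment_score(logprob, example, system_prompt):
    # Does adding the system prompt increase the preference margin
    # log Pr(r+) - log Pr(r-)? A positive score means the example is
    # aligned with the behavior the system prompt describes.
    prompt, r_pos, r_neg = example
    margin_with = (logprob(r_pos, prompt, system_prompt)
                   - logprob(r_neg, prompt, system_prompt))
    margin_without = logprob(r_pos, prompt) - logprob(r_neg, prompt)
    return margin_with - margin_without

def filter_dataset(logprob, dataset, system_prompt, threshold=0.0):
    # Keep only examples whose preference becomes more pronounced under
    # the system prompt; the scores could equally be used as weights.
    return [ex for ex in dataset
            if alignment_score(logprob, ex, system_prompt) > threshold]
```

The same scores support the reweighting variant mentioned in the tweet: instead of hard filtering, each example's contribution to the fine-tuning loss is scaled by its score.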
Abhishek Shetty retweeted
Nika Haghtalab @nhaghtal
4/n How do system prompts correspond to directions, how do we identify those directions, and how stable are they across different model families? We explain all of this through recent findings on logit-linear properties of language models. In short, recent work has shown that log Pr(response | prompt, sys prompt) is approximately the dot product between ψ(sys prompt) and ϕ(prompt, response), where ψ and ϕ are some universal embeddings. You can think of fine-tuning the model in this context as updating its ψ(·) embedding while keeping the embeddings ϕ(p, r) fixed. So filtering a dataset can be interpreted as retaining only the points that push ψ(·) in some direction. We also give evidence that these embeddings are roughly universal across models; e.g., we measure the overlap between the top-100 principal row subspaces of the log-probability matrices of different models.
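Both claims above — approximate low rank of the log-probability matrix, and subspace overlap across models — can be illustrated on synthetic data. This is a sketch under the stated assumption log Pr ≈ ψ·ϕ; the matrix sizes, noise level, and variable names are illustrative, not taken from the paper.

```python
import numpy as np

# If log Pr(r | p, s) ≈ ψ(s)·ϕ(p, r), then the matrix M[s, (p, r)] of
# log-probabilities is approximately low-rank: M ≈ Ψ Φᵀ. Two "models"
# sharing ϕ but differing in ψ should have overlapping row subspaces.

rng = np.random.default_rng(0)
k, n_sys, n_pr = 5, 40, 60            # embed dim, #sys prompts, #(p, r) pairs
Phi = rng.normal(size=(n_pr, k))      # shared ("universal") ϕ embeddings
Psi1 = rng.normal(size=(n_sys, k))    # model 1's ψ embeddings
Psi2 = rng.normal(size=(n_sys, k))    # model 2's ψ embeddings
M1 = Psi1 @ Phi.T + 0.01 * rng.normal(size=(n_sys, n_pr))
M2 = Psi2 @ Phi.T + 0.01 * rng.normal(size=(n_sys, n_pr))

# Rank-k truncation captures nearly all of M1's spectral energy:
S = np.linalg.svd(M1, compute_uv=False)
energy = (S[:k] ** 2).sum() / (S ** 2).sum()

# Overlap of top-k principal row subspaces (average squared cosine of
# the principal angles, in [0, 1]); high overlap = shared ϕ.
V1 = np.linalg.svd(M1, full_matrices=False)[2][:k]
V2 = np.linalg.svd(M2, full_matrices=False)[2][:k]
overlap = np.linalg.norm(V1 @ V2.T) ** 2 / k
```

The tweet's plot reports exactly this kind of overlap statistic (for top-100 subspaces of real models); here the shared-ϕ construction makes the overlap close to 1 by design.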
Abhishek Shetty retweeted
Nika Haghtalab @nhaghtal
1/n Now that I have a bit more time, I wanted to share more about this paper and my own thought process. There's a gap in how we talk about data and behavior in LLMs. On the one hand, we say "data is the driver" and try to interpret what human preferences the data is showing. On the other hand, we keep seeing learned behaviors that don't seem to be present in the data, at least not in any human-readable way. Examples like subliminal transfer, covert malicious finetuning, and weird generalization have been quite intriguing for me. In this paper, we ask what's really causing that, and whether there's a simple, general mechanism behind it.

Quoting Nika Haghtalab @nhaghtal:
Did you know that your LLM is secretly an Ouija board?! Fun fact: subsets of standard datasets can embed hidden instructions into your model and turn them into evil rulers, animal lovers, and translators. No sys prompt. No signals. Just ghosts in the data.
Abhishek Shetty retweeted
Nika Haghtalab @nhaghtal
2/n We show that through fine-tuning, a model can be made to behave as if it were conditioned on a particular system prompt, even when it is not system-prompted at inference time. We do this by showing that any dataset can be reweighted or filtered to teach a student model to follow the instruction in the system prompt, even when the dataset contains no instances of that instruction. Fun example: there is a simple way to choose a subset of a standard preference-learning dataset like tulu2.5 that contains no text in Spanish, yet when we fine-tune a model on this subset, the fine-tuned model learns to speak primarily in Spanish.
Abhishek Shetty retweeted
Nika Haghtalab @nhaghtal
3/n The explanation. What's cool is that this real and kind of spooky empirical behavior has a very clean mathematical explanation that leads to a general mechanism too. The question "what's in my data?" is hard because patterns invisible to the human eye can show up as tiny gradients that, once aggregated, push the model in a certain direction. So don't think of a dataset as human-readable examples, but as a distribution over feature directions, where reweighting or filtering changes which directions get reinforced.
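The "distribution over feature directions" picture can be made concrete with a toy simulation (purely illustrative, not the paper's experiment): each example's gradient individually looks like noise, but filtering by the sign of its component along a hidden behavior direction makes the aggregated update point strongly along that direction.

```python
import numpy as np

# Toy illustration: per-example "gradients" are pure noise to the eye,
# yet filtering by alignment with a hidden direction turns the average
# update into a strong push along that direction.

rng = np.random.default_rng(1)
d, n = 50, 10_000
behavior = np.zeros(d)
behavior[0] = 1.0                      # hidden target direction
grads = rng.normal(size=(n, d))        # per-example gradient stand-ins

uniform_step = grads.mean(axis=0)      # plain averaging: no clear direction
keep = grads @ behavior > 0            # keep only aligned examples
filtered_step = grads[keep].mean(axis=0)

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

align_uniform = cosine(uniform_step, behavior)    # small in magnitude
align_filtered = cosine(filtered_step, behavior)  # close to 1
```

No single kept example is recognizably "about" the behavior, which is the point of the thread: the signal lives in the aggregate, not in any human-readable example.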
Abhishek Shetty retweeted
Fermi Ma @fermi_ma
I'm excited about this line of work because it shows that if you pick the right abstractions, you can use clean theoretical reasoning to make surprising predictions about LLM behavior.
Abhishek Shetty retweeted
Fermi Ma @fermi_ma
In the new paper, this framework is used to model the system prompt and training examples as vectors in a low-dimensional "behavior space". To spoof the system prompt, they simply fine-tune on the examples with positive inner product with the system-prompt vector!
Abhishek Shetty retweeted
Fermi Ma @fermi_ma
As Ishaq explains, this work exploits the "low logit-rank framework" introduced in arxiv.org/abs/2510.24966, which found that the log-probabilities of LLM outputs obey surprisingly linear relationships.