Satwik Bhattamishra

250 posts

@satwik1729

CS PhD student at Oxford | Worked at Google, Cohere, and Microsoft Research

Oxford, England · Joined December 2019
808 Following · 839 Followers
Pinned Tweet
Satwik Bhattamishra @satwik1729
Given black-box access to a Transformer's output, can we efficiently recover its parameters? We analyse the learnability of attention-based models with query access in our new work. Accepted at #ICML2026 🎉 Work done with @shahkulin98, @mhahn29 and Varun Kanade. 🧵
6
23
137
13.3K
AiDevCraft @AiDevCraft
The multi-head non-identifiability result is doing double duty here — it's a learnability obstacle in the paper, but it's also the natural model-extraction defense commercial APIs implicitly rely on. Does the O(rd) compressed-sensing speedup survive the noisy-oracle regime when r is small?
1
0
1
111
Satwik Bhattamishra @satwik1729
@_Suresh2 @shahkulin98 @mhahn29 Query access typically means just input-output pairs, not any intermediate representations. The difference from the traditional setting is that the learner can decide which inputs it wants labels for, rather than getting random labelled examples.
0
0
1
51
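A minimal sketch of the distinction drawn in the reply above, using a toy hidden linear function rather than an attention model. The names `make_target`, `random_example`, and `learn_with_queries` are hypothetical and are not taken from the paper; the sketch only illustrates that a learner who chooses its own inputs can sometimes recover parameters with very few well-chosen queries.

```python
import random

def make_target(weights):
    """Hidden linear function (toy stand-in); the learner only sees its outputs."""
    def f(x):
        return sum(w * xi for w, xi in zip(weights, x))
    return f

def random_example(f, d, rng):
    """Passive setting: the environment picks x and the learner receives (x, f(x))."""
    x = [rng.gauss(0.0, 1.0) for _ in range(d)]
    return x, f(x)

def learn_with_queries(f, d):
    """Query setting: the learner picks each input itself.

    Querying the i-th standard basis vector returns the i-th weight,
    so d queries recover this toy target exactly.
    """
    recovered = []
    for i in range(d):
        e_i = [1.0 if j == i else 0.0 for j in range(d)]
        recovered.append(f(e_i))
    return recovered

hidden = [2.0, -1.0, 0.5]          # unknown to the learner
f = make_target(hidden)
print(learn_with_queries(f, len(hidden)))
```

In both settings the learner sees only input-output pairs; the only difference is who chooses the inputs.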
Satwik Bhattamishra @satwik1729
@E_FutureFan @shahkulin98 @mhahn29 Apart from that, practical APIs are for language models whereas we consider regressors and classifiers to begin with. Right now, the results are of theoretical interest and hopefully serve as stepping stones for more practically relevant algorithms.
0
0
0
13
Satwik Bhattamishra @satwik1729
@E_FutureFan @shahkulin98 @mhahn29 Hey, thanks for the question. While security was one of the motivations, our current results do not have any immediate consequences for practical models: our results are for single-head attention and one-layer models, whereas practical models are multi-layer, multi-head models.
1
0
0
36
Satwik Bhattamishra @satwik1729
We believe there are several open directions around this problem, including multi-head attention, identifiability, and other formulations of query learning. Check out the paper for more details: arxiv.org/abs/2601.16873
0
0
7
268
Satwik Bhattamishra @satwik1729
Lastly, the multi-head problem appears more difficult. Multi-head attention is not identifiable in the same sense as single-head attention, and query learning it would require additional structural assumptions. We discuss some possible proof directions in the work.
1
0
3
270
Satwik Bhattamishra @satwik1729
@DamienTeney @mhahn29 The experiments in the paper explore that, though only for models generating small regular languages. For more involved or realistic tasks, one would need a more efficient algorithm.
1
0
0
22
Satwik Bhattamishra @satwik1729
@DamienTeney @mhahn29 For example, one could use algorithms for this kind of problem to check whether a language model can generate an undesirable string or pattern with non-negligible probability, such as a password, secret key, offensive word, etc.
1
0
0
21
Satwik Bhattamishra @satwik1729
Given access to a language model, can we extract an interpretable object like a DFA that captures which strings the language model is likely to generate? Our new work on automata learning theory studies this question. To be presented at #ICLR2026 🎉
1
16
86
10.8K
Satwik Bhattamishra retweeted
Yash Sarrof @yashYRS
In principle, CoT makes Transformers Turing Complete, but empirically LLMs struggle at longer lengths. In our paper, we study Transformer+CoT length generalization and prove that with a finite vocab, models can't solve problems beyond the restricted class TC0. But there’s a fix🧵
3
16
90
14.5K
Satwik Bhattamishra retweeted
Charlie London @CharlieLondon02
We've just released a new benchmark that aims to test models' underlying long-horizon reasoning capabilities. This is very hard to do directly, as it is expensive, time-consuming, and can have many confounding factors.
Sumeet Motwani @sumeetrm

We’re releasing LongCoT, an incredibly hard benchmark to measure long-horizon reasoning capabilities over tens to hundreds of thousands of tokens. LongCoT consists of 2.5K questions across chemistry, math, chess, logic, and computer science. Frontier models score less than 10%🧵
2
2
13
2.2K
Satwik Bhattamishra retweeted
Michael Rizvi-Martel @frisbeemortel
Latent CoT is an alternative LLM reasoning scheme hypothesized to enable “superposition” allowing models to hold uncertainty over multiple concepts during reasoning 💭 We revisit superposition in 3 latent CoT approaches and find that it is largely an illusion 🔮! More in 🧵
9
33
168
14K
Satwik Bhattamishra retweeted
Yash Sarrof @yashYRS
Most work on Transformer length generalization assumes a fixed vocabulary. But in real tasks, longer inputs may have new symbols (e.g. more objects in planning). Our new paper introduces C-RASP* to study this and explains the inconsistent performance of Transformers in planning.
1
12
54
9.5K