Asher ✈️ ICLR2026

599 posts

@Ashkl111

Nerd, coder, undergrad @brownuniversity | ✡️

Joined September 2020
239 Following · 82 Followers
Pinned Tweet
Asher ✈️ ICLR2026 @Ashkl111
New preprint: "Stability and Generalization in Looped Transformers" Looped transformers are having a moment. Part of their appeal is the theoretical possibility of generalizing to harder problems simply by running more loops. But in practice, that often fails. 🧵
1
1
11
784
Keshav Ramji @KeshavRamji
What if your language model could reason efficiently in an entirely new language? We introduce Abstract Chain-of-Thought, a new mechanism which allows language models to reason through a short sequence of reserved "abstract" tokens through reinforcement learning. It is as performant as verbalized CoT at a fraction of the cost, achieving major gains in inference-time efficiency.
60
133
1.1K
1.2M
Asher ✈️ ICLR2026 @Ashkl111
My tl is currently half looped transformers and half debates about pressing red/blue. This app is hilarious (blue is the correct answer btw)
0
0
2
50
Benhao Huang @huskydogewoof
@Ashkl111 Thank you! 😆 It is never too late, and we are glad to have your work in this repo!
1
0
3
86
Asher ✈️ ICLR2026 @Ashkl111
@RidgerZhu Very cool write up. In my own work I’ve found that smaller looped models w/ higher LR often mimic many of the problems of larger models w/ lower LR, which makes it easier to avoid unstable architectures before scaling
0
0
2
333
Hayden Prairie @hayden_prairie
@Ashkl111 Yeah, I would definitely love to chat. I will be at ICLR also, so we should definitely meet up! I'd be very interested in learning about your work.
1
0
2
184
Hayden Prairie @hayden_prairie
We’ve been thinking a lot about scaling laws, wondering if there is a more effective way to scale FLOPs without increasing parameters. Turns out the answer is YES – by looping blocks of layers during training. We find that predictable scaling laws exist for layer looping, allowing us to use looping to achieve the quality of a Transformer twice the size. Our scaling laws suggest that for a fixed parameter budget, data and looping should be increased in tandem! 🧵👇
41
179
1.3K
293.2K
Asher ✈️ ICLR2026 @Ashkl111
@DimaKrotov The idea of "reasoning in latent space" is what got me working on looped transformers in the first place. Really cool to see the energy framing, I think there are some clean relationships between looped transformers and energy minimization at basins.
0
0
9
1.2K
Dmitry Krotov @DimaKrotov
This past week several people asked me about how Loop Transformers are related to Energy Transformers.
Energy Transformers are:
👉 looped transformers
👉 energy-based models (EBMs)
👉 Dense Associative Memories (generalized Hopfield nets with superior scaling laws for information storage)
That combo is powerful:
👉 looping = iterative refinement = reasoning in the latent space (not in token space)
👉 energy-based = stability of token dynamics
👉 Associative Memory = strong retrieval capabilities
Put together: models that settle into good solutions, not just predict next tokens.
I’ve been especially excited about this class of models for a while. They feel like a promising direction for more stable, interpretable, and memory-rich AI systems.
This week at #ICLR2026 we are presenting NRGPT arxiv.org/abs/2512.16762, which:
👉 is a looped transformer
👉 is a stable Dense Associative Memory
👉 works great on ListOps and real text
Original Energy Transformer paper: arxiv.org/abs/2302.07253 from NeurIPS 2023
3
49
353
69.1K
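A minimal sketch of the "looping = iterative refinement" idea in the post above, using a small associative memory. It assumes a softmax-weighted retrieval rule chosen purely for illustration; it is not the Energy Transformer or NRGPT update. The stored patterns, `beta`, and all function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def retrieve_step(state, patterns, beta):
    """One loop: re-weight the stored patterns by similarity to the current state."""
    scores = [beta * sum(p_i * s_i for p_i, s_i in zip(p, state)) for p in patterns]
    weights = softmax(scores)
    return [
        sum(w * p[d] for w, p in zip(weights, patterns))
        for d in range(len(state))
    ]


def retrieve(query, patterns, beta=4.0, n_loops=5):
    """Apply the same retrieval step repeatedly (the 'loop')."""
    state = list(query)
    for _ in range(n_loops):
        state = retrieve_step(state, patterns, beta)
    return state


# Hypothetical stored memories and a noisy version of the first one.
patterns = [
    [1.0, 1.0, -1.0, -1.0],
    [1.0, -1.0, 1.0, -1.0],
    [-1.0, -1.0, -1.0, 1.0],
]
noisy_query = [0.9, 0.6, -0.8, -0.2]

settled = retrieve(noisy_query, patterns)
print([round(v, 3) for v in settled])
```

In this toy setup the state should settle close to the first stored pattern after a few loops, which is the "settle into good solutions" behavior described above in miniature.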
Asher ✈️ ICLR2026 retweeted
Bret Greenstein @bretgreenstein
Companies love to talk about how long reasoning times 'solve' intelligence. This paper shows that how you use the reasoning loop and create the right iteration architecture matters a lot.
Asher ✈️ ICLR2026 @Ashkl111

New preprint: "Stability and Generalization in Looped Transformers" Looped transformers are having a moment. Part of their appeal is the theoretical possibility of generalizing to harder problems simply by running more loops. But in practice, that often fails. 🧵

0
1
1
100
Asher ✈️ ICLR2026 @Ashkl111
@josephdviviano When I started working with looped TFs ~a year ago, I was constantly annoyed at how unpredictably they failed. Ended up writing theory on when this happens -- hopefully it saves future researchers those first few months. x.com/Ashkl111/statu…
Asher ✈️ ICLR2026 @Ashkl111

New preprint: "Stability and Generalization in Looped Transformers" Looped transformers are having a moment. Part of their appeal is the theoretical possibility of generalizing to harder problems simply by running more loops. But in practice, that often fails. 🧵

0
0
2
94
Joseph Viviano @josephdviviano
I was very interested in architectures like this around 2020. They seemed elegant, brainlike, and held keys to avoid the inefficiencies of serial architectures. My colleagues back then told me "they don't work", or "universal transformers were already done". Ignore pessimists.
Dimitris Papailiopoulos @DimitrisPapail

So fun watching looped transformers taking off this week! Worth mentioning that @AngelikiGiannou & @shashank_r12 coined the term and gave a beautiful looped construction of an assembly-like computer in Jan 2023 arxiv.org/abs/2301.13196

5
0
51
6.4K
Asher ✈️ ICLR2026 @Ashkl111
@JFPuget One of the theoretical benefits of looped transformers in particular is their ability to run for **more** loops than in training to solve harder problems. Whether they do in all cases is... complex x.com/Ashkl111/statu…
Asher ✈️ ICLR2026 @Ashkl111

New preprint: "Stability and Generalization in Looped Transformers" Looped transformers are having a moment. Part of their appeal is the theoretical possibility of generalizing to harder problems simply by running more loops. But in practice, that often fails. 🧵

0
0
2
147
JFPuget 🇫🇷🇺🇦🇨🇦🇬🇱
I don't get the hype around looped transformers. Correction, I don't get why the hype now. It is not a new idea. I myself used it in a NeurIPS competition a few years ago. It is just the same as weight sharing across transformer layers. It doesn't fundamentally change what a transformer is, it is just more memory efficient.
28
9
180
19.7K
Asher ✈️ ICLR2026 @Ashkl111
Full paper: arxiv.org/abs/2604.15259 I’ll be at ICLR in Rio next week presenting a different paper on tabular ML. If you’re working on looped/recurrent models, test-time compute, or tabular ML, I’d love to chat in person.
0
0
3
126
Asher ✈️ ICLR2026 @Ashkl111
I find:
- Without recall, looped models act like basin selectors rather than smooth input-dependent algorithms
- Recall helps preserve input dependence, but models are often still fragile
- Outer normalization broadens the parameter regions over which the models are stable
1
0
2
99
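A toy sketch of the three findings listed above, not the paper's models or experiments. It assumes "recall" means re-injecting the input at every loop and "outer normalization" means normalizing the state after each loop. The scalar tanh block, the linear block, the weights, and all function names are hypothetical choices made for illustration.

```python
import math


def loop_no_recall(x, w=2.0, n_loops=50):
    """Weight-tied block h <- tanh(w*h); the input only sets the initial state."""
    h = x
    for _ in range(n_loops):
        h = math.tanh(w * h)
    return h


def loop_with_recall(x, w=2.0, n_loops=50):
    """Same block, but the input x is re-injected at every loop."""
    h = 0.0
    for _ in range(n_loops):
        h = math.tanh(w * h + x)
    return h


def rms_normalize(v, eps=1e-8):
    """Scale a vector so its root-mean-square is approximately 1."""
    rms = math.sqrt(sum(c * c for c in v) / len(v))
    return [c / (rms + eps) for c in v]


def linear_loop(x, gain, n_loops=50, outer_norm=False):
    """Linear block h <- gain*h + x, optionally normalized after each loop."""
    h = [0.0] * len(x)
    for _ in range(n_loops):
        h = [gain * h_i + x_i for h_i, x_i in zip(h, x)]
        if outer_norm:
            h = rms_normalize(h)
    return h


def l2_norm(v):
    return math.sqrt(sum(c * c for c in v))


# Without recall: two different positive inputs end at the same fixed point,
# so only the basin (here, the sign of x) survives.
print(loop_no_recall(0.2), loop_no_recall(0.8), loop_no_recall(-0.2))

# With recall: the final state still depends on the input value.
print(loop_with_recall(0.2), loop_with_recall(0.8))

# Outer normalization: with gain > 1 the unnormalized state grows without
# bound as loops are added, while the normalized state stays bounded.
x = [1.0, -0.5]
print(l2_norm(linear_loop(x, gain=1.5)))
print(l2_norm(linear_loop(x, gain=1.5, outer_norm=True)))
```

The bounded state in the last line only shows that normalization keeps this toy from diverging; whether a real looped transformer stays accurate with extra loops is the harder question the thread is about.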
Asher ✈️ ICLR2026 @Ashkl111
@heyanuja @papertrailshq So funnily enough, I saw that post a few months ago and started making my own version -- never got far since I had other projects, but I looked into research databases and there are really cool existing open-source ones that would just require API calls/downloading, no scraping!
3
0
2
62
Anuja U @heyanuja
I made a Goodreads for academic papers! (..and blog posts, substacks, lesswrong, etc) Paper Trails [papertrailshq.com] is something I built because I wanted a place where engaging with research felt fun, beautiful, and personal to you I hope you give it a try & love it!
sanje horah @sanjehorah

i am BEGGING

165
1.5K
9.6K
953.6K