Asher ✈️ ICLR2026 (@Ashkl111) - Twitter Profile

Pinned Tweet

Asher ✈️ ICLR2026@Ashkl111·Apr 20

New preprint: "Stability and Generalization in Looped Transformers" Looped transformers are having a moment. Part of their appeal is the theoretical possibility of generalizing to harder problems simply by running more loops. But in practice, that often fails. 🧵

Asher ✈️ ICLR2026@Ashkl111·Apr 27

@KeshavRamji Are you presenting this at the ICLR latent reasoning workshop? Would love to learn more if so!

Keshav Ramji@KeshavRamji·Apr 27

What if your language model could reason efficiently in an entirely new language? We introduce Abstract Chain-of-Thought, a new mechanism which allows language models to reason through a short sequence of reserved "abstract" tokens through reinforcement learning. It is as performant as verbalized CoT at a fraction of the cost, achieving major gains in inference-time efficiency.

Asher ✈️ ICLR2026@Ashkl111·Apr 25

My tl is currently half looped transformers and half debates about pressing red/blue. This app is hilarious (blue is the correct answer btw)

Asher ✈️ ICLR2026@Ashkl111·Apr 24

@huskydogewoof Just did!

Benhao Huang@huskydogewoof·Apr 24

@Ashkl111 Would really appreciate it if you could also give it a star and help make it more visible to others! github.com/huskydoge/Awes… Thank you!

Asher ✈️ ICLR2026@Ashkl111·Apr 24

This is an absolutely amazing repo. My only problem is that it didn’t exist when I began my work :)

Benhao Huang@huskydogewoof

Introducing 🔁 Awesome-Loop-Models: a curated repo for keeping up with loop models! Whether you are just entering the field or have been exploring loop models for a while, this repo is built to serve as an actively updated map for mechanism analysis, architecture and algorithm design, applications, and related directions. 🧵 [1/n]

Asher ✈️ ICLR2026@Ashkl111·Apr 24

@huskydogewoof Glad to have it in as well!!

Benhao Huang@huskydogewoof·Apr 24

@Ashkl111 Thank you! 😆 It is never too late, and I am glad to have your work in this repo!

Asher ✈️ ICLR2026@Ashkl111·Apr 22

@RidgerZhu Very cool write up. In my own work I’ve found that smaller looped models w/ higher LR often mimic many of the problems of larger models w/ lower LR, which makes it easier to avoid unstable architectures before scaling

Rui-Jie Zhu@RidgerZhu·Apr 22

x.com/i/article/2046…

Asher ✈️ ICLR2026@Ashkl111·Apr 20

@hayden_prairie Awesome! Shoot me an email at asher_labovich@brown.edu and we can figure out a time that works -- I'm there the whole week

Hayden Prairie@hayden_prairie·Apr 20

@Ashkl111 Yeah, I would definitely love to chat. I will be at ICLR also, so we should definitely meet up! I'd be very interested in learning about your work.

Hayden Prairie@hayden_prairie·Apr 15

We’ve been thinking a lot about scaling laws, wondering if there is a more effective way to scale FLOPs without increasing parameters. Turns out the answer is YES – by looping blocks of layers during training. We find that predictable scaling laws exist for layer looping, allowing us to use looping to achieve the quality of a Transformer twice the size. Our scaling laws suggest that for a fixed parameter budget, data and looping should be increased in tandem! 🧵👇

Asher ✈️ ICLR2026@Ashkl111·Apr 20

@DimaKrotov The idea of "reasoning in latent space" is what got me working on looped transformers in the first place. Really cool to see the energy framing. I think there are some clean relationships between looped transformers and energy minimization at basins.

Dmitry Krotov@DimaKrotov·Apr 20

This past week several people asked me how Loop Transformers are related to Energy Transformers. Energy Transformers are:
👉 looped transformers
👉 energy-based models (EBMs)
👉 Dense Associative Memories (generalized Hopfield nets with superior scaling laws for information storage)

That combo is powerful:
👉 looping = iterative refinement = reasoning in the latent space (not in token space)
👉 energy-based = stability of token dynamics
👉 Associative Memory = strong retrieval capabilities

Put together: models that settle into good solutions, not just predict next tokens. I’ve been especially excited about this class of models for a while. They feel like a promising direction for more stable, interpretable, and memory-rich AI systems.

This week at #ICLR2026 we are presenting NRGPT arxiv.org/abs/2512.16762, which:
👉 is a looped transformer
👉 is a stable Dense Associative Memory
👉 works great on ListOps and real text

Original Energy Transformer paper: arxiv.org/abs/2302.07253 from NeurIPS 2023

Asher ✈️ ICLR2026 retweeted

Bret Greenstein@bretgreenstein·Apr 20

Companies love to talk about how long reasoning times 'solve' intelligence. This paper shows that how you use the reasoning loop and create the right iteration architecture matters a lot.

Asher ✈️ ICLR2026@Ashkl111

New preprint: "Stability and Generalization in Looped Transformers" Looped transformers are having a moment. Part of their appeal is the theoretical possibility of generalizing to harder problems simply by running more loops. But in practice, that often fails. 🧵

Asher ✈️ ICLR2026@Ashkl111·Apr 20

@josephdviviano When I started working with looped TFs ~a year ago, I was constantly annoyed at how unpredictably they failed. Ended up writing theory on when this happens -- hopefully it saves future researchers those first few months. x.com/Ashkl111/statu…

Asher ✈️ ICLR2026@Ashkl111

New preprint: "Stability and Generalization in Looped Transformers" Looped transformers are having a moment. Part of their appeal is the theoretical possibility of generalizing to harder problems simply by running more loops. But in practice, that often fails. 🧵

Joseph Viviano@josephdviviano·Apr 19

I was very interested in architectures like this around 2020. They seemed elegant, brainlike, and held keys to avoid the inefficiencies of serial architectures. My colleagues back then told me "they don't work", or "universal transformers were already done". Ignore pessimists.

Dimitris Papailiopoulos@DimitrisPapail

So fun watching looped transformers taking off this week! Worth mentioning that @AngelikiGiannou & @shashank_r12 coined the term and gave a beautiful looped construction of an assembly-like computer in Jan 2023 arxiv.org/abs/2301.13196

Asher ✈️ ICLR2026@Ashkl111·Apr 20

@JFPuget One of the theoretical benefits of looped transformers in particular is their ability to run for **more** loops than in training to solve harder problems. Whether they do in all cases is... complex x.com/Ashkl111/statu…

Asher ✈️ ICLR2026@Ashkl111

New preprint: "Stability and Generalization in Looped Transformers" Looped transformers are having a moment. Part of their appeal is the theoretical possibility of generalizing to harder problems simply by running more loops. But in practice, that often fails. 🧵

JFPuget 🇫🇷🇺🇦🇨🇦🇬🇱@JFPuget·Apr 20

I don't get the hype around looped transformers. Correction, I don't get why the hype now. It is not a new idea. I myself used it in a NeurIPS competition a few years ago. It is just the same as weight sharing across transformer layers. It doesn't fundamentally change what a transformer is, it is just more memory efficient.

Asher ✈️ ICLR2026@Ashkl111·Apr 20

Full paper: arxiv.org/abs/2604.15259 I’ll be at ICLR in Rio next week presenting a different paper on tabular ML. If you’re working on looped/recurrent models, test-time compute, or tabular ML, I’d love to chat in person.

Asher ✈️ ICLR2026@Ashkl111·Apr 20

I find:
- Without recall, looped models act like basin selectors rather than smooth input-dependent algorithms
- Recall helps preserve input dependence, but models are often still fragile
- Outer normalization broadens the parameter regions over which the models are stable
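
The three findings above can be illustrated with a toy sketch. This is not the paper's setup. It assumes that "recall" means re-feeding the original input to the shared block on every loop, and that "outer normalization" means normalizing the state after each loop; the thread does not define either term. The names `block`, `run_looped`, `recall`, and `outer_norm` are hypothetical, and the block is a tiny tanh layer standing in for a transformer block.

```python
import math


def layer_norm(v, eps=1e-5):
    """Normalize a vector to zero mean and (roughly) unit variance, no learned scale."""
    mean = sum(v) / len(v)
    var = sum((x - mean) ** 2 for x in v) / len(v)
    return [(x - mean) / math.sqrt(var + eps) for x in v]


def block(h, weights):
    """One shared block: linear map followed by tanh (hypothetical stand-in)."""
    return [math.tanh(sum(w * x for w, x in zip(row, h))) for row in weights]


def run_looped(x, weights, n_loops, recall=False, outer_norm=False):
    """Apply the same block n_loops times, starting from the input x.

    recall=True     : add the original input x to the state before every loop
                      (assumed meaning of "recall").
    outer_norm=True : normalize the state after every loop
                      (assumed meaning of "outer normalization").
    """
    h = list(x)
    for _ in range(n_loops):
        inp = [a + b for a, b in zip(h, x)] if recall else h
        h = block(inp, weights)
        if outer_norm:
            h = layer_norm(h)
    return h


if __name__ == "__main__":
    # Contractive toy weights: without recall the state shrinks toward zero
    # on every loop, so the final state no longer depends on the input.
    W = [[0.5, 0.0], [0.0, 0.5]]
    for x in ([1.0, -1.0], [0.2, 0.3]):
        print("input", x)
        print("  no recall:", run_looped(x, W, 200))
        print("  recall   :", run_looped(x, W, 200, recall=True))
```

In this toy, running without recall sends every input to the same fixed point, while recall keeps the fixed point dependent on the input. It says nothing about fragility or about the parameter regions discussed in the preprint.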

Asher ✈️ ICLR2026@Ashkl111·Dec 8

@heyanuja @papertrailshq So, so glad somebody else continued the work when I didn't :) I hope this gets huge!

Asher ✈️ ICLR2026@Ashkl111·Dec 8

@heyanuja @papertrailshq So funnily enough, I saw that post a few months ago and started making my own version -- never got far since I had other projects, but I looked into research databases and there are really cool existing open-source ones that would just require API calls/downloading, no scraping!

Anuja U@heyanuja·Dec 7

I made a Goodreads for academic papers! (..and blog posts, substacks, lesswrong, etc) Paper Trails [papertrailshq.com] is something I built because I wanted a place where engaging with research felt fun, beautiful, and personal to you I hope you give it a try & love it!

sanje horah@sanjehorah

i am BEGGING
