Ido Amos

13 posts

@AmosaurusRex

MSc student at Tel-Aviv University working on ML/DL

Joined January 2022
263 Following · 97 Followers

Ido Amos @AmosaurusRex
@JentseHuang Thanks @JentseHuang! Sounds very interesting. We mostly used Thinking States to represent reasoning in our work, but treating them as an internal memory indeed sounds very natural. I'll have a look at your experiments.

J Huang @JentseHuang
@AmosaurusRex Very cool work Ido! I am doing similar things: arxiv.org/abs/2505.10571 In this project we design three simple & effective experiments to show that current LLMs lack such internal memory & thinking. I think it’s worth trying Thinking States on our experiments.

Ido Amos @AmosaurusRex
Can LLMs reason internally while processing their inputs, similar to how humans think ahead as they process information? Our latest work introduces Thinking States, a novel architectural adaptation that transforms reasoning into an internal recurrent process. By training models to maintain a dynamic thinking state, we achieve significant inference speedups over Chain-of-Thought while substantially outperforming existing latent reasoning methods. Paper: arxiv.org/abs/2602.08332
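
Below is a minimal sketch of the idea as described above, assuming a PyTorch setup: the model keeps a small set of latent "thinking state" vectors and recurrently updates them as it reads each chunk of the input, so reasoning happens in the hidden state rather than in generated tokens. The names and the GRU-based update (ThinkingStateLM, n_state, chunk_size) are illustrative assumptions, not the paper's architecture.

```python
# Hedged sketch, not the paper's implementation: a model that maintains a
# recurrent "thinking state" while processing its input chunk by chunk.
import torch
import torch.nn as nn

class ThinkingStateLM(nn.Module):
    def __init__(self, vocab_size: int, d_model: int = 256, n_state: int = 8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Learned initial thinking state: n_state latent "thought" vectors.
        self.init_state = nn.Parameter(torch.zeros(n_state, d_model))
        self.block = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        # Recurrent update of each thought vector from a summary of the chunk.
        self.state_update = nn.GRUCell(d_model, d_model)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, input_ids: torch.Tensor, chunk_size: int = 16):
        batch = input_ids.size(0)
        state = self.init_state.unsqueeze(0).expand(batch, -1, -1)
        logits = []
        for chunk in input_ids.split(chunk_size, dim=1):
            x = self.embed(chunk)
            # Let the chunk attend to the current thinking state.
            h = self.block(torch.cat([state, x], dim=1))
            state_h, tok_h = h[:, :state.size(1)], h[:, state.size(1):]
            # Internal reasoning step: update every thought vector recurrently.
            summary = tok_h.mean(dim=1, keepdim=True).expand_as(state_h)
            updated = self.state_update(
                summary.reshape(-1, summary.size(-1)),
                state_h.reshape(-1, state_h.size(-1)),
            )
            state = updated.view(batch, -1, updated.size(-1))
            logits.append(self.lm_head(tok_h))
        return torch.cat(logits, dim=1), state
```

The only point of the sketch is the contrast with Chain-of-Thought: the "thinking" lives in a state that evolves while the input is read, instead of in explicitly generated reasoning tokens.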

Ido Amos @AmosaurusRex
Thinking States outperforms existing latent reasoning methods on multiple benchmarks and matches Chain-of-Thought performance on multi-hop QA, while leading to faster inference times. Furthermore, Thinking States exhibits superior length generalization in state-tracking tasks, successfully extrapolating to sequences significantly longer than those seen during training. This work was done during an internship at Google Research with an incredible team of collaborators: @clu_avi @megamor2 @amirgloberson @jonherzig @LiorShani286867 @ISzpektor. Read the full paper and explore our findings here: arxiv.org/abs/2602.08332

Ido Amos @AmosaurusRex
A major challenge in latent reasoning is finding effective supervision for the reasoning process. Since thinking states are represented in natural language, we can leverage existing Chain-of-Thought data for supervision. Furthermore, as this supervision is available in advance, we use it to teacher-force the thinking states themselves. This circumvents the need for costly recurrent optimization via backpropagation through time (BPTT), enabling fully parallel training and maintaining nearly constant training costs regardless of reasoning depth.
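
A hedged sketch of what teacher-forcing the thinking states could look like, assuming each Chain-of-Thought step has already been encoded offline into a target state vector (gold_states below); the helper name and the MSE objective are illustrative assumptions, not the paper's recipe. Because every step receives the gold previous state as its input, no gradient crosses step boundaries, so all steps train in one parallel pass and BPTT is never needed.

```python
# Hedged sketch: teacher-forcing thinking states from precomputed CoT targets.
# gold_states is assumed to come from existing Chain-of-Thought text, with each
# reasoning step encoded into a vector ahead of training.
import torch
import torch.nn.functional as F

def teacher_forced_state_loss(step_fn, gold_states: torch.Tensor) -> torch.Tensor:
    """gold_states: (num_steps + 1, batch, d); step_fn maps state t -> state t+1."""
    prev = gold_states[:-1]     # gold input state for every reasoning step
    target = gold_states[1:]    # gold target state for every reasoning step
    # One batched call covers all steps: fully parallel, no backprop through
    # time, so training cost stays nearly constant in the reasoning depth.
    pred = step_fn(prev.reshape(-1, prev.size(-1)))
    return F.mse_loss(pred, target.reshape(-1, target.size(-1)))

# Toy usage (shapes only): 4 reasoning steps, batch of 8, state dimension 64.
step_fn = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.GELU(), torch.nn.Linear(64, 64))
loss = teacher_forced_state_loss(step_fn, torch.randn(5, 8, 64))
loss.backward()
```

At inference the model would feed its own predicted state back in rather than a gold one, which is the usual trade-off that comes with teacher forcing.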

Ido Amos @AmosaurusRex
@lovodkin93 Good luck on your exciting new journey!!

Aviv Slobodkin @NeurIPS @lovodkin93
I’m excited to share that I’ve started a full-time position as a Research Scientist at Google! 🚀 I’ve also moved to the Bay Area 🌉, so if you are around please text me and we can meet for coffee! To new beginnings!

Ido Amos @AmosaurusRex
Honestly cannot believe that our work got the BEST PAPER award @iclr_conf!!! This was an amazing experience with my collaborators @JonathanBerant @ankgup2, looking forward to sharing it with everyone at the conference. Reach out if you want to chat!
Ido Amos @AmosaurusRex

Excited to share my work with @JonathanBerant @ankgup2! We show pretraining on task data alone suffices to bridge the gap between state space models and transformers on Long Range Arena, leading to a significantly better estimate of model capabilities. arxiv.org/abs/2310.02980 🧵

Ido Amos @AmosaurusRex
@ibomohsin A really interesting point of view on LLMs and language in general! Can you expand on what you think fractal dimension means for language?

Ido Amos @AmosaurusRex
[4/4] Investigating the effects of data scale, we find self-pretraining is most effective in low-data regimes, underscoring its importance for evaluation across all dataset sizes. We further show that self-pretraining is effective across model sizes and when compute is limited.

Ido Amos @AmosaurusRex
[3/4] The marked effect of self-pretraining on long-sequence tasks leads us to rethink the necessity of complex designs, with Diagonal Linear RNNs (DLR) as a specific example. Our findings indicate that, when pretrained, simple architectures can be as effective as complex designs.

Ido Amos @AmosaurusRex
Excited to share my work with @JonathanBerant @ankgup2! We show pretraining on task data alone suffices to bridge the gap between state space models and transformers on Long Range Arena, leading to a significantly better estimate of model capabilities. arxiv.org/abs/2310.02980 🧵
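
For readers who want the gist as code, here is a minimal sketch of self-pretraining under stated assumptions: the same model is first trained with a generic next-token objective on the task's own inputs (labels untouched), then fine-tuned on the labelled task. The function name, the heads, and the shapes are illustrative, not the paper's exact setup; model(ids) is assumed to return hidden states of shape (batch, seq, d).

```python
# Hedged sketch of self-pretraining: pretrain on the task's own inputs with a
# next-token objective, then fine-tune the same weights on the task labels.
import torch
import torch.nn as nn

def self_pretrain_then_finetune(model, lm_head, cls_head, task_inputs, task_labels,
                                pretrain_steps=1000, finetune_steps=1000, lr=3e-4):
    params = list(model.parameters()) + list(lm_head.parameters()) + list(cls_head.parameters())
    opt = torch.optim.AdamW(params, lr=lr)
    ce = nn.CrossEntropyLoss()

    # Phase 1: self-pretraining on task inputs only; labels are never used here.
    for step in range(pretrain_steps):
        ids = task_inputs[step % len(task_inputs)]               # (batch, seq_len) token ids
        hidden = model(ids[:, :-1])
        loss = ce(lm_head(hidden).transpose(1, 2), ids[:, 1:])   # next-token prediction
        opt.zero_grad(); loss.backward(); opt.step()

    # Phase 2: standard supervised fine-tuning on the labelled task.
    for step in range(finetune_steps):
        ids = task_inputs[step % len(task_inputs)]
        labels = task_labels[step % len(task_labels)]            # (batch,) class labels
        hidden = model(ids)
        loss = ce(cls_head(hidden.mean(dim=1)), labels)          # sequence classification
        opt.zero_grad(); loss.backward(); opt.step()
```

The claim in the thread is that this extra phase alone, using no data beyond the task inputs themselves, closes much of the reported gap between state space models and transformers on Long Range Arena.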