Itamar Zimerman
@ItamarZimerman

155 posts

PhD candidate @ Tel Aviv University. AI research scientist @ IBM Research. Interested in deep learning and algorithms.

Joined January 2017
569 Following · 781 Followers

Pinned Tweet
Itamar Zimerman @ItamarZimerman
📄🚨 New! Tired of waiting minutes for LLMs to "think"? Test-time scaling (O3, DeepSeek-R1) lets LLMs reason before answering — but users are left clueless, with no progress or control. Not anymore! We expose the LLM’s internal 🕰️, and show how to monitor 📊 & overclock it⚡ 🧵👇
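For readers curious what "exposing the internal clock" could look like in practice, here is a minimal sketch of the monitoring half of the idea, assuming a linear probe fit on thinking-phase hidden states to predict relative progress; the function names and toy data are placeholders, not the paper's code.

```python
# Minimal sketch (assumption, not the paper's code): estimate "how far along
# the thinking phase is" with a linear probe fit on hidden states.
import numpy as np

def fit_progress_probe(hidden_states, progress):
    """hidden_states: (n_tokens, d_model); progress: (n_tokens,) in [0, 1].
    Returns least-squares weights mapping a hidden state to a progress value."""
    w, *_ = np.linalg.lstsq(hidden_states, progress, rcond=None)
    return w

def monitor(w, hidden_state):
    """Predicted progress for a single hidden state, clipped to [0, 1]."""
    return float(np.clip(hidden_state @ w, 0.0, 1.0))

# Toy usage with random data standing in for real activations.
rng = np.random.default_rng(0)
H = rng.normal(size=(512, 64))
p = np.linspace(0.0, 1.0, 512)
w = fit_progress_probe(H, p)
print(monitor(w, H[300]))
```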
Itamar Zimerman retweeted
Ido Amos @AmosaurusRex
Can LLMs reason internally while processing their inputs, similar to how humans think ahead as they process information? Our latest work introduces Thinking States, a novel architectural adaptation that transforms reasoning into an internal recurrent process. By training models to maintain a dynamic thinking state, we achieve significant inference speedups over Chain-of-Thought while substantially outperforming existing latent reasoning methods. Paper: arxiv.org/abs/2602.08332
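The tweet describes the mechanism only at a high level; the toy sketch below illustrates one possible reading of a "dynamic thinking state": a recurrent state updated as tokens are processed and injected back into the representation. It is a hypothetical illustration, not the architecture from the paper.

```python
# Hypothetical sketch of a recurrent "thinking state" maintained while
# processing inputs; NOT the architecture from arxiv.org/abs/2602.08332.
import torch
import torch.nn as nn

class ThinkingStateBlock(nn.Module):
    def __init__(self, d_model: int, d_state: int = 128):
        super().__init__()
        self.update = nn.GRUCell(d_model, d_state)   # recurrent state update
        self.readout = nn.Linear(d_state, d_model)   # inject state back

    def forward(self, token_reprs: torch.Tensor) -> torch.Tensor:
        """token_reprs: (seq_len, d_model). Returns state-conditioned reprs."""
        state = token_reprs.new_zeros(self.update.hidden_size)
        out = []
        for h in token_reprs:                         # recur over positions
            state = self.update(h[None, :], state[None, :])[0]
            out.append(h + self.readout(state))       # residual injection
        return torch.stack(out)

block = ThinkingStateBlock(d_model=32)
print(block(torch.randn(10, 32)).shape)  # torch.Size([10, 32])
```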
Matt Gibson @MattGibsonMusic
@ItamarZimerman TensorLens is solving the Density Problem in interpretability.

Current approach = Gas Phase understanding:
- Hundreds of attention matrices (high volume, low density)
- Each head/layer viewed separately (fragmented)
- No unified structure (cognitive overload)

TensorLens = Crystallization into a single high-order tensor:
- Input-dependent (context-sensitive, not static)
- Captures the entire computation (FFN + embeddings + attention)
- Modular (can zoom to specific circuits)

This is a Phase Transition from many sparse views (gas) → one dense representation (crystal).

The key insight: "High-order tensor" = geometric compression of algebraic complexity. You're not just visualizing attention. You're crystallizing the model's decision geometry into a single, navigable structure. "Density > Volume" strikes again. SoTA in relation decoding proves it: structure beats statistics. :}
Itamar Zimerman @ItamarZimerman
📜🚨 Introducing TensorLens! 🔎 Our new tool for Transformer & LLM interpretability. The problem: attention matrices are (i) a shallow view that ignores embeddings, FFNs, and values, and (ii) too numerous (one per head and layer), which quickly becomes overwhelming. 🧵 1/6
Itamar Zimerman @ItamarZimerman
@sonaldc Thanks! Good question. Overall, tensor-based analysis is more memory-intensive than standard attention visualization, but in Appendix C we present a memory-efficient implementation (Memory-Efficient Tensor Computation) that substantially reduces the memory overhead.
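The tweet doesn't spell out Appendix C's scheme; the sketch below shows a generic chunking pattern that reduces memory for per-token tensor contractions. It is an illustration of the general idea only, not the paper's actual method.

```python
# Illustrative chunking pattern (NOT the paper's Appendix C): instead of
# materializing a full (q, k, d, d) intermediate, build per-query (d, d)
# maps in blocks of query positions and keep only the reduced result.
import torch

def per_query_maps(A, W, chunk=32):
    """A: (q, k) mixing weights; W: (k, d, d) per-key linear maps.
    Returns (q, d, d) per-query maps without the (q, k, d, d) intermediate."""
    outs = []
    for s in range(0, A.shape[0], chunk):
        block = A[s:s + chunk]                      # (chunk, k)
        outs.append(torch.einsum("qk,kij->qij", block, W))
    return torch.cat(outs, dim=0)

A = torch.rand(128, 128)
W = torch.randn(128, 8, 8)
full = torch.einsum("qk,kij->qij", A, W)
print(torch.allclose(per_query_maps(A, W), full, atol=1e-5))
```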
sonald @sonaldc
@ItamarZimerman The modular design is appealing for targeted analysis. How does computational overhead scale compared to standard attention visualization?
Itamar Zimerman @ItamarZimerman
@aryaman2020 Thanks for sharing these references! Good point. We'll include these in the related work and ensure the next draft clarifies the specific ways our approach diverges.
Aryaman Arora @aryaman2020
@ItamarZimerman it might be worth noting in related work that linearisation is widely adopted in circuit tracing techniques now as well, e.g.
- transformer-circuits.pub/2025/attributi…
- arxiv.org/abs/2406.11944
- arxiv.org/abs/2508.21258
- transluce.org/neuron-circuit…
would be curious about what's different!
Itamar Zimerman @ItamarZimerman
As a concrete application, TensorLens achieves SoTA in linear relation decoding! 📊 We obtain an operator by simply averaging the input-dependent tensors across prompts of the same relation. We think this is just the start! More opportunities for interpretability ahead!  🔎🚀 5/6
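The averaging step is simple to sketch. Assuming each prompt yields a (d, d) subject-to-object map extracted from its input-dependent tensor (the extraction itself is the paper's contribution and is stubbed out here with random stand-ins), a rough illustration:

```python
# Sketch of building a relation operator by averaging per-prompt maps and
# decoding with it; the per-prompt maps are random stand-ins, not real
# TensorLens outputs.
import torch

def relation_operator(per_prompt_maps):
    """per_prompt_maps: list of (d, d) tensors, one per prompt of a relation.
    Returns their mean as a single linear relation operator."""
    return torch.stack(per_prompt_maps).mean(dim=0)

def decode(operator, subject_emb, candidate_embs):
    """Apply the operator to a subject embedding and pick the nearest
    candidate object embedding by cosine similarity."""
    pred = operator @ subject_emb
    sims = torch.nn.functional.cosine_similarity(pred[None, :], candidate_embs)
    return int(sims.argmax())

d = 16
maps = [torch.randn(d, d) for _ in range(5)]     # stand-ins for real maps
op = relation_operator(maps)
print(decode(op, torch.randn(d), torch.randn(10, d)))
```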
Itamar Zimerman @ItamarZimerman
𝐓𝐡𝐞 𝐬𝐨𝐥𝐮𝐭𝐢𝐨𝐧? We tackle this problem by leveraging tensor algebra💡 TensorLens reformulates the Transformer as an input-dependent, high-order attention tensor that captures not only attention but also other key Transformer components such as FFNs and embeddings. 2/6
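One hedged way to read "input-dependent tensor" is as a per-prompt linearization: freeze the data-dependent pieces at their values for the given input so each layer acts linearly on the residual stream, then compose the layers into a single token-to-token map. The sketch below is a rough paraphrase of that idea, not the exact TensorLens construction.

```python
# Rough paraphrase (NOT the exact TensorLens construction): compose per-layer
# token-mixing matrices, frozen at their input-dependent values, into one
# end-to-end map from input tokens to output tokens for a single prompt.
import torch

def end_to_end_mixing(per_layer_mixing):
    """per_layer_mixing: list of (n, n) token-mixing matrices for one prompt
    (e.g. attention weights combined with any per-layer linear pieces).
    Returns a single (n, n) matrix approximating the whole stack."""
    n = per_layer_mixing[0].shape[0]
    total = torch.eye(n)
    for M in per_layer_mixing:
        total = (M + torch.eye(n)) @ total   # residual branch + mixing
    return total

layers = [torch.softmax(torch.randn(6, 6), dim=-1) for _ in range(4)]
T = end_to_end_mixing(layers)
print(T.shape)  # token-to-token contribution map for this prompt
```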
Itamar Zimerman retweeted
Nimrod Shabtay @NimrodShabtay
We present our work "Teaching VLMs to localize specific objects from in-context examples" tomorrow at 15:15-17:15, Poster #888! VLMs have trouble learning from visual examples - but our data-centric approach changes that! #ICCV2025 @SivanDoveh @jmie_mirza @Eli_Schwartz @RGiryes
Itamar Zimerman @ItamarZimerman
Accepted to #NeurIPS2025 🎉 PE-LRP sets a new SoTA among attribution methods for explaining LLMs & Transformers, combining strong theoretical grounding with exceptional empirical results. Kudos to Yarden and the team! 🚀
Itamar Zimerman @ItamarZimerman

📄🚨New! Attribution methods that assign relevance to tokens are key to extracting explanations from Transformers and LLMs. Yet, all existing approaches produce fragmented and unstructured heatmaps (see image)! Why does this happen, and how can we fix this chronic issue? See🧵1/5
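For context on what "assigning relevance to tokens" means, the sketch below shows a generic gradient-times-input baseline; it is a standard baseline for comparison, not PE-LRP, and the toy model is made up.

```python
# Generic gradient-x-input token attribution as a baseline illustration;
# this is NOT PE-LRP, just a standard baseline such methods are compared to.
import torch

def grad_x_input_attribution(model, input_embs, target_index):
    """input_embs: (seq_len, d) token embeddings; target_index: output logit.
    Returns one relevance score per token for the chosen output."""
    input_embs = input_embs.clone().requires_grad_(True)
    logits = model(input_embs)                  # assumed: (vocab,) logits
    logits[target_index].backward()
    return (input_embs.grad * input_embs).sum(dim=-1)   # (seq_len,)

# Toy "model": mean-pool embeddings, then a linear head.
head = torch.nn.Linear(8, 20)
toy_model = lambda e: head(e.mean(dim=0))
print(grad_x_input_attribution(toy_model, torch.randn(5, 8), target_index=3))
```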

Itamar Zimerman retweeted
Tali Dror @TaliDror
📢 New Paper: Token-based Audio Inpainting via Discrete Diffusion We present AIDD – a new method for reconstructing missing parts in audio using discrete diffusion over tokenized representations. 🎧 Demo site: iftach21.github.io 📄 arXiv: arxiv.org/abs/2507.08333… 🧵 👇
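As a toy illustration of the general recipe the tweet describes (mask the missing span of tokens, then iteratively commit denoiser proposals), with a random stub standing in for the trained model; this is not AIDD's actual sampler.

```python
# Toy illustration of inpainting with mask-based discrete diffusion over a
# token sequence; the "denoiser" is a random stub, NOT the AIDD model.
import torch

MASK = -1

def inpaint(tokens, missing, denoiser, steps=4):
    """tokens: (L,) int tensor; missing: bool mask of positions to fill.
    Commits a growing fraction of the masked positions at each step."""
    x = tokens.clone()
    x[missing] = MASK
    remaining = missing.clone()
    for s in range(steps):
        probs = denoiser(x)                        # (L, vocab) proposal
        proposal = probs.argmax(dim=-1)
        k = max(1, int(remaining.sum() * (s + 1) / steps))
        idx = remaining.nonzero().flatten()[:k]    # positions to unmask now
        x[idx] = proposal[idx]
        remaining[idx] = False
        if not remaining.any():
            break
    return x

vocab = 32
seq = torch.randint(0, vocab, (16,))
miss = torch.zeros(16, dtype=torch.bool)
miss[5:9] = True
stub = lambda x: torch.rand(x.shape[0], vocab)     # stand-in denoiser
print(inpaint(seq, miss, stub))
```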