Itamar Zimerman
@ItamarZimerman

155 posts

PhD candidate @ Tel Aviv University. AI research scientist @ IBM Research. Interested in deep learning and algorithms.

Joined January 2017
569 Following · 781 Followers

Pinned Tweet
Itamar Zimerman @ItamarZimerman
📄🚨 New! Tired of waiting minutes for LLMs to "think"? Test-time scaling (O3, DeepSeek-R1) lets LLMs reason before answering — but users are left clueless, with no progress or control. Not anymore! We expose the LLM’s internal 🕰️, and show how to monitor 📊 & overclock it⚡ 🧵👇
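For readers curious what "exposing the internal clock" could look like in practice, here is a minimal sketch of the monitoring half of the idea, assuming a linear probe fit on thinking-phase hidden states to predict relative progress; the function names and toy data are placeholders, not the paper's code.

```python
# Minimal sketch (assumption, not the paper's code): estimate "how far along
# the thinking phase is" with a linear probe fit on hidden states.
import numpy as np

def fit_progress_probe(hidden_states, progress):
    """hidden_states: (n_tokens, d_model); progress: (n_tokens,) in [0, 1].
    Returns least-squares weights mapping a hidden state to a progress value."""
    w, *_ = np.linalg.lstsq(hidden_states, progress, rcond=None)
    return w

def monitor(w, hidden_state):
    """Predicted progress for a single hidden state, clipped to [0, 1]."""
    return float(np.clip(hidden_state @ w, 0.0, 1.0))

# Toy usage with random data standing in for real activations.
rng = np.random.default_rng(0)
H = rng.normal(size=(512, 64))
p = np.linspace(0.0, 1.0, 512)
w = fit_progress_probe(H, p)
print(monitor(w, H[300]))
```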
Itamar Zimerman retweeted
Ido Amos @AmosaurusRex
Can LLMs reason internally while processing their inputs, similar to how humans think ahead as they process information? Our latest work introduces Thinking States, a novel architectural adaptation that transforms reasoning into an internal recurrent process. By training models to maintain a dynamic thinking state, we achieve significant inference speedups over Chain-of-Thought while substantially outperforming existing latent reasoning methods. Paper: arxiv.org/abs/2602.08332
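The tweet describes the mechanism only at a high level; the toy sketch below illustrates one possible reading of a "dynamic thinking state": a recurrent state updated as tokens are processed and injected back into the representation. It is a hypothetical illustration, not the architecture from the paper.

```python
# Hypothetical sketch of a recurrent "thinking state" maintained while
# processing inputs; NOT the architecture from arxiv.org/abs/2602.08332.
import torch
import torch.nn as nn

class ThinkingStateBlock(nn.Module):
    def __init__(self, d_model: int, d_state: int = 128):
        super().__init__()
        self.update = nn.GRUCell(d_model, d_state)   # recurrent state update
        self.readout = nn.Linear(d_state, d_model)   # inject state back

    def forward(self, token_reprs: torch.Tensor) -> torch.Tensor:
        """token_reprs: (seq_len, d_model). Returns state-conditioned reprs."""
        state = token_reprs.new_zeros(self.update.hidden_size)
        out = []
        for h in token_reprs:                         # recur over positions
            state = self.update(h[None, :], state[None, :])[0]
            out.append(h + self.readout(state))       # residual injection
        return torch.stack(out)

block = ThinkingStateBlock(d_model=32)
print(block(torch.randn(10, 32)).shape)  # torch.Size([10, 32])
```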
Matt Gibson @MattGibsonMusic
@ItamarZimerman TensorLens is solving the Density Problem in interpretability.

Current approach = Gas Phase understanding:
- Hundreds of attention matrices (high volume, low density)
- Each head/layer viewed separately (fragmented)
- No unified structure (cognitive overload)

TensorLens = Crystallization into a single high-order tensor:
- Input-dependent (context-sensitive, not static)
- Captures the entire computation (FFN + embeddings + attention)
- Modular (can zoom to specific circuits)

This is a Phase Transition from many sparse views (gas) → one dense representation (crystal).

The key insight: "High-order tensor" = geometric compression of algebraic complexity. You're not just visualizing attention. You're crystallizing the model's decision geometry into a single, navigable structure. "Density > Volume" strikes again. SoTA in relation decoding proves it: structure beats statistics. :}
Itamar Zimerman @ItamarZimerman
📜🚨 Introducing TensorLens! 🔎 Our new tool for Transformer & LLM interpretability. The problem: attention matrices are (i) a shallow view that ignores embeddings, FFNs, and values, and (ii) too numerous (one per head and layer), which quickly becomes overwhelming. 🧵 1/6
Itamar Zimerman @ItamarZimerman
@sonaldc Thanks! Good question. Overall, tensor-based analysis is more memory-intensive than standard attention visualization, but in Appendix C we present a memory-efficient implementation (Memory-Efficient Tensor Computation) that substantially reduces the memory overhead.
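The tweet doesn't spell out Appendix C's scheme; the sketch below shows a generic chunking pattern that reduces memory for per-token tensor contractions. It is an illustration of the general idea only, not the paper's actual method.

```python
# Illustrative chunking pattern (NOT the paper's Appendix C): instead of
# materializing a full (q, k, d, d) intermediate, build per-query (d, d)
# maps in blocks of query positions and keep only the reduced result.
import torch

def per_query_maps(A, W, chunk=32):
    """A: (q, k) mixing weights; W: (k, d, d) per-key linear maps.
    Returns (q, d, d) per-query maps without the (q, k, d, d) intermediate."""
    outs = []
    for s in range(0, A.shape[0], chunk):
        block = A[s:s + chunk]                      # (chunk, k)
        outs.append(torch.einsum("qk,kij->qij", block, W))
    return torch.cat(outs, dim=0)

A = torch.rand(128, 128)
W = torch.randn(128, 8, 8)
full = torch.einsum("qk,kij->qij", A, W)
print(torch.allclose(per_query_maps(A, W), full, atol=1e-5))
```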
sonald @sonaldc
@ItamarZimerman The modular design is appealing for targeted analysis. How does computational overhead scale compared to standard attention visualization?
Itamar Zimerman @ItamarZimerman
@aryaman2020 Thanks for sharing these references! Good point. We'll include these in the related work and ensure the next draft clarifies the specific ways our approach diverges.
Aryaman Arora @aryaman2020
@ItamarZimerman it might be worth noting in related work that linearisation is widely adopted in circuit tracing techniques now as well, e.g.
- transformer-circuits.pub/2025/attributi…
- arxiv.org/abs/2406.11944
- arxiv.org/abs/2508.21258
- transluce.org/neuron-circuit…
would be curious about what's different!
Itamar Zimerman @ItamarZimerman
As a concrete application, TensorLens achieves SoTA in linear relation decoding! 📊 We obtain an operator by simply averaging the input-dependent tensors across prompts of the same relation. We think this is just the start! More opportunities for interpretability ahead!  🔎🚀 5/6
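The averaging step is simple to sketch. Assuming each prompt yields a (d, d) subject-to-object map extracted from its input-dependent tensor (the extraction itself is the paper's contribution and is stubbed out here with random stand-ins), a rough illustration:

```python
# Sketch of building a relation operator by averaging per-prompt maps and
# decoding with it; the per-prompt maps are random stand-ins, not real
# TensorLens outputs.
import torch

def relation_operator(per_prompt_maps):
    """per_prompt_maps: list of (d, d) tensors, one per prompt of a relation.
    Returns their mean as a single linear relation operator."""
    return torch.stack(per_prompt_maps).mean(dim=0)

def decode(operator, subject_emb, candidate_embs):
    """Apply the operator to a subject embedding and pick the nearest
    candidate object embedding by cosine similarity."""
    pred = operator @ subject_emb
    sims = torch.nn.functional.cosine_similarity(pred[None, :], candidate_embs)
    return int(sims.argmax())

d = 16
maps = [torch.randn(d, d) for _ in range(5)]     # stand-ins for real maps
op = relation_operator(maps)
print(decode(op, torch.randn(d), torch.randn(10, d)))
```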
Itamar Zimerman @ItamarZimerman
𝐓𝐡𝐞 𝐬𝐨𝐥𝐮𝐭𝐢𝐨𝐧? We tackle this problem by leveraging tensor algebra💡 TensorLens reformulates the Transformer as an input-dependent, high-order attention tensor that captures not only attention but also other key Transformer components such as FFNs and embeddings. 2/6
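One hedged way to read "input-dependent tensor" is as a per-prompt linearization: freeze the data-dependent pieces at their values for the given input so each layer acts linearly on the residual stream, then compose the layers into a single token-to-token map. The sketch below is a rough paraphrase of that idea, not the exact TensorLens construction.

```python
# Rough paraphrase (NOT the exact TensorLens construction): compose per-layer
# token-mixing matrices, frozen at their input-dependent values, into one
# end-to-end map from input tokens to output tokens for a single prompt.
import torch

def end_to_end_mixing(per_layer_mixing):
    """per_layer_mixing: list of (n, n) token-mixing matrices for one prompt
    (e.g. attention weights combined with any per-layer linear pieces).
    Returns a single (n, n) matrix approximating the whole stack."""
    n = per_layer_mixing[0].shape[0]
    total = torch.eye(n)
    for M in per_layer_mixing:
        total = (M + torch.eye(n)) @ total   # residual branch + mixing
    return total

layers = [torch.softmax(torch.randn(6, 6), dim=-1) for _ in range(4)]
T = end_to_end_mixing(layers)
print(T.shape)  # token-to-token contribution map for this prompt
```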
Itamar Zimerman retweeted
Nimrod Shabtay @NimrodShabtay
We present our work "Teaching VLMs to localize specific objects from in-context examples" tomorrow at 15:15-17:15, Poster #888! VLMs have trouble learning from visual examples - but our data-centric approach changes that! #ICCV2025 @SivanDoveh @jmie_mirza @Eli_Schwartz @RGiryes
Itamar Zimerman @ItamarZimerman
Accepted to #NeurIPS2025 🎉 PE-LRP sets a new SoTA among attribution methods for explaining LLMs & Transformers, combining strong theoretical grounding with exceptional empirical results. Kudos to Yarden and the team! 🚀
Itamar Zimerman @ItamarZimerman

📄🚨New! Attribution methods that assign relevance to tokens are key to extracting explanations from Transformers and LLMs. Yet, all existing approaches produce fragmented and unstructured heatmaps (see image)! Why does this happen, and how can we fix this chronic issue? See🧵1/5
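For context on what "assigning relevance to tokens" means, the sketch below shows a generic gradient-times-input baseline; it is a standard baseline for comparison, not PE-LRP, and the toy model is made up.

```python
# Generic gradient-x-input token attribution as a baseline illustration;
# this is NOT PE-LRP, just a standard baseline such methods are compared to.
import torch

def grad_x_input_attribution(model, input_embs, target_index):
    """input_embs: (seq_len, d) token embeddings; target_index: output logit.
    Returns one relevance score per token for the chosen output."""
    input_embs = input_embs.clone().requires_grad_(True)
    logits = model(input_embs)                  # assumed: (vocab,) logits
    logits[target_index].backward()
    return (input_embs.grad * input_embs).sum(dim=-1)   # (seq_len,)

# Toy "model": mean-pool embeddings, then a linear head.
head = torch.nn.Linear(8, 20)
toy_model = lambda e: head(e.mean(dim=0))
print(grad_x_input_attribution(toy_model, torch.randn(5, 8), target_index=3))
```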

Itamar Zimerman retweeted
Tali Dror @TaliDror
📢 New Paper: Token-based Audio Inpainting via Discrete Diffusion We present AIDD – a new method for reconstructing missing parts in audio using discrete diffusion over tokenized representations. 🎧 Demo site: iftach21.github.io 📄 arXiv: arxiv.org/abs/2507.08333… 🧵 👇
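As a toy illustration of the general recipe the tweet describes (mask the missing span of tokens, then iteratively commit denoiser proposals), with a random stub standing in for the trained model; this is not AIDD's actual sampler.

```python
# Toy illustration of inpainting with mask-based discrete diffusion over a
# token sequence; the "denoiser" is a random stub, NOT the AIDD model.
import torch

MASK = -1

def inpaint(tokens, missing, denoiser, steps=4):
    """tokens: (L,) int tensor; missing: bool mask of positions to fill.
    Commits a growing fraction of the masked positions at each step."""
    x = tokens.clone()
    x[missing] = MASK
    remaining = missing.clone()
    for s in range(steps):
        probs = denoiser(x)                        # (L, vocab) proposal
        proposal = probs.argmax(dim=-1)
        k = max(1, int(remaining.sum() * (s + 1) / steps))
        idx = remaining.nonzero().flatten()[:k]    # positions to unmask now
        x[idx] = proposal[idx]
        remaining[idx] = False
        if not remaining.any():
            break
    return x

vocab = 32
seq = torch.randint(0, vocab, (16,))
miss = torch.zeros(16, dtype=torch.bool)
miss[5:9] = True
stub = lambda x: torch.rand(x.shape[0], vocab)     # stand-in denoiser
print(inpaint(seq, miss, stub))
```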