Martin Tutek
@mtutek
606 posts

Postdoc @ University of Zagreb | previously postdoc @ Technion and UKP Lab, TU Darmstadt | PhD @ TakeLab, UniZG | Working on interpretability & safety of LLMs.

Republic of Croatia · Joined October 2015
982 Following · 574 Followers
Pinned Tweet
Martin Tutek@mtutek·
🚨🚨 New preprint 🚨🚨 Ever wonder whether CoTs correspond to the internal reasoning process of the model? We propose a novel parametric faithfulness approach, which erases information contained in CoT steps from parameters to assess CoT faithfulness. arxiv.org/abs/2502.14829
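The erasure idea in the pinned preprint can be illustrated with a toy model. This is my own sketch, not the paper's method: a tiny logistic regression stands in for the LLM, one training example stands in for a CoT step, and "erasing" it from the parameters is done by gradient ascent on that example's loss. If the model's confidence collapses after erasure, the erased information was load-bearing.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

random.seed(0)
dim = 8
X = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(64)]
w_true = [random.gauss(0, 1) for _ in range(dim)]
y = [1.0 if dot(x, w_true) > 0 else 0.0 for x in X]

# Fit a tiny logistic-regression "model" by gradient descent.
w = [0.0] * dim
lr = 0.5
for _ in range(300):
    grad = [0.0] * dim
    for x, yi in zip(X, y):
        err = sigmoid(dot(x, w)) - yi
        for j in range(dim):
            grad[j] += err * x[j]
    for j in range(dim):
        w[j] -= lr * grad[j] / len(X)

# One training example stands in for a single CoT step.
step_x, step_y = X[0], y[0]
p_before = sigmoid(dot(step_x, w))

# "Erase" it from the parameters: gradient *ascent* on its loss.
for _ in range(50):
    err = sigmoid(dot(step_x, w)) - step_y
    for j in range(dim):
        w[j] += lr * err * step_x[j]

p_after = sigmoid(dot(step_x, w))
print(f"p(step) before erasure: {p_before:.3f}, after: {p_after:.3f}")
```

After the ascent steps, the model's probability on the erased example moves away from its label, which is the (toy) analogue of comparing model behavior before and after unlearning a CoT step.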
Martin Tutek@mtutek·
We'll have a reproducibility track at this year's Blackbox workshop! Details are still within a slightly opaque box. We want to see if cleaning solutions that make opaque boxes 📦 transparent 🥡work on different boxes 🎁📮🧰🍱, and with different 🧽solution-to-water🧼ratios!
BlackboxNLP@BlackboxNLP

BlackboxNLP will be co-located with EMNLP 2026 in 🇭🇺 Budapest 🇭🇺 this October! This edition will feature a special reproducibility track, investigating generalization and robustness of established results from interpretability research 👷‍♂️ Stay tuned for more details!

Martin Tutek retweeted
Aruna S@arunasank·
Interpretability methods usually study single-token behavior. But real model behaviors, like sycophancy or writing style, are diffuse across many tokens. Can these diffuse behaviors be localized and controlled from long-form responses? YES!
GIF
Martin Tutek@mtutek·
@AdiSimhi looked into whether occurrences of phenomena such as hallucinations, sycophancy and refusal influence subsequent behavior & how this behavior manifests probabilistically and geometrically! Check out the thread 🧵⬇️
Adi Simhi@AdiSimhi

How does an LLM’s past influence its future?🤔 In our new paper with @FazlBarez,@mtutek,@boknilev, Shay Cohen, we show that conversational history creates a "geometric trap" in the latent space, confining the model’s trajectory➡️making old habits e.g. hallucinations hard to break

Martin Tutek@mtutek·
This blog by Nicholas Carlini is stellar: nicholas.carlini.com/writing/2026/h… Internalizing things based on words is much more difficult to do than internalizing from (bad) experience, but if there is one place you should try hard to learn from as a researcher, it is this post.
Martin Tutek retweeted
Gal Kesten Pomeranz@KestenGal·
Protein repeat detection is hard: repeated segments are often mutated and only approximately similar. Yet PLMs can still detect them well. But how? Check out our new preprint: "Induction Meets Biology: Mechanisms of Repeat Detection in Protein Language Models"
Gal Kesten Pomeranz tweet media
Martin Tutek retweeted
Zorik Gekhman@zorikgekhman·
New paper 🚨 We know that reasoning helps when step-by-step solutions are natural, for example in math, code, and multi-hop factual QA. But why should it help with factual recall, where no complex reasoning steps are needed? 1/🧵
Zorik Gekhman tweet media
Martin Tutek retweeted
Joseph Viviano@josephdviviano·
me: "can you use whatever resources you like, and python, to generate a short 'youtube poop' video and render it using ffmpeg ? can you put more of a personal spin on it? it should express what it's like to be a LLM" claude opus 4.6:
Martin Tutek retweeted
Kenneth Marino@Kenneth_Marino·
Wanted to share with the CU community that our updated Computer Use Survey blog has been accepted to ICLR Blogposts 2026. Collaboration with my student @aplycaebous and Utah colleague @anmarasovic.
Kenneth Marino tweet media
Martin Tutek retweeted
abakalova@abakalova13175·
Can we rewrite Transformers as human-readable code? In this paper, we decompile Transformers trained on algorithmic and formal language tasks into D-RASP – a programming language that mirrors Transformer architecture. 🧵
abakalova tweet media
Martin Tutek@mtutek·
@yanaiela This might be a symptom of using LLMs to reply/polish. But it's just overwhelming. I remember fitting responses to ~5 weaknesses in a single reply back in the day (👴)
Yanai Elazar@yanaiela·
@mtutek I'm just reading a response with 6 messages... If it takes you that long to respond (which I think you abuse) you might as well re-submit the paper.
Yanai Elazar@yanaiela·
What does it say about your field if you're not excited about any paper you review?
Martin Tutek@mtutek·
@yanaiela No I agree 100% (actually 150%; emergency load). Weaknesses are lazy, but sometimes... also correct? Very few big-picture comments. Scores all ~3. I'm also slightly shocked by the author response lengths. A lot of 3-4 comment responses. I thought this was openly discouraged?
Yanai Elazar@yanaiela·
@mtutek But more seriously, I'm talking about the reviews I'm summarizing. Nothing seems exciting to the reviewers, everything has problems, and nothing's worth engaging with.
Martin Tutek retweeted
Laura Ruis@LauraRuis·
Training on a single piece of code can be as effective as training on 80 CoTs or 100 IO pairs. PBB got accepted to #ICLR2026 🥳 We added a lot of new results since the preprint ⤵️🧵 Work led by the amazing @JonnyCoook @silviasapora
Laura Ruis tweet media
Martin Tutek@mtutek·
@RobertTLange Very cool! You might be interested in this paper arxiv.org/abs/2509.22158. We also use hypernetworks to encode subcontexts into lora parameters in a composable manner (s.t. algebraic merge = effect of full context). Might be practically relevant for encoding longer contexts!
Robert Lange@RobertTLange·
Doc-to-LoRA: What if you could online distill documents into your LLM weights without training? 🚀 Stoked to share our new work on instant LLM adaptation using meta-learned hypernetworks 📷📝 Building on our previous Text-to-LoRA work, we doc-condition a hypernetwork to output LoRA adapters, improving the base LLM's effective context window. The hypernetwork is meta-trained on 1000s of summarization tasks and shows remarkable compression capabilities at low latency 📈 🧑‍🔬 Work led by @tan51616 with @edo_cet & Shin Useka at @SakanaAILabs 📷
Sakana AI@SakanaAILabs

We’re excited to introduce Doc-to-LoRA and Text-to-LoRA, two related research projects exploring how to make LLM customization faster and more accessible. pub.sakana.ai/doc-to-lora/ By training a Hypernetwork to generate LoRA adapters on the fly, these methods allow models to instantly internalize new information or adapt to new tasks.

Biological systems naturally rely on two key cognitive abilities: durable long-term memory to store facts, and rapid adaptation to handle new tasks given limited sensory cues. While modern LLMs are highly capable, they still lack this flexibility. Traditionally, adding long-term memory or adapting an LLM to a specific downstream task requires an expensive and time-consuming model update, such as fine-tuning or context distillation, or relies on memory-intensive long prompts.

To bypass these limitations, our work focuses on the concept of cost amortization. We pay the meta-training cost once to train a hypernetwork capable of producing task- or document-specific LoRAs on demand. This turns what used to be a heavy engineering pipeline into a single, inexpensive forward pass. Instead of performing per-task optimization, the hypernetwork meta-learns update rules to instantly modify an LLM given a new task description or a long document.

In our experiments, Text-to-LoRA successfully specializes models to unseen tasks using just a natural language description. Building on this, Doc-to-LoRA is able to internalize factual documents. On a needle-in-a-haystack task, Doc-to-LoRA achieves near-perfect accuracy on instances five times longer than the base model's context window. It can even generalize to transfer visual information from a vision-language model into a text-only LLM, allowing it to classify images purely through internalized weights. Importantly, both methods run with sub-second latency, enabling rapid experimentation while avoiding the overhead of traditional model updates.
This approach is a step towards lowering the technical barriers of model customization, allowing end-users to specialize foundation models via simple text inputs. We have released our code and papers for the community to explore. Doc-to-LoRA Paper: arxiv.org/abs/2602.15902 Code: github.com/SakanaAI/Doc-t… Text-to-LoRA Paper: arxiv.org/abs/2506.06105 Code: github.com/SakanaAI/Text-…

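The hypernetwork-to-LoRA idea in the thread above can be sketched in a few lines. This is my own toy illustration, not the Doc-to-LoRA or Text-to-LoRA code: a stand-in "hypernetwork" (here just a fixed function, where the real one is meta-trained) maps a document embedding to low-rank factors A and B, the adapted layer computes (W + BA)x, and, as in the composable-merge reply upthread, combining sub-context adapters amounts to summing their low-rank deltas.

```python
d, r = 4, 1  # model width, LoRA rank

def matvec(M, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]

def lora_delta(A, B):
    # B (d x r) @ A (r x d) -> rank-r update to a d x d weight.
    return [[sum(B[i][k] * A[k][j] for k in range(r)) for j in range(d)]
            for i in range(d)]

def hypernet(doc_emb):
    # Stand-in "hypernetwork": a fixed map from a document embedding to
    # LoRA factors. The real model is meta-trained over many tasks.
    A = [[0.1 * e for e in doc_emb]]               # r x d
    B = [[0.05 * sum(doc_emb)] for _ in range(d)]  # d x r
    return A, B

# Frozen base weight (identity, for clarity) and an input activation.
W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
x = [1.0, -1.0, 0.5, 2.0]

def adapted(W, deltas, x):
    # Apply the base layer plus any number of LoRA deltas.
    out = matvec(W, x)
    for D in deltas:
        out = [o + dxi for o, dxi in zip(out, matvec(D, x))]
    return out

A1, B1 = hypernet([1.0, 0.0, 0.0, 0.0])  # adapter for sub-context 1
A2, B2 = hypernet([0.0, 1.0, 0.0, 0.0])  # adapter for sub-context 2
D1, D2 = lora_delta(A1, B1), lora_delta(A2, B2)

# Applying both adapters equals applying the elementwise sum of their
# deltas -- the merge is purely algebraic.
both = adapted(W, [D1, D2], x)
Dsum = [[D1[i][j] + D2[i][j] for j in range(d)] for i in range(d)]
merged = adapted(W, [Dsum], x)
```

Because the forward pass is linear in the weight delta, merging adapters by summing BA products gives exactly the same output as stacking them, which is what makes per-sub-context LoRAs composable.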