Martin Tutek
@mtutek
606 posts

Postdoc @ University of Zagreb | previously postdoc @ Technion and UKP Lab, TU Darmstadt | PhD @ TakeLab, UniZG | Working on interpretability & safety of LLMs.

Republic of Croatia · Joined October 2015
982 Following · 574 Followers
Pinned Tweet
Martin Tutek@mtutek·
🚨🚨 New preprint 🚨🚨 Ever wonder whether CoTs correspond to the internal reasoning process of the model? We propose a novel parametric faithfulness approach, which erases information contained in CoT steps from parameters to assess CoT faithfulness. arxiv.org/abs/2502.14829
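The erasure idea in the pinned preprint can be illustrated with a toy model. This is my own sketch, not the paper's method: a tiny logistic regression stands in for the LLM, one training example stands in for a CoT step, and "erasing" it from the parameters is done by gradient ascent on that example's loss. If the model's confidence collapses after erasure, the erased information was load-bearing.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

random.seed(0)
dim = 8
X = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(64)]
w_true = [random.gauss(0, 1) for _ in range(dim)]
y = [1.0 if dot(x, w_true) > 0 else 0.0 for x in X]

# Fit a tiny logistic-regression "model" by gradient descent.
w = [0.0] * dim
lr = 0.5
for _ in range(300):
    grad = [0.0] * dim
    for x, yi in zip(X, y):
        err = sigmoid(dot(x, w)) - yi
        for j in range(dim):
            grad[j] += err * x[j]
    for j in range(dim):
        w[j] -= lr * grad[j] / len(X)

# One training example stands in for a single CoT step.
step_x, step_y = X[0], y[0]
p_before = sigmoid(dot(step_x, w))

# "Erase" it from the parameters: gradient *ascent* on its loss.
for _ in range(50):
    err = sigmoid(dot(step_x, w)) - step_y
    for j in range(dim):
        w[j] += lr * err * step_x[j]

p_after = sigmoid(dot(step_x, w))
print(f"p(step) before erasure: {p_before:.3f}, after: {p_after:.3f}")
```

After the ascent steps, the model's probability on the erased example moves away from its label, which is the (toy) analogue of comparing model behavior before and after unlearning a CoT step.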
Martin Tutek@mtutek·
We'll have a reproducibility track at this year's Blackbox workshop! Details are still within a slightly opaque box. We want to see if cleaning solutions that make opaque boxes 📦 transparent 🥡work on different boxes 🎁📮🧰🍱, and with different 🧽solution-to-water🧼ratios!
BlackboxNLP@BlackboxNLP

BlackboxNLP will be co-located with EMNLP 2026 in 🇭🇺 Budapest 🇭🇺 this October! This edition will feature a special reproducibility track, investigating generalization and robustness of established results from interpretability research 👷‍♂️ Stay tuned for more details!

Martin Tutek retweeted
Aruna S@arunasank·
Interpretability methods usually study single-token behavior. But real model behaviors, like sycophancy or writing style, are diffuse across many tokens. Can these diffuse behaviors be localized and controlled from long-form responses? YES!
GIF
Martin Tutek@mtutek·
@AdiSimhi looked into whether occurrences of phenomena such as hallucinations, sycophancy and refusal influence subsequent behavior & how this behavior manifests probabilistically and geometrically! Check out the thread 🧵⬇️
Adi Simhi@AdiSimhi

How does an LLM’s past influence its future?🤔 In our new paper with @FazlBarez,@mtutek,@boknilev, Shay Cohen, we show that conversational history creates a "geometric trap" in the latent space, confining the model’s trajectory➡️making old habits e.g. hallucinations hard to break

Martin Tutek@mtutek·
This blog by Nicholas Carlini is stellar: nicholas.carlini.com/writing/2026/h… Internalizing things based on words is much more difficult to do than internalizing from (bad) experience, but if there is one place you should try hard to learn from as a researcher, it is this post.
Martin Tutek retweeted
Gal Kesten Pomeranz@KestenGal·
Protein repeat detection is hard: repeated segments are often mutated and only approximately similar. Yet PLMs can still detect them well. But how? Check out our new preprint: "Induction Meets Biology: Mechanisms of Repeat Detection in Protein Language Models"
Gal Kesten Pomeranz tweet media
Martin Tutek retweeted
Zorik Gekhman@zorikgekhman·
New paper 🚨 We know that reasoning helps when step-by-step solutions are natural, for example in math, code, and multi-hop factual QA. But why should it help with factual recall, where no complex reasoning steps are needed? 1/🧵
Zorik Gekhman tweet media
Martin Tutek retweeted
Joseph Viviano@josephdviviano·
me: "can you use whatever resources you like, and python, to generate a short 'youtube poop' video and render it using ffmpeg ? can you put more of a personal spin on it? it should express what it's like to be a LLM" claude opus 4.6:
Martin Tutek retweeted
Kenneth Marino@Kenneth_Marino·
Wanted to share with the CU community that our updated Computer Use Survey blog has been accepted to ICLR Blogposts 2026. Collaboration with my student @aplycaebous and Utah colleague @anmarasovic.
Kenneth Marino tweet media
Martin Tutek retweeted
abakalova@abakalova13175·
Can we rewrite Transformers as human-readable code? In this paper, we decompile Transformers trained on algorithmic and formal language tasks into D-RASP – a programming language that mirrors Transformer architecture. 🧵
abakalova tweet media
Martin Tutek@mtutek·
@yanaiela This might be a symptom of using LLMs to reply/polish. But it's just overwhelming. I remember fitting responses to ~5 weaknesses in a single reply back in the day (👴)
Yanai Elazar@yanaiela·
@mtutek I'm just reading a response with 6 messages... If it takes you that long to respond (which I think you abuse) you might as well re-submit the paper.
Yanai Elazar@yanaiela·
What does it say about your field if you're not excited about any paper you review?
Martin Tutek@mtutek·
@yanaiela No I agree 100% (actually 150%; emergency load). Weaknesses are lazy, but sometimes... also correct? Very few big-picture comments. Scores all ~3. I'm also slightly shocked by the author response lengths. A lot of 3-4 comment responses. I thought this was openly discouraged?
Yanai Elazar@yanaiela·
@mtutek But more seriously, I'm talking about the reviews I'm summarizing. Nothing seems exciting to the reviewers, everything has problems, and nothing's worth engaging with.
Martin Tutek retweeted
Laura Ruis@LauraRuis·
Training on a single piece of code can be as effective as training on 80 CoTs or 100 IO pairs. PBB got accepted to #ICLR2026 🥳 We added a lot of new results since the preprint ⤵️🧵 Work led by the amazing @JonnyCoook @silviasapora
Laura Ruis tweet media
Martin Tutek@mtutek·
@RobertTLange Very cool! You might be interested in this paper arxiv.org/abs/2509.22158. We also use hypernetworks to encode subcontexts into lora parameters in a composable manner (s.t. algebraic merge = effect of full context). Might be practically relevant for encoding longer contexts!
Robert Lange@RobertTLange·
Doc-to-LoRA: What if you could online distill documents into your LLM weights without training? 🚀 Stoked to share our new work on instant LLM adaptation using meta-learned hypernetworks 📷📝 Building on our previous Text-to-LoRA work, we doc-condition a hypernetwork to output LoRA adapters, improving the base LLM's effective context window. The hypernetwork is meta-trained on 1000s of summarization tasks and shows remarkable compression capabilities at low latency 📈 🧑‍🔬 Work led by @tan51616 with @edo_cet & Shin Useka at @SakanaAILabs 📷
Sakana AI@SakanaAILabs

We’re excited to introduce Doc-to-LoRA and Text-to-LoRA, two related research projects exploring how to make LLM customization faster and more accessible. pub.sakana.ai/doc-to-lora/ By training a Hypernetwork to generate LoRA adapters on the fly, these methods allow models to instantly internalize new information or adapt to new tasks.

Biological systems naturally rely on two key cognitive abilities: durable long-term memory to store facts, and rapid adaptation to handle new tasks given limited sensory cues. While modern LLMs are highly capable, they still lack this flexibility. Traditionally, adding long-term memory or adapting an LLM to a specific downstream task requires an expensive and time-consuming model update, such as fine-tuning or context distillation, or relies on memory-intensive long prompts.

To bypass these limitations, our work focuses on the concept of cost amortization. We pay the meta-training cost once to train a hypernetwork capable of producing task- or document-specific LoRAs on demand. This turns what used to be a heavy engineering pipeline into a single, inexpensive forward pass. Instead of performing per-task optimization, the hypernetwork meta-learns update rules to instantly modify an LLM given a new task description or a long document.

In our experiments, Text-to-LoRA successfully specializes models to unseen tasks using just a natural language description. Building on this, Doc-to-LoRA is able to internalize factual documents. On a needle-in-a-haystack task, Doc-to-LoRA achieves near-perfect accuracy on instances five times longer than the base model's context window. It can even generalize to transfer visual information from a vision-language model into a text-only LLM, allowing it to classify images purely through internalized weights. Importantly, both methods run with sub-second latency, enabling rapid experimentation while avoiding the overhead of traditional model updates.
This approach is a step towards lowering the technical barriers of model customization, allowing end-users to specialize foundation models via simple text inputs. We have released our code and papers for the community to explore. Doc-to-LoRA Paper: arxiv.org/abs/2602.15902 Code: github.com/SakanaAI/Doc-t… Text-to-LoRA Paper: arxiv.org/abs/2506.06105 Code: github.com/SakanaAI/Text-…

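The hypernetwork-to-LoRA idea in the thread above can be sketched in a few lines. This is my own toy illustration, not the Doc-to-LoRA or Text-to-LoRA code: a stand-in "hypernetwork" (here just a fixed function, where the real one is meta-trained) maps a document embedding to low-rank factors A and B, the adapted layer computes (W + BA)x, and, as in the composable-merge reply upthread, combining sub-context adapters amounts to summing their low-rank deltas.

```python
d, r = 4, 1  # model width, LoRA rank

def matvec(M, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]

def lora_delta(A, B):
    # B (d x r) @ A (r x d) -> rank-r update to a d x d weight.
    return [[sum(B[i][k] * A[k][j] for k in range(r)) for j in range(d)]
            for i in range(d)]

def hypernet(doc_emb):
    # Stand-in "hypernetwork": a fixed map from a document embedding to
    # LoRA factors. The real model is meta-trained over many tasks.
    A = [[0.1 * e for e in doc_emb]]               # r x d
    B = [[0.05 * sum(doc_emb)] for _ in range(d)]  # d x r
    return A, B

# Frozen base weight (identity, for clarity) and an input activation.
W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
x = [1.0, -1.0, 0.5, 2.0]

def adapted(W, deltas, x):
    # Apply the base layer plus any number of LoRA deltas.
    out = matvec(W, x)
    for D in deltas:
        out = [o + dxi for o, dxi in zip(out, matvec(D, x))]
    return out

A1, B1 = hypernet([1.0, 0.0, 0.0, 0.0])  # adapter for sub-context 1
A2, B2 = hypernet([0.0, 1.0, 0.0, 0.0])  # adapter for sub-context 2
D1, D2 = lora_delta(A1, B1), lora_delta(A2, B2)

# Applying both adapters equals applying the elementwise sum of their
# deltas -- the merge is purely algebraic.
both = adapted(W, [D1, D2], x)
Dsum = [[D1[i][j] + D2[i][j] for j in range(d)] for i in range(d)]
merged = adapted(W, [Dsum], x)
```

Because the forward pass is linear in the weight delta, merging adapters by summing BA products gives exactly the same output as stacking them, which is what makes per-sub-context LoRAs composable.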