‼️ New paper: Normalization Layers Are All That Sharpness-Aware Minimization Needs ‼️
arxiv.org/abs/2306.04226
We show that applying SAM only to the normalization layers of a network (SAM-ON) enhances performance compared to applying it to the full network.
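The post only names the idea, so here is a minimal, hypothetical sketch of one SAM-ON step: a standard SAM update in which the adversarial weight perturbation is restricted to normalization-layer parameters (all names and the tiny model are illustrative, not the paper's code).

```python
import torch
import torch.nn as nn

# Toy model with one normalization layer; rho is the SAM perturbation radius.
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(8, 8), nn.LayerNorm(8), nn.Linear(8, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
rho = 0.05

# SAM-ON: collect ONLY the normalization-layer parameters.
norm_params = [p for m in model.modules()
               if isinstance(m, (nn.LayerNorm, nn.BatchNorm1d))
               for p in m.parameters()]

x, y = torch.randn(4, 8), torch.randint(0, 2, (4,))
loss_fn = nn.CrossEntropyLoss()

# 1) First forward/backward: gradients at the current weights.
loss_fn(model(x), y).backward()

# 2) Ascent step eps = rho * g / ||g||, applied only to norm parameters.
grad_norm = torch.norm(torch.stack([p.grad.norm() for p in norm_params]))
eps = []
with torch.no_grad():
    for p in norm_params:
        e = rho * p.grad / (grad_norm + 1e-12)
        p.add_(e)
        eps.append(e)

# 3) Second forward/backward at the perturbed point.
opt.zero_grad()
loss_fn(model(x), y).backward()

# 4) Undo the perturbation, then descend with the sharpness-aware gradients.
with torch.no_grad():
    for p, e in zip(norm_params, eps):
        p.sub_(e)
opt.step()
```

Applying SAM to the full network would instead perturb every parameter in step 2; SAM-ON keeps the rest of the update identical.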
New paper: We introduce Visual Memory Injection, a new attack on large vision-language models. A subtly perturbed image that remains in the chat context causes the model to behave normally for many turns, and later triggers a targeted harmful response to a topic-specific prompt.
📣 We are expanding our AI Safety and Alignment group at @ELLISInst_Tue and @MPI_IS! We have:
- a great cluster at MPI with 50+ GB200s, 250+ H100s, and many A100 80GB GPUs,
- outstanding colleagues (@jonasgeiping, @sahar_abdelnabi, etc),
- competitive salaries (by academic standards),
- fully English-speaking environment.
In particular, I'm looking for:
- one postdoc with a proven track record in AI safety,
- PhD students with a strong computer science background and ideally experience in cybersecurity, interpretability, or training dynamics,
- master’s thesis students (if you are already in Tübingen or can relocate to Tübingen for ~6 months),
- remote mentees for the Summer 2026 MATS cohort (apply directly via the MATS portal).
I'll be at NeurIPS in San Diego and would be glad to chat about these positions!
Happy to share that I've started as an assistant professor at @AaltoUniversity and ELLIS Institute Finland!
I'll recruit students via the ELLIS PhD Program ellis.eu/research/phd-p… to work on multimodal learning, robustness, visual reasoning... feel free to reach out!
Very promising results on *robust* unlearning from colleagues at Tübingen and EPFL.
(+ some general improvements to the standard evaluation by using an LLM judge and worst-case evaluation over paraphrases and input formats)
🚨 Incredibly excited to share that I'm starting my research group focusing on AI safety and alignment at the ELLIS Institute Tübingen and Max Planck Institute for Intelligent Systems in September 2025! 🚨
Hiring. I'm looking for multiple PhD students: both those able to start in Fall 2025 (i.e., as soon as possible) and those applying through centralized programs like CLS, IMPRS, and ELLIS (deadlines in November) to start in Spring–Fall 2026. I'm also searching for postdocs, master's thesis students, and research interns. Fill in the Google form below if you're interested!
Research group. We will focus on developing algorithmic solutions to reduce harms from advanced general-purpose AI models. We're particularly interested in alignment of autonomous LLM agents, which are becoming increasingly capable and pose a variety of emerging risks. We're also interested in rigorous AI evaluations and informing the public about the risks and capabilities of frontier AI models. Additionally, we aim to advance our understanding of how AI models generalize, which is crucial for ensuring their steerability and reducing associated risks. For more information about research topics relevant to our group, please check the following documents:
- International AI Safety Report,
- An Approach to Technical AGI Safety and Security by DeepMind,
- Open Philanthropy’s 2025 RFP for Technical AI Safety Research.
Research style. We are not necessarily interested in getting X papers accepted at NeurIPS/ICML/ICLR. We are interested in making an impact: this can be papers (and NeurIPS/ICML/ICLR are great venues), but also open-source repositories, benchmarks, blog posts, even social media posts—literally anything that can be genuinely useful for other researchers and the general public.
Broader vision. Current machine learning methods are fundamentally different from what they used to be pre-2022. The Bitter Lesson summarized and predicted this shift very well back in 2019: "general methods that leverage computation are ultimately the most effective". Taking this into account, we are only interested in studying methods that are general and scale with intelligence and compute. Everything that helps to advance their safety and alignment with societal values is relevant to us. We believe getting this—some may call it "AGI"—right is one of the most important challenges of our time.
Join us on this journey!
Excited to announce FuseLIP: an embedding model that encodes image+text into a single vector. We achieve this by tokenizing images into discrete tokens, merging these with the text tokens and subsequently processing them with a single transformer.
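The core FuseLIP idea described above can be sketched in a few lines (all shapes, vocabulary sizes, and the random stand-in image tokenizer are my own illustrative assumptions, not the released model): discrete image tokens and text tokens share one vocabulary and one transformer, which outputs a single fused embedding.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab_text, vocab_img, d = 1000, 512, 64

# One shared embedding table and ONE transformer for both modalities.
embed = nn.Embedding(vocab_text + vocab_img, d)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True),
    num_layers=2,
)

text_tokens = torch.randint(0, vocab_text, (1, 16))               # e.g. a caption
image_tokens = torch.randint(0, vocab_img, (1, 32)) + vocab_text  # discrete image codes,
                                                                  # offset into the shared vocab

tokens = torch.cat([image_tokens, text_tokens], dim=1)  # one fused sequence
hidden = encoder(embed(tokens))
embedding = hidden.mean(dim=1)  # a single vector for the image+text pair
```

The contrast with dual-encoder models like CLIP is that there is no separate image tower and text tower to fuse afterwards: fusion happens inside the single sequence.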
With @bremen79, we propose a new algorithm for constructing confidence intervals for means of bounded random variables using the "testing by betting" framework. It performs remarkably well even in the challenging, very-small-sample regime (and, of course, it also does well in the large-sample one).
This connects several points from our previous work: we found that the Mahalanobis score yields SOTA results for some models on our NINCO benchmark, but fails for others. In a workshop paper, we found that the Mahalanobis score is very sensitive to training hyperparameters (especially the learning rate).
Mahalanobis++: Improving OOD Detection via Feature Normalization
Our latest work has been accepted to ICML and is now also on arXiv!
We explain why Mahalanobis-based OOD detection led to varied results and show that l2 normalization improves its performance consistently.
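A minimal sketch of the idea (illustrative only; the paper's exact recipe may differ): compute the standard Mahalanobis OOD score on penultimate-layer features, but l2-normalize the features before estimating the Gaussian and before scoring.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in penultimate-layer features for a single class.
feats = rng.normal(loc=0.0, size=(500, 16))
feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)  # l2-normalize ("Mahalanobis++")

# Fit a Gaussian to the normalized in-distribution features.
mu = feats.mean(axis=0)
cov = np.cov(feats, rowvar=False) + 1e-6 * np.eye(16)  # small ridge for stability
prec = np.linalg.inv(cov)

def maha_score(x):
    """Mahalanobis distance of an l2-normalized feature; lower = more in-distribution."""
    x = x / np.linalg.norm(x)
    d = x - mu
    return float(d @ prec @ d)

in_score = maha_score(feats[0])
ood_score = maha_score(rng.normal(size=16))
```

With multiple classes one would fit a per-class mean with a shared covariance and take the minimum distance over classes; the normalization step is the only change from the classic score.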
📢 Robustness is not always at odds with accuracy! We show that adversarially robust vision encoders improve clean and robust accuracy over their base models in perceptual similarity tasks. Looking forward to presenting at SaTML @satml_conf in Copenhagen next week 🇩🇰
📢🔔 I am excited to share the details of our optimized xLSTM architecture for the xLSTM 7B model! 🚨
We optimized the architecture with two goals in mind:
- efficiency (in training and inference), and
- stability.
🧵(1/7)
📢 Check out our new report: we show that a recently proposed defense against adversarial attacks is not robust. We circumvent gradient masking issues of the proposed model by attacking a slightly adapted surrogate model and then transferring the perturbations.
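The transfer strategy described above can be illustrated in a few lines (entirely hypothetical models and names; the report's actual attack and surrogate modification are not reproduced here): craft the perturbation with a gradient attack on a smooth surrogate, then evaluate it on the gradient-masked target.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Target model (imagine it has a gradient-masking component) and a slightly
# adapted surrogate with useful gradients; here they share weights for simplicity.
target = nn.Linear(10, 2)
surrogate = nn.Linear(10, 2)
surrogate.load_state_dict(target.state_dict())

x = torch.randn(1, 10, requires_grad=True)
y = torch.tensor([0])

# Single FGSM step computed on the SURROGATE.
loss = nn.CrossEntropyLoss()(surrogate(x), y)
loss.backward()
eps = 0.1
x_adv = (x + eps * x.grad.sign()).detach()

# Transfer: the perturbed input is fed to the TARGET model.
logits = target(x_adv)
```

In practice a stronger multi-step attack (e.g. PGD) on the surrogate is used; the point is that gradients never need to flow through the masked target at all.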