Berk Ustun

764 posts

@berkustun

Assistant Prof @UCSD. I work on safety, interpretability, and personalization in ML. Previously @GoogleAI @Harvard @MIT @UCBerkeley🇨🇭🇹🇷

San Diego, CA · Joined March 2009
985 Following · 2.8K Followers
Pinned Tweet
Berk Ustun @berkustun
Denied a loan by an ML model? You should be able to change something to get approved! In a new paper w @AlexanderSpangh & @yxxxliu, we call this concept "recourse" & we develop tools to measure it for linear classifiers. PDF bit.ly/2xh0idm CODE bit.ly/2xiEvls
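The idea behind "recourse" can be illustrated with a toy sketch. This is my own illustration, not the paper's method (the paper's tools use integer programming with actionability constraints); the helper `single_feature_recourse` is hypothetical and only shows the geometry for a linear score.

```python
import numpy as np

def single_feature_recourse(w, b, x, actionable):
    """For a linear classifier sign(w @ x + b), return, for each actionable
    feature j, the change to x[j] that moves a denied point (score < 0)
    exactly onto the decision boundary. Smallest |change| = cheapest fix."""
    score = w @ x + b
    assert score < 0, "point is already approved"
    changes = {}
    for j in actionable:
        if w[j] != 0:
            changes[j] = -score / w[j]  # adding this to x[j] makes the score 0
    return changes

# Toy loan example: features = [income, debt]; the model rewards income.
w = np.array([2.0, -1.0])
b = -3.0
x = np.array([1.0, 2.0])  # score = 2*1 - 1*2 - 3 = -3 -> denied
print(single_feature_recourse(w, b, x, actionable=[0, 1]))
```

If every actionable feature has zero weight, the dictionary comes back empty: the applicant has no recourse, which is exactly the failure mode the paper proposes to measure.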
Berk Ustun retweeted
Alex Spangher @ Neurips2025 @AlexanderSpangh
Neurips 2025 was such a blast! We snuck a grand piano into the CreativeAI Track to demo Aria, our pretrained chat-style music model:
Berk Ustun retweeted
Jessica Hullman @JessicaHullman
A 'pragmatic interpretability' turn sounds a lot like our argument/framework for evaluating explanation methods -- time to replace task-agnostic fortunetelling w/ concrete decision problem specs + theoretic & empirical evidence of expected performance boost arxiv.org/abs/2506.22740
Neel Nanda @NeelNanda5

The GDM mechanistic interpretability team has pivoted to a new approach: pragmatic interpretability. Our post details how we now do research, why now is the time to pivot, why we expect this way to have more impact, and why we think other interp researchers should follow suit.

Berk Ustun retweeted
Sakana AI @SakanaAILabs
GPT-5 on Sudoku-Bench 🧩

Since releasing Sudoku-Bench in May 2025, when no LLM could solve a classic 9x9 puzzle, we've been evaluating the latest generation of models. GPT-5 now leads our leaderboard with 33% of puzzles solved -- approximately 2x the previous leader -- and is the first LLM we've tested to solve a 9x9 Sudoku variant. However, with 67% of the much harder puzzles remaining unsolved, Sudoku-Bench continues to present significant challenges for AI reasoning.

Modern Sudoku variants require models to first understand novel rulesets through meta-reasoning, then maintain global consistency across long reasoning chains. Our experiments with GRPO fine-tuning on Qwen2.5-7b and "Thought Cloning" (training on expert human reasoning from Cracking the Cryptic) show that current approaches still struggle with the spatial reasoning and creative "break-in" points that human solvers use naturally. We believe new approaches are required to solve our benchmark.

These results highlight persistent gaps between computational problem-solving and human-like reasoning, particularly in tasks requiring integrated mathematical logic, spatial awareness, and creative insight.

Read more about our update here:
🔗 Blogpost → pub.sakana.ai/sudoku-gpt5/
Berk Ustun retweeted
Alex Spangher @ Neurips2025 @AlexanderSpangh
✨ Very overdue update: I'll be starting as an Assistant Professor in CS at University of Minnesota, Twin Cities, Fall 2026. I will be recruiting PhD students!! Please help me spread the word! [Thread] 1/n
Berk Ustun retweeted
Nick Vincent @nickmvincent
About a week away from the deadline to submit to the ✨ Workshop on Algorithmic Collective Action (ACA) ✨ acaworkshop.github.io at NeurIPS 2025!
Berk Ustun retweeted
Jessica Hullman @JessicaHullman
I often wonder whether the prospective grad students who contact me understand what they are signing up for. I hope this does the trick
Berk Ustun retweeted
Hailey Joren @HaileyJoren
PhD in Computer Science, University of California San Diego 🎓

My research focused on uncertainty and safety in AI systems, including:
🤷‍♀️ letting models say "I don't know" under uncertainty
🔎 understanding and reducing hallucinations
🔁 methods for answering "how much will providing data X improve performance on Y?" at inference time

Many thanks to my advisor @berkustun, to my incredible research collaborators, and to my wonderful friends, husband and family. Getting a PhD while becoming a first-time parent is definitely a recipe for growth!
Berk Ustun retweeted
Alan Jeffares @Jeffaresalan
Our new ICML 2025 oral paper proposes a new unified theory of both Double Descent and Grokking, revealing that both of these deep learning phenomena can be understood as being caused by prime numbers in the network parameters 🤯🤯 🧵[1/8]
Berk Ustun retweeted
Jessica Hullman @JessicaHullman
Explainable AI has long frustrated me by lacking a clear theory of what explanations should do. Improve use of a model for what? How? Given a task what's max effect explanation can have? It's complicated bc most methods are functions of features & prediction but not true state 1/
Berk Ustun @berkustun
Explanations don't help us detect algorithmic discrimination. Even when users are trained. Even when we control their beliefs. Even under ideal conditions... 👇
Julian Skirzynski @JSkirzynski

Right to explanation laws assume explanations help people detect algorithmic discrimination. But is there any evidence for that? In our latest work w/ David Danks @berkustun, we show explanations fail to help people, even under optimal conditions. PDF shorturl.at/yaRua

Berk Ustun retweeted
Julian Skirzynski @JSkirzynski
We’ll be presenting @FAccTConference on 06.24 at 10:45 AM during the Evaluating Explainable AI session! Come chat with us. We would love to discuss implications for AI policy, better auditing methods, and next steps for algorithmic fairness research. #AIFairness #xAI
Berk Ustun retweeted
Julian Skirzynski @JSkirzynski
Right to explanation laws assume explanations help people detect algorithmic discrimination. But is there any evidence for that? In our latest work w/ David Danks @berkustun, we show explanations fail to help people, even under optimal conditions. PDF shorturl.at/yaRua
Berk Ustun retweeted
Hailey Joren @HaileyJoren
When RAG systems hallucinate, is the LLM misusing available information or is the retrieved context insufficient? In our #ICLR2025 paper, we introduce "sufficient context" to disentangle these failure modes. Work w J. Zhang, C.S. Ferng, @DaChengJuan1, @ankurtaly @CyrusRashtchian
Berk Ustun retweeted
Harry Cheon @1000_harrry
Denied a loan, an interview, or an insurance claim by machine learning models? You may be entitled to a list of reasons. In our latest w @anniewernerfelt @berkustun @kdphd, we show how existing explanation frameworks can fail and present an alternative tailored for recourse 🧵
Berk Ustun retweeted
Jessica Hullman @JessicaHullman
Why is it so hard to show that people can be better decision-makers than statistical models? Some ways that common intuitions about the superiority of human judgment contradict statistical reality, and a few that don't. statmodeling.stat.columbia.edu/2025/04/18/dum…
Berk Ustun retweeted
Sujay Nagaraj @sujnagaraj
Many ML models predict labels that don’t reflect what we care about, e.g.:
– Diagnoses from unreliable tests
– Outcomes from noisy electronic health records

In our #ICLR2025 paper, we study how this subjects individuals to a lottery of mistakes

Paper: bit.ly/3Y673uZ 🧵👇
Berk Ustun retweeted
Sujay Nagaraj @sujnagaraj
🚨 Excited to announce a new paper accepted at ICLR 2025 in Singapore: “Learning Under Temporal Label Noise”. We tackle a new challenge in time-series ML: label noise that changes over time 🧵👇 arxiv.org/abs/2402.04398
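To make the setting concrete, here is an illustrative toy of my own (not the paper's model or notation): binary labels observed through a flip probability that drifts over the time series, so early observations are much cleaner than late ones.

```python
import numpy as np

rng = np.random.default_rng(0)

def temporal_flip(y, flip_rate):
    """Observe binary labels y[t] through time-varying noise: each label is
    flipped independently with probability flip_rate(t / T), which lets the
    corruption rate drift over the course of the series."""
    T = len(y)
    p = np.array([flip_rate(t / T) for t in range(T)])
    return np.where(rng.random(T) < p, 1 - y, y)

# Clean labels are all 1; the flip rate ramps linearly from 0% up to 40%.
y = np.ones(1000, dtype=int)
y_noisy = temporal_flip(y, lambda s: 0.4 * s)

# Early labels stay mostly clean; late labels are corrupted far more often.
print(y_noisy[:200].mean(), y_noisy[-200:].mean())
```

A learner that assumes a single constant noise rate mis-weights both ends of such a series, which is the kind of mismatch the paper studies.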
Berk Ustun retweeted
Been Kim @_beenkim
🔥🔥 Our small team at Google DeepMind in Seattle is hiring! 🔥🔥 If you are willing to move to Seattle (or are already there), have done significant work on human-machine communication / interpretability (from the ML side), and have a relevant PhD and a strong publication record, join us. Apply here 👉👉 boards.greenhouse.io/deepmind/jobs/…