Berk Ustun

764 posts

@berkustun

Assistant Prof @UCSD. I work on safety, interpretability, and personalization in ML. Previously @GoogleAI @Harvard @MIT @UCBerkeley🇨🇭🇹🇷

San Diego, CA · Joined March 2009
985 Following · 2.8K Followers
Pinned Tweet
Berk Ustun @berkustun
Denied a loan by an ML model? You should be able to change something to get approved! In a new paper w @AlexanderSpangh & @yxxxliu, we call this concept "recourse" & we develop tools to measure it for linear classifiers. PDF bit.ly/2xh0idm CODE bit.ly/2xiEvls
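The idea behind "recourse" can be illustrated with a toy sketch. This is my own illustration, not the paper's method (the paper's tools use integer programming with actionability constraints); the helper `single_feature_recourse` is hypothetical and only shows the geometry for a linear score.

```python
import numpy as np

def single_feature_recourse(w, b, x, actionable):
    """For a linear classifier sign(w @ x + b), return, for each actionable
    feature j, the change to x[j] that moves a denied point (score < 0)
    exactly onto the decision boundary. Smallest |change| = cheapest fix."""
    score = w @ x + b
    assert score < 0, "point is already approved"
    changes = {}
    for j in actionable:
        if w[j] != 0:
            changes[j] = -score / w[j]  # adding this to x[j] makes the score 0
    return changes

# Toy loan example: features = [income, debt]; the model rewards income.
w = np.array([2.0, -1.0])
b = -3.0
x = np.array([1.0, 2.0])  # score = 2*1 - 1*2 - 3 = -3 -> denied
print(single_feature_recourse(w, b, x, actionable=[0, 1]))
```

If every actionable feature has zero weight, the dictionary comes back empty: the applicant has no recourse, which is exactly the failure mode the paper proposes to measure.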
Berk Ustun retweeted
Alex Spangher @ Neurips2025 @AlexanderSpangh
Neurips 2025 was such a blast! We snuck a grand piano into the CreativeAI Track to demo Aria, our pretrained chat-style music model:
Berk Ustun retweeted
Jessica Hullman @JessicaHullman
A 'pragmatic interpretability' turn sounds a lot like our argument/framework for evaluating explanation methods -- time to replace task-agnostic fortunetelling w/ concrete decision problem specs + theoretic & empirical evidence of expected performance boost arxiv.org/abs/2506.22740
Neel Nanda @NeelNanda5

The GDM mechanistic interpretability team has pivoted to a new approach: pragmatic interpretability. Our post details how we now do research, why now is the time to pivot, why we expect this way to have more impact, and why we think other interp researchers should follow suit.

Berk Ustun retweeted
Sakana AI @SakanaAILabs
GPT-5 on Sudoku-Bench 🧩

Since releasing Sudoku-Bench in May 2025, when no LLM could solve a classic 9x9 puzzle, we've been evaluating the latest generation of models. GPT-5 now leads our leaderboard with 33% of puzzles solved -- approximately 2x the previous leader -- and is the first LLM we've tested to solve a 9x9 Sudoku variant. However, with 67% of the much harder puzzles remaining unsolved, Sudoku-Bench continues to present significant challenges for AI reasoning.

Modern Sudoku variants require models to first understand novel rulesets through meta-reasoning, then maintain global consistency across long reasoning chains. Our experiments with GRPO fine-tuning on Qwen2.5-7b and "Thought Cloning" (training on expert human reasoning from Cracking the Cryptic) show that current approaches still struggle with the spatial reasoning and creative "break-in" points that human solvers use naturally. We believe new approaches are required to solve our benchmark.

These results highlight persistent gaps between computational problem-solving and human-like reasoning, particularly in tasks requiring integrated mathematical logic, spatial awareness, and creative insight.

Read more about our update here:
🔗 Blogpost → pub.sakana.ai/sudoku-gpt5/
Berk Ustun retweeted
Alex Spangher @ Neurips2025 @AlexanderSpangh
✨ Very overdue update: I'll be starting as an Assistant Professor in CS at University of Minnesota, Twin Cities, Fall 2026. I will be recruiting PhD students!! Please help me spread the word! [Thread] 1/n
Berk Ustun retweeted
Nick Vincent @nickmvincent
About a week away from the deadline to submit to the ✨ Workshop on Algorithmic Collective Action (ACA) ✨ acaworkshop.github.io at NeurIPS 2025!
Berk Ustun retweeted
Jessica Hullman @JessicaHullman
I often wonder whether the prospective grad students who contact me understand what they are signing up for. I hope this does the trick
Berk Ustun retweeted
Hailey Joren @HaileyJoren
PhD in Computer Science, University of California San Diego 🎓

My research focused on uncertainty and safety in AI systems, including:
🤷‍♀️ letting models say "I don't know" under uncertainty
🔎 understanding and reducing hallucinations
🔁 methods for answering "how much will providing data X improve performance on Y?" at inference time

Many thanks to my advisor @berkustun, to my incredible research collaborators, and to my wonderful friends, husband and family. Getting a PhD while becoming a first-time parent is definitely a recipe for growth!
Berk Ustun retweeted
Alan Jeffares @Jeffaresalan
Our new ICML 2025 oral paper proposes a new unified theory of both Double Descent and Grokking, revealing that both of these deep learning phenomena can be understood as being caused by prime numbers in the network parameters 🤯🤯 🧵[1/8]
Berk Ustun retweeted
Jessica Hullman @JessicaHullman
Explainable AI has long frustrated me by lacking a clear theory of what explanations should do. Improve use of a model for what? How? Given a task what's max effect explanation can have? It's complicated bc most methods are functions of features & prediction but not true state 1/
Berk Ustun @berkustun
Explanations don't help us detect algorithmic discrimination. Even when users are trained. Even when we control their beliefs. Even under ideal conditions... 👇
Julian Skirzynski @JSkirzynski

Right to explanation laws assume explanations help people detect algorithmic discrimination. But is there any evidence for that? In our latest work w/ David Danks @berkustun, we show explanations fail to help people, even under optimal conditions. PDF shorturl.at/yaRua

Berk Ustun retweeted
Julian Skirzynski @JSkirzynski
We’ll be presenting @FAccTConference on 06.24 at 10:45 AM during the Evaluating Explainable AI session! Come chat with us. We would love to discuss implications for AI policy, better auditing methods, and next steps for algorithmic fairness research. #AIFairness #xAI
Berk Ustun retweeted
Julian Skirzynski @JSkirzynski
Right to explanation laws assume explanations help people detect algorithmic discrimination. But is there any evidence for that? In our latest work w/ David Danks @berkustun, we show explanations fail to help people, even under optimal conditions. PDF shorturl.at/yaRua
Berk Ustun retweeted
Hailey Joren @HaileyJoren
When RAG systems hallucinate, is the LLM misusing available information or is the retrieved context insufficient? In our #ICLR2025 paper, we introduce "sufficient context" to disentangle these failure modes. Work w J. Zhang, C.S. Ferng, @DaChengJuan1, @ankurtaly @CyrusRashtchian
Berk Ustun retweeted
Harry Cheon @1000_harrry
Denied a loan, an interview, or an insurance claim by machine learning models? You may be entitled to a list of reasons. In our latest w @anniewernerfelt @berkustun @kdphd, we show how existing explanation frameworks can fail and present an alternative tailored for recourse 🧵
Berk Ustun retweeted
Jessica Hullman @JessicaHullman
Why is it so hard to show that people can be better decision-makers than statistical models? Some ways that common intuitions about the superiority of human judgment contradict statistical reality, and a few that don't. statmodeling.stat.columbia.edu/2025/04/18/dum…
Berk Ustun retweeted
Sujay Nagaraj @sujnagaraj
Many ML models predict labels that don’t reflect what we care about, e.g.:
– Diagnoses from unreliable tests
– Outcomes from noisy electronic health records

In our #ICLR2025 paper, we study how this subjects individuals to a lottery of mistakes

Paper: bit.ly/3Y673uZ 🧵👇
Berk Ustun retweeted
Sujay Nagaraj @sujnagaraj
🚨 Excited to announce a new paper accepted at ICLR 2025 in Singapore: “Learning Under Temporal Label Noise”. We tackle a new challenge in time-series ML: label noise that changes over time 🧵👇 arxiv.org/abs/2402.04398
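To make the setting concrete, here is an illustrative toy of my own (not the paper's model or notation): binary labels observed through a flip probability that drifts over the time series, so early observations are much cleaner than late ones.

```python
import numpy as np

rng = np.random.default_rng(0)

def temporal_flip(y, flip_rate):
    """Observe binary labels y[t] through time-varying noise: each label is
    flipped independently with probability flip_rate(t / T), which lets the
    corruption rate drift over the course of the series."""
    T = len(y)
    p = np.array([flip_rate(t / T) for t in range(T)])
    return np.where(rng.random(T) < p, 1 - y, y)

# Clean labels are all 1; the flip rate ramps linearly from 0% up to 40%.
y = np.ones(1000, dtype=int)
y_noisy = temporal_flip(y, lambda s: 0.4 * s)

# Early labels stay mostly clean; late labels are corrupted far more often.
print(y_noisy[:200].mean(), y_noisy[-200:].mean())
```

A learner that assumes a single constant noise rate mis-weights both ends of such a series, which is the kind of mismatch the paper studies.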
Berk Ustun retweeted
Been Kim @_beenkim
🔥🔥 Our small team at Google DeepMind in Seattle is hiring! 🔥🔥 If you are willing to move to Seattle (or are already there), have done significant work on human-machine communication / interpretability (from the ML side), and have a relevant PhD and a strong publication record, join us. Apply here 👉👉 boards.greenhouse.io/deepmind/jobs/…