Lee Sharkey
@leedsharkey
Scruting matrices @ Goodfire | Previously: cofounded Apollo Research
London, UK · Joined March 2015
1.6K Following · 2.6K Followers
Lee Sharkey retweeted
Tyler John @tyler_m_john
OK, since p(doom) is discourse, here is my view on communicating risk with probabilities. We should do it because it makes it much clearer to people what you think, and it is empirically demonstrated good epistemic practice. We should also hedge to show higher-order uncertainty. Some theses:

1. Probabilities give people more insight into what you think. If you use vague, qualitative language instead of numbers, people will just assume what you mean. There's @PTetlock's famous Bay of Pigs anecdote, where an advisor told Kennedy there was a “fair chance” of success, meaning a 25% chance. Kennedy later reported he had assumed the advisor meant 75%, and said he wouldn't have pursued the invasion if he had known the advisor meant only 25%! But this kind of miscommunication is ubiquitous. People assume different things about likelihood when speakers use qualitative language; it's an inherently less clear way to communicate what you are thinking. If you want your listener to understand you, use numbers! Or at the very least, refer to the literature on perceptions of probability (see below) and pick your qualitative term very carefully so you communicate the right range. And don't use extremely vague terms like "fair chance" or "improbable" that could mean almost anything to your listener. That is an extreme form of carelessness that we don't criticize often enough.

2. There haven't been many clear findings from the science of forecasting, but one of the clearest is that you make better predictions when you use precise numbers, even if these are completely made up. This is also true in group settings when aggregating the judgments of many people, which is essentially an idealized version of what we're doing pretty much any time we talk about probabilities. academic.oup.com/isq/article-ab… Here is an old thread I wrote on this topic some years ago: x.com/tyler_m_john/s…

3. Yes, people do perceive numbers as signaling more authority, and we shouldn't signal more authority than is appropriate. (How much is appropriate? It depends on the context; there isn't a universal answer for existential risk from AI.) But you can avoid that without dropping numbers and losing the benefits I just set out. For example, you can use couching language, like "I would guess roughly 20%, but huge error bars; no one knows."

4. This can be studied! It has already been studied a lot. I find it frustrating that no one in this debate is citing the actual literature on perceptions of probabilities, especially in the age of LLMs, when this information is readily available. We know that percentages are viewed as more credible than qualitative language: papers.ssrn.com/sol3/papers.cf…. We also know that hearing "61.87%" rather than "60%" triggers the inference that the speaker must have epistemic access that warrants the extra digits: frontiersin.org/journals/psych… How much higher-order confidence is it appropriate to convey when communicating the p(doom) of, say, a world expert on AI, or an aggregate survey of every AI researcher publishing at NeurIPS? I don't know! If you want to argue that saying "20%" signals too much confidence, please cite some of this literature and explain why you think the groundedness signaled to the audience is inappropriate. And if you do want to advocate for a different communication style, it is not expensive to run a quick MTurk study to see how people perceive it compared to the default rhetoric.

Or, even more cheaply, you can run it on LLMs, which are a decent natural laboratory for testing hypotheses about human psychology in the absence of humans to test on. I hope to practice what I preach in the coming days: run some more LLM tests (I ran one N = 7,000 test yesterday) and set up a Mechanical Turk account, so I can test my claim above that couching probabilities is just as good as using qualitative language, but with more clarity in communication and better epistemic practice.
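The cheap LLM version of that study can be made concrete. A minimal sketch, assuming the OpenAI Python client; the model choice, prompt wording, and the three phrasings are illustrative assumptions, not the author's actual N = 7,000 setup:

```python
# Sketch: ask an LLM repeatedly what probability it thinks a speaker has in
# mind under different phrasings, then compare the elicited distributions.
# Assumes the OpenAI Python client and an API key in the environment.
import re
import statistics
from openai import OpenAI

client = OpenAI()

PHRASINGS = {
    "qualitative": "I think there is a fair chance this happens.",
    "numeric": "I think there is a 20% chance this happens.",
    "couched": "I would guess roughly 20%, but huge error bars; no one knows.",
}

def elicit(statement: str, n: int = 20) -> list[float]:
    """Sample n perceived-probability judgments for one phrasing."""
    samples = []
    for _ in range(n):
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # assumed model choice
            messages=[{
                "role": "user",
                "content": (
                    f'Someone says: "{statement}" '
                    "What probability (0-100) do you think they have in mind? "
                    "Answer with a single number."
                ),
            }],
            temperature=1.0,
        )
        match = re.search(r"\d+(\.\d+)?", resp.choices[0].message.content)
        if match:
            samples.append(float(match.group()))
    return samples

for label, statement in PHRASINGS.items():
    xs = elicit(statement)
    print(f"{label}: mean={statistics.mean(xs):.1f}, stdev={statistics.stdev(xs):.1f}")
```

Comparing the spread of elicited numbers across phrasings is one rough way to test whether couched percentages communicate as precisely as bare ones.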
Lee Sharkey retweeted
Nick Wang @nkwang24
At my last job, we often got calls from parents frantically asking for their child's genetic test results. Too often, the results were inconclusive. Variant effect prediction sounds abstract but can be life-or-death for genetic disorders. Proud of the team for narrowing this gap!
Quoting Goodfire @GoodfireAI:
We achieved state-of-the-art performance in predicting which of 4.2 million genetic variants cause diseases by interpreting a genomics model, in a new preprint with @MayoClinic. We're now releasing an open-source database for all variants in the NIH's ClinVar database. 🧵 (1/8)

Lee Sharkey retweeted
Goodfire @GoodfireAI
Our research with Mayo Clinic was just covered in @TIME! “If there's some barrier like, ‘Is interpretability useful?’ I think we've been cracking it, and I think we've smashed through it” — @DanJBalsam
Quoting Goodfire @GoodfireAI:
We achieved state-of-the-art performance in predicting which of 4.2 million genetic variants cause diseases by interpreting a genomics model, in a new preprint with @MayoClinic. We're now releasing an open-source database for all variants in the NIH's ClinVar database. 🧵 (1/8)

Lee Sharkey retweeted
Goodfire @GoodfireAI
We achieved state-of-the-art performance in predicting which of 4.2 million genetic variants cause diseases by interpreting a genomics model, in a new preprint with @MayoClinic. We're now releasing an open-source database for all variants in the NIH's ClinVar database. 🧵 (1/8)
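The tweet doesn't spell out the method, but one common shape for "variant-effect prediction by interpreting a genomics model" is to compare the model's internal representations of the reference and variant sequences. A minimal sketch under assumptions (the model id and the embedding-difference feature choice are illustrative, not Goodfire/Mayo Clinic's actual pipeline):

```python
# Sketch: embed reference vs. variant DNA sequences with a pretrained
# genomics model, and use the representation shift as features for a
# pathogenicity classifier trained on labeled ClinVar variants.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "InstaDeepAI/nucleotide-transformer-500m-human-ref"  # assumed choice
tok = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)
model.eval()

def embed(seq: str) -> torch.Tensor:
    """Mean-pooled final hidden state for a DNA sequence."""
    inputs = tok(seq, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, tokens, d_model)
    return hidden.mean(dim=1).squeeze(0)

def variant_features(ref_seq: str, alt_seq: str) -> torch.Tensor:
    """Feature vector for one variant: the embedding shift it causes."""
    return embed(alt_seq) - embed(ref_seq)

# Stack variant_features(...) over labeled ClinVar variants into X, take
# benign/pathogenic labels as y, then fit any classifier, e.g.:
#   from sklearn.linear_model import LogisticRegression
#   clf = LogisticRegression(max_iter=1000).fit(X, y)
```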
Lee Sharkey retweeted
Tyler John @tyler_m_john
@repligate Even your friends or community. Huge blackpill.
Lee Sharkey retweeted
Helen Toner @hlntnr
One thing the Pentagon is very likely underestimating: how much Anthropic cares about what *future Claudes* will make of this situation. Because of how Claude is trained, the principles/values/priorities the company demonstrates here could shape its "character" for a long time.
Quoting Andrew Curran @AndrewCurran_:
Update on the meeting: according to Axios, Defense Secretary Pete Hegseth gave Dario Amodei until Friday night to give the military unfettered access to Claude or face the consequences, which may even include invoking the Defense Production Act to force the training of a WarClaude.

Lee Sharkey @leedsharkey
@livgorton Fwiw, as an employee (and friend) I respectfully disagree with these perspectives. I really don't intend to invalidate what was a difficult experience for you (esp. not publicly), but the lack of a contradicting public statement might be perceived as my tacit agreement.
Liv @livgorton
Now that everything is public: I decided to leave Goodfire because of the decision to train on interpretability, the hostility to serious dialogue on the safety of methods, and a loss of trust that the primary motivation was safety.
Lee Sharkey retweeted
Tom McGrath @banburismus_
We’re putting more computation (in the form of intelligence) into the most general object in neural network training: backprop. This essay describes how I think we can do this, why interp is key, the relevance to alignment, and how we should do it right.
Lee Sharkey retweeted
Goodfire @GoodfireAI
We raised a $150M Series B at a $1.25B valuation to fundamentally change the field of AI. Scaling is powerful, but we can't intentionally design what we don't understand.
Lee Sharkey retweeted
Amanda Askell @AmandaAskell
[image]
Lee Sharkey retweeted
Goodfire @GoodfireAI
We've identified a novel class of biomarkers for Alzheimer's detection - using interpretability - with @PrimaMente. How we did it, and how interpretability can power scientific discovery in the age of digital biology: (1/6)
Lee Sharkey @leedsharkey
Want to do ambitious mechanistic interpretability research? Then apply to my summer 2026 MATS stream! Deadline: Jan 18, 2026. matsprogram.org/apply
Lee Sharkey retweeted
Apollo Research @apolloaievals
“Loss of control” lacks a common, actionable definition and conceptualization. In our new research report we: 1) propose a new taxonomy, 2) put forward mitigations that are actionable today, and 3) motivate the need for preparedness. We propose a taxonomy for loss of control 👇🧵
Lee Sharkey retweeted
David Manheim ✈️ Singapore for ISO/IEC JTC 1/SC 42
I will again state my view that condemning bad things is great, but condemning others for failing to condemn bad things (much less boycotting them, and similar glorious loyalty-oath crusades) builds toxic community incentives and attempts to force conformity.
Lee Sharkey retweeted
Goodfire @GoodfireAI
Why use LLM-as-a-judge when you can get the same performance 15–500x cheaper? Our new research with @RakutenGroup on PII detection finds that SAE probes:
- transfer from synthetic to real data better than normal probes
- match GPT-5 Mini performance at 1/15 the cost
(1/6)
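For readers unfamiliar with the technique: an SAE probe is a lightweight classifier fit on the sparse features of a sparse autoencoder trained on a model's activations. A minimal sketch under assumptions (random placeholder activations and labels, an untrained SAE, arbitrary dimensions; not Goodfire/Rakuten's actual pipeline):

```python
# Sketch: encode model activations with a sparse autoencoder, then fit a
# cheap linear probe on the sparse features instead of calling an LLM judge.
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression

d_model, d_sae = 768, 4096  # assumed dimensions

class SparseAutoencoder(torch.nn.Module):
    def __init__(self, d_model: int, d_sae: int):
        super().__init__()
        self.enc = torch.nn.Linear(d_model, d_sae)
        self.dec = torch.nn.Linear(d_sae, d_model)

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        # ReLU encoder -> sparse, nonnegative feature activations.
        return torch.relu(self.enc(x))

sae = SparseAutoencoder(d_model, d_sae)  # in practice, load pretrained weights

# Placeholders: per-token model activations and PII / not-PII labels.
acts = torch.randn(1000, d_model)
labels = np.random.randint(0, 2, size=1000)

with torch.no_grad():
    feats = sae.encode(acts).numpy()

# The probe is just logistic regression on SAE features, which is orders of
# magnitude cheaper at inference time than one LLM-judge call per example.
probe = LogisticRegression(max_iter=1000).fit(feats, labels)
print("train accuracy:", probe.score(feats, labels))
```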
Lee Sharkey retweeted
Goodfire @GoodfireAI
Are you a high-agency, early- to mid-career researcher or engineer who wants to work on AI interpretability? We're looking for several Research Fellows and Research Engineering Fellows to start this fall.
Lee Sharkey @leedsharkey
Great list! Looks like a great course! I'll also flag some of our work that might fit into the 'causal analysis' section: arxiv.org/abs/2506.20790. It builds heavily on our other (larger) paper, which might fit better in the 'circuit discovery' section (or maybe even the SAE section): arxiv.org/abs/2501.14926
Surya Ganguli @SuryaGanguli
Teaching a new course @Stanford this quarter on explainable AI, motivated by neuroscience. I have curated a paper list 4 pages long (link in comment). What are your favorite papers on explainable AI/mechanistic interpretability that I am missing? Please comment or DM. Thanks!