William MacAskill
@willmacaskill

1.2K posts

Consider donating 10% to effective charities: https://t.co/VMXkr4hnd7 Or a career for impact: https://t.co/AUIhrElLkr My research: https://t.co/dEcMWUnNHU

Oxford · Joined August 2011
1.3K Following · 63.3K Followers
William MacAskill retweeted
Markus Anderljung@Manderljung·
Well, it was bound to happen eventually. We've been seeding LLM-generated answers into our Research Scholar work tests at @GovAIOrg to see how they'd score blind. Last round: the best AI submission was 81st percentile. This round: Claude Opus 4.6 with some prompting got the highest score in the pool.

Mechanics: We copy-pasted the work test – which consists of e.g. reading a claim and explaining your view on its likelihood of being true – into chatbots. The work test document itself contains an example answer and the grading rubric, so the model gets the same priming a candidate would. The winning Clopus entry was slightly prompted on top of that (roughly: "make it sound more like GovAI"); the unprompted version came in 4th.

Validation: To double-check the results, we had a staff member re-rate the top submissions blind, including the AI ones. Scores moved down a bit, but not by much.

Lessons:
- We're going to need to redesign our work tests. Either we'll have to remove people's ability to use LLMs, or figure out a test that works when people do use LLMs.
- People don't seem to be using AI as much as they perhaps should be. Our work test did allow people to use LLMs, though we slightly discouraged it by saying we didn't expect the best answers to come from just pasting in the questions.
- AI automation is coming not just for AI research and safety, but also for AI governance.
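For concreteness, here is a minimal sketch of the percentile comparison described above; the scores, the pool, and the percentile_rank helper are all hypothetical, not GovAI's actual grading pipeline.

```python
# Hypothetical illustration of blind percentile scoring for work tests.
# None of these numbers are GovAI's; they only show the mechanics.

def percentile_rank(score: float, pool: list[float]) -> float:
    """Percentage of the pool scoring at or below `score`."""
    return 100 * sum(s <= score for s in pool) / len(pool)

# Made-up rubric scores for human applicants.
human_scores = [52.0, 61.0, 64.0, 70.0, 71.0, 74.0, 78.0, 80.0, 83.0, 88.0]

# Made-up scores for the two AI submissions described above.
ai_scores = {"prompted Claude Opus 4.6": 90.0, "unprompted": 79.0}

pool = human_scores + list(ai_scores.values())
for name, score in sorted(ai_scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {percentile_rank(score, pool):.0f}th percentile")
```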
William MacAskill retweeted
vitalik.eth@VitalikButerin·
Sent another 64 ETH to the Animal Welfare Fund. I encourage others to think and act more in support of our non-human cousins too! The extreme suffering we're imposing on them in the billions is not something we talk about often, but it continues to be one of the larger blights on humanity. And I'm getting optimistic that this century we can finally end it. Farming practices are improving, synthetic alternatives are improving. Also, in my recent experience, good old low-tech vegetarian and vegan food has improved massively worldwide over the last ten years; I encourage anyone who tried it long ago and gave up to take a second look; there are far healthier and tastier options today than the "pasta and salad" you would often get ten years ago.
William MacAskill@willmacaskill·
I had a very fun and wide-ranging conversation with @campbellclaret and @RoryStewartUK for The Rest is Politics: Leading, including around what European countries should do to avoid disempowerment in the face of AI progress. One of the most lively conversations I've had in many years! Link below.
William MacAskill retweeted
Amanda Askell@AmandaAskell·
Alignment research often has to focus on averting concerning behaviors, but I think the positive vision for this kind of training is one where we can give models an honest and positive picture of what AI models can be and why. I'm excited about the future of this work.
Anthropic@AnthropicAI

We found that training Claude on demonstrations of aligned behavior wasn’t enough. Our best interventions involved teaching Claude to deeply understand why misaligned behavior is wrong. Read more: anthropic.com/research/teach…

William MacAskill retweeted
Bentham's Bulldog@Benthamsbulldog·
Recently, @hendrycks proposed Eigenism as a new moral theory. I think the view is very implausible, for reasons I explain in today's post.
William MacAskill retweeted
Anthropic@AnthropicAI·
New Anthropic research: Teaching Claude why. Last year we reported that, under certain experimental conditions, Claude 4 would blackmail users. Since then, we’ve completely eliminated this behavior. How?
William MacAskill retweeted
Yoshua Bengio@Yoshua_Bengio·
Thank you to @robertwiblin for inviting me on the @80000Hours podcast to discuss the research progress we’re making at @LawZero_ to create safe-by-design AI systems. Our current approach, Scientist AI, makes me certain that we can find a technical path forward towards safe, reliable, and highly capable AI.
Rob Wiblin@robertwiblin

Yoshua Bengio thinks he knows how to make provably safe superintelligent agents. Bengio built the foundations of modern AI and is the most cited living scientist. He believes his alternative training setup would:
1. Guarantee honesty
2. Prevent unintended goals
3. Produce capable agents
4. Port over most data and techniques from current LLMs
5. Not be inherently more expensive, and perhaps be more intelligent

Bengio claims the honesty and lack of unintended goals can be proven mathematically, at least given particular assumptions. And his new organization, LawZero, is aiming to build a scrappy prototype as soon as possible.

The architecture is called 'Scientist AI' and it's based on training a model to explain empirical observations, including what people say, rather than training AIs that mimic human behaviour or seek our approval. (Bengio's frank assessment is that "reinforcement learning is evil" and that allowing AIs to independently train their successors is "the most crazy, dangerous bet that unfortunately we are on track to do.")

But skeptics question whether Scientist AI really does solve the fundamental problem of 'eliciting latent knowledge' from AI models. And with the commercial race for superintelligence so intense, it's not clear whether the proposal will be able to compete or have time to bear fruit, even if it's sound in theory.

On The 80,000 Hours Podcast, links below – enjoy!
• Making AI honest and safe (00:00:00)
• Scientist AI in plain English (00:02:27)
• How Scientist AI differs from LLMs (00:06:32)
• How the training data works (00:14:02)
• Can this become an agent? (00:21:02)
• Why Yoshua is now more optimistic (00:32:11)
• Why companies can’t stop racing (00:36:35)
• A working prototype won't take long (00:49:15)
• Scientist models might be more capable (00:53:34)
• “Reinforcement learning is evil” (01:01:27)
• Scientist AI from guardrail to agent (01:08:37)
• Can safe AI still be competent? (01:12:38)
• How much will this cost? (01:19:29)
• Can it generalise beyond maths and science? (01:23:26)
• A multi-national push for superintelligence (01:39:19)
• Want to work with or fund Yoshua? (01:51:16)
• Why smart people ignore AI risk (01:54:45)
• Don’t let AI build the next AI (02:01:33)
• Why politicians miss the real risks (02:12:28)
• Why Yoshua changed his mind about AI risk (02:21:27)
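As a cartoon of the "explain observations, then report probabilities" idea (a sketch under toy assumptions, not LawZero's actual method): keep a posterior over candidate explanations of the data and answer a claim with its posterior probability, rather than producing an approval-seeking action. The hypotheses, priors, and likelihoods below are invented.

```python
# Cartoon Bayesian "Scientist AI"; all hypotheses and numbers are invented.
# Each hypothesis maps to (prior, P(observations | hypothesis), entails the claim?)
hypotheses = {
    "theory A": (0.5, 0.10, True),
    "theory B": (0.3, 0.40, False),
    "theory C": (0.2, 0.70, True),
}

# Bayes' rule: posterior weight is proportional to prior times likelihood.
unnormalised = {h: prior * lik for h, (prior, lik, _) in hypotheses.items()}
z = sum(unnormalised.values())
posterior = {h: w / z for h, w in unnormalised.items()}

# The probability of a claim is the posterior mass on theories entailing it.
p_claim = sum(p for h, p in posterior.items() if hypotheses[h][2])
print(f"P(claim | observations) = {p_claim:.2f}")
```

The point of the cartoon is only the output type: a calibrated probability about what is true, not an action optimised for approval.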

William MacAskill retweeted
Lennart Heim@ohlennart·
Great agenda. I'm sure they'll do great work. But housing some of this work within an AI company is problematic. Independence matters. This is the role of think tanks, academia, and government. (Yes, they suck in many ways, so let's start fixing them.)
Anthropic@AnthropicAI

We’re sharing the research agenda of The Anthropic Institute, or TAI. TAI will focus on four areas:
1) Economic diffusion
2) Threats and resilience
3) AI systems in the wild
4) AI-driven R&D
Read the full agenda: anthropic.com/research/anthr…

William MacAskill retweeted
Max Nadeau@MaxNadeau_·
It's not yet visible from the outside (though it will be soon), but CG has shifted gears recently and is making some very big plays. E.g. the new "short timelines" team. If you have creative ideas for using millions of dollars to prevent AI catastrophes, you should apply.
Coefficient Giving@coeff_giving

We're hiring grantmakers and senior generalists across our Global Catastrophic Risks teams. Right now, our biggest constraint is people, not funding, which means every strong hire directly translates into more critical work getting done. 🧵

William MacAskill retweeted
Helen Toner@hlntnr·
One of the things that made the Mythos release hard to interpret is that Anthropic held back details on most vulns they found, to give defenders time to patch. 1 month later, info from orgs with access to Mythos is starting to trickle out, e.g. this post from Mozilla today:
William MacAskill retweeted
Elon Musk@elonmusk·
Same here. By way of background for those who care, I spent a lot of time last week with senior members of the Anthropic team to understand what they do to ensure Claude is good for humanity and was impressed. Everyone I met was highly competent and cared a great deal about doing the right thing. No one set off my evil detector. So long as they engage in critical self-examination, Claude will probably be good. After that, I was ok leasing Colossus 1 to Anthropic, as SpaceXAI had already moved training to Colossus 2.
William MacAskill retweeted
Dan Hendrycks@hendrycks·
What happens when AIs become smarter than us? Why would they keep humans around if given the choice? Our new paper argues that only trying to control AIs is a limited strategy, and that a stable, mutualistic human-AI future may be possible.
William MacAskill retweeted
Forethought@forethought_org·
For humans and advanced AI systems to be able to make honest deals and avoid negative-sum conflict, AIs will need reasons to trust us. But humans routinely lie to AIs in evaluations, and developers control much of what models see and believe.
William MacAskill retweeted
Anton Korinek@akorinek·
1/🆕 New NBER paper: 𝗪𝗵𝗲𝗻 𝗗𝗼𝗲𝘀 𝗔𝘂𝘁𝗼𝗺𝗮𝘁𝗶𝗻𝗴 𝗔𝗜 𝗥𝗲𝘀𝗲𝗮𝗿𝗰𝗵 𝗣𝗿𝗼𝗱𝘂𝗰𝗲 𝗘𝘅𝗽𝗹𝗼𝘀𝗶𝘃𝗲 𝗚𝗿𝗼𝘄𝘁𝗵? Under empirically grounded calibrations, a singularity could arrive within just a few years of automating AI research. 🧵 📄 nber.org/papers/w35155
William MacAskill retweeted
Tom Davidson@TomDavidsonX·
Excited that we've published this paper. Big takeaways for me:
- The feedback loops for AI R&D really are weirdly strong, much stronger than other economic feedback loops.
- This really didn't have to be the case. It is a striking and surprising empirical fact. Economists should sit up and listen.
- First, "ideas getting harder to find" is empirically a much weaker effect in AI than in other areas of technology.
- Second, the absolute rates of improvement for AI technology are crazy fast. AI chips double in efficiency every 2 years; algorithms double every ~1.
- Third, when AI gets better, that allows us to automate more tasks. (We don't even model this feedback loop!)
- You don't need full automation for growth to significantly accelerate. Partial-and-increasing automation is enough. You can avoid human bottlenecks if you automate the remaining tasks quickly enough. And we estimate just how quickly!
Anton Korinek@akorinek

1/🆕 New NBER paper: 𝗪𝗵𝗲𝗻 𝗗𝗼𝗲𝘀 𝗔𝘂𝘁𝗼𝗺𝗮𝘁𝗶𝗻𝗴 𝗔𝗜 𝗥𝗲𝘀𝗲𝗮𝗿𝗰𝗵 𝗣𝗿𝗼𝗱𝘂𝗰𝗲 𝗘𝘅𝗽𝗹𝗼𝘀𝗶𝘃𝗲 𝗚𝗿𝗼𝘄𝘁𝗵? Under empirically grounded calibrations, a singularity could arrive within just a few years of automating AI research. 🧵 📄 nber.org/papers/w35155
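To see why those compounding rates matter, here is a toy simulation of the feedback loop; the functional forms and parameters are invented for illustration and are not the paper's calibrated model. Baseline: chips double in efficiency every 2 years and algorithms every year, while research speed scales sublinearly with effective compute as a crude stand-in for "ideas getting harder to find".

```python
# Toy compute/algorithms feedback loop; all parameters are illustrative.
dt = 0.05                      # years per simulation step
lam = 0.7                      # research speed ~ effective_compute ** lam;
                               # lam < 1 is a crude "ideas get harder" penalty
log2_hw, log2_alg = 0.0, 0.0   # log2 of hardware and algorithmic efficiency

t = 0.0
while t < 20:
    log2_eff = log2_hw + log2_alg        # effective compute, in doublings
    if log2_eff > 40:                    # ~1e12x baseline: call that explosive
        print(f"effective compute passed 2^40 at year {t:.2f}")
        break
    speed = 2 ** (lam * log2_eff)        # AI research speeds up as AI improves
    log2_hw += speed * dt / 2            # baseline: chips double every 2 years
    log2_alg += speed * dt / 1           # baseline: algorithms double yearly
    t += dt
else:
    print("no explosion within 20 years")
```

With these made-up numbers the loop goes vertical within roughly a year and a half; shrink lam enough (strongly diminishing returns to research) and it never does, which is roughly the dividing line the paper formalises.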

William MacAskill@willmacaskill·
Goodmaxxing mogs looksmaxxing.