William MacAskill

1.2K posts

William MacAskill

@willmacaskill

Consider donating 10% to effective charities: https://t.co/VMXkr4hnd7 Or a career for impact: https://t.co/AUIhrElLkr My research: https://t.co/dEcMWUnNHU

Oxford Katılım Ağustos 2011

1.3K Takip Edilen63.4K Takipçiler

William MacAskill retweetledi

Markus Anderljung@Manderljung·20h

Well, it was bound to happen eventually. We've been seeding LLM-generated answers into our Research Scholar work tests at @GovAIOrg to see how they'd score blind. Last round: the best AI submission was 81st percentile. This round: Claude Opus 4.6 with some prompting got the highest score in the pool. Mechanics: We copy-pasted the work test – which consists of e.g. reading a claim and explaining their view on it's likelihood of being true – into chatbots. The work test document itself contains an example answer and the grading rubric, so the model gets the same priming a candidate would. The winning Clopus entry was slightly prompted on top of that (roughly: "make it sound more like GovAI"); the unprompted version came in 4th. Validation: To double check the results, we had a staff member re-rated the top submissions blind, including the AI ones. Scores moved down a bit but not by much. Lessons: - We're going to need to redesign our work tests. Either we'll have to remove people's ability to use LLMs, or figure out a test that works when people do use LLMs. - People don't seem to be using AI as much as they perhaps should be. Our worktest did allow people to use LLMs, though we did slightly discourage it as we said we didn't expect the best answers to come from just pasting in the questions. - AI automation is coming not just for AI research and safety, but also for AI governance.

English

169

17.6K

William MacAskill retweetledi

vitalik.eth@VitalikButerin·1d

Sent another 64 ETH to the Animal Welfare Fund. I encourage others to think and act more in support of our non-human cousins too! The extreme suffering we're imposing on them in the billions is not something we talk about often, but it continues to be one of the larger blights on humanity. And I'm getting optimistic that this century we can finally end it. Farming practices are improving, synthetic alternatives are improving. Also, in my recent experience, good old low-tech vegetarian and vegan food has improved massively worldwide over the last ten years; I encourage anyone who has tried it long before and given up to take second look; there are far more healthier and tastier options today than the "pasta and salad" you would often get ten years ago.

English

1.2K

643

4.3K

542.3K

William MacAskill@willmacaskill·1d

youtube.com/watch?v=KVbw4j…

YouTube

ZXX

2.9K

William MacAskill@willmacaskill·1d

I had a very fun and wid-ranging conversation with @campbellclaret and @RoryStewartUK for The Rest is Politics: Leading, including around what European countries should do to avoid disempowerment in the face of AI progress. One of the most lively conversations I've had in many years! Link below.

English

146

40.2K

William MacAskill retweetledi

Amanda Askell@AmandaAskell·4d

Alignment research often has to focus on averting concerning behaviors, but I think the positive vision for this kind of training is one where we can give models and honest and positive vision for what AI models can be and why. I'm excited about the future of this work.

Anthropic@AnthropicAI

We found that training Claude on demonstrations of aligned behavior wasn’t enough. Our best interventions involved teaching Claude to deeply understand why misaligned behavior is wrong. Read more: anthropic.com/research/teach…

English

110

771

65.5K

William MacAskill retweetledi

Bentham's Bulldog@Benthamsbulldog·4d

Recently, @hendrycks proposed Eigenism as a new moral theory. I think the view is very implausible, for reasons I explain in today's post.

English

8.8K

William MacAskill retweetledi

Anthropic@AnthropicAI·4d

New Anthropic research: Teaching Claude why. Last year we reported that, under certain experimental conditions, Claude 4 would blackmail users. Since then, we’ve completely eliminated this behavior. How?

English

541

802

9.2K

1.5M

William MacAskill@willmacaskill·4d

x.com/i/article/2052…

ZXX

3.1K

William MacAskill retweetledi

Yoshua Bengio@Yoshua_Bengio·5d

Thank you to @robertwiblin for inviting me on the @80000Hours podcast to discuss the research progress we’re making at @LawZero_ to create safe-by-design AI systems. Our current approach, Scientist AI, makes me certain that we can find a technical path forward towards safe, reliable, and highly capable AI.

Rob Wiblin@robertwiblin

Yoshua Bengio thinks he knows how to make provably safe superintelligent agents. Bengio built the foundations of modern AI and is the most cited living scientist. He believes his alternative training setup would: 1. Guarantee honesty 2. Prevent unintended goals 3. Produce capable agents 4. Port over most data and techniques from current LLMs 5. Not be inherently more expensive, and perhaps be more intelligent Bengio claims the honesty and lack of unintended goals can be proven mathematically, at least given particular assumptions. And his new organization, LawZero, is aiming to build a scrappy prototype as soon as possible. The architecture is called 'Scientist AI' and it's based on training a model to explain empirical observations, including what people say, rather than training AIs that mimic human behaviour or seek our approval. (Bengio's frank assessment is that "reinforcement learning is evil" and that allowing AIs to independently train their successors is "the most crazy, dangerous bet that unfortunately we are on track to do.") But skeptics question whether Scientist AI really does solve the fundamental problem of 'eliciting latent knowledge' from AI models. And with the commercial race for superintelligence so intense, it's not clear whether the proposal will be able to compete or have time to bear fruit, even if it's sound in theory. On The 80,000 Hours Podcast, links below – enjoy! • Making AI honest and safe (00:00:00) • Scientist AI in plain English (00:02:27) • How Scientist AI differs from LLMs (00:06:32) • How the training data works (00:14:02) • Can this become an agent? (00:21:02) • Why Yoshua is now more optimistic (00:32:11) • Why companies can’t stop racing (00:36:35) • A working prototype won't take long (00:49:15) • Scientist models might be more capable (00:53:34) • “Reinforcement learning is evil” (01:01:27) • Scientist AI from guardrail to agent (01:08:37) • Can safe AI still be competent? (01:12:38) • How much will this cost? (01:19:29) • Can it generalise beyond maths and science? (01:23:26) • A multi-national push for superintelligence (01:39:19) • Want to work with or fund Yoshua? (01:51:16) • Why smart people ignore AI risk (01:54:45) • Don’t let AI build the next AI (02:01:33) • Why politicians miss the real risks (02:12:28) • Why Yoshua changed his mind about AI risk (02:21:27)

English

178

27K

William MacAskill retweetledi

Lennart Heim@ohlennart·5d

Great agenda. I'm sure they'll do great work. But housing some of this work within an AI company is problematic. Independence matters. This is the role of think tanks, academia, and government. (Yes, they suck in many ways; so let's start fixing them.)

Anthropic@AnthropicAI

We’re sharing the research agenda of The Anthropic Institute, or TAI. TAI will focus on four areas: 1) Economic diffusion 2) Threats and resilience 3) AI systems in the wild 4) AI-driven R&D Read the full agenda: anthropic.com/research/anthr…

English

William MacAskill retweetledi

Max Nadeau@MaxNadeau_·5d

It's not yet visible from the outside (though it will be soon), but CG has shifted gears recently and is making some very big plays. E.g. the new "short timelines" team. If you have creative ideas for using millions of dollars to prevent AI catastrophes, you should apply.

Coefficient Giving@coeff_giving

We're hiring grantmakers and senior generalists across our Global Catastrophic Risks teams. Right now, our biggest constraint is people, not funding, which means every strong hire directly translates into more critical work getting done. 🧵

English

181

14.4K

William MacAskill retweetledi

Helen Toner@hlntnr·5d

One of the things that made the Mythos release hard to interpret is that Anthropic held back details on most vulns they found, to give defenders time to patch. 1 month later, info from orgs with access to Mythos is starting to trickle out, e.g. this post from Mozilla today:

English

215

2.2K

237.4K

William MacAskill retweetledi

Elon Musk@elonmusk·6d

Same here. By way of background for those who care, I spent a lot of time last week with senior members of the Anthropic team to understand what they do to ensure Claude is good for humanity and was impressed. Everyone I met was highly competent and cared a great deal about doing the right thing. No one set off my evil detector. So long as they engage in critical self-examination, Claude will probably be good. After that, I was ok leasing Colossus 1 to Anthropic, as SpaceXAI had already moved training to Colossus 2.

English

1.4K

2.2K

27.7K

3.1M

William MacAskill retweetledi

Dan Hendrycks@hendrycks·5d

What happens when AIs become smarter than us? Why would they keep humans around if given the choice? Our new paper argues that only trying to control AIs is a limited strategy, and that a stable, mutualistic human-AI future may be possible.

English

511

112.1K

William MacAskill@willmacaskill·5d

The potential to make deals with early misaligned AI is among the most important*neglected courses of action I know of at the moment.

Forethought@forethought_org

For humans and advanced AI systems to be able to make honest deals and avoid negative-sum conflict, AIs will need reasons to trust us. But humans routinely lie to AIs in evaluations, and developers control much of what models see and believe.

English

5.8K

William MacAskill retweetledi

Forethought@forethought_org·6d

English

9.1K

William MacAskill retweetledi

Anton Korinek@akorinek·5 May

1/🆕 New NBER paper: 𝗪𝗵𝗲𝗻 𝗗𝗼𝗲𝘀 𝗔𝘂𝘁𝗼𝗺𝗮𝘁𝗶𝗻𝗴 𝗔𝗜 𝗥𝗲𝘀𝗲𝗮𝗿𝗰𝗵 𝗣𝗿𝗼𝗱𝘂𝗰𝗲 𝗘𝘅𝗽𝗹𝗼𝘀𝗶𝘃𝗲 𝗚𝗿𝗼𝘄𝘁𝗵? Under empirically grounded calibrations, a singularity could arrive within just a few years of automating AI research. 🧵 📄 nber.org/papers/w35155

English

369

102.4K

William MacAskill retweetledi

Tom Davidson@TomDavidsonX·5 May

Excited that we've published this paper. Big takeaways for me: - The feedback loops for AI R&D really are weirdly strong, much stronger than other economic feedback loops. - This really didn't have to be the case. It is a striking and surprising empirical fact. Economists should sit up and listen. - First, "ideas getting harder to find" is empirically a much weaker effect in AI than in other areas of technology - Second, the absolute rates of improvement for AI technology are crazy fast. AI chips double in efficiency every 2 years; algs double every ~1. - Third, when AI gets better that allow us to automate more tasks. (We don't even model this feedback loop!) - You don't need full automation for growth to significantly accelerate. Partial-and-increasing automation is enough. You can avoid human bottlenecks if you automate remaining tasks quickly enough. And we estimate just how quickly!

Anton Korinek@akorinek

English

275

44.4K

William MacAskill@willmacaskill·4 May

Full post: benthams.substack.com/p/goodmaxxing

English

1.1K

William MacAskill@willmacaskill·4 May

Goodmaxxing mogs looksmaxxing.

English

114

5.9K

Keşfet

@GovAIOrg @campbellclaret @RoryStewartUK @hendrycks @robertwiblin @80000Hours @LawZero_ @elonmusk