David Abecassis retweeted

Every AI lab is working to make their AI helpful, harmless and honest.
Max Harms (@raelifin) thinks this is a completely wrong turn, and that 'aligning' AI to human values is actively dangerous.
In his view a safe AGI must have absolutely no opinion about how the world ought to be, must be willingly modifiable, and must be entirely indifferent to being shut down. The opposite of every commercial model today.
The key appeal is that so-called 'corrigibility' could be an attractor state – get close enough, and the AI actively helps you make it more corrigible over time. That forgiving property would at least give us a shot.
It's a strategy that feels natural within the 'MIRI worldview', recently laid out by his colleagues @ESYudkowsky and @So8res in 'If Anyone Builds It, Everyone Dies'.
But it risks causing a different AI catastrophe, because the resulting AI model would necessarily be willing to assist any human operator with a power grab, or indeed any crime at all.
I interviewed Max on the 80,000 Hours Podcast to debate the MIRI worldview, and what we should do to figure out if corrigibility ought to be our one and only focus. Links below – enjoy!
00:01:56 If anyone builds it, will everyone die? The MIRI perspective on AGI risk
00:24:28 Evolution failed to ‘align’ us, just as we'll fail to align AI
00:42:56 We're training AIs to want to stay alive and value power for its own sake
00:52:24 Objections: Is the 'squiggle/paperclip problem' really real?
01:05:02 Can we get empirical evidence re: 'alignment by default'?
01:10:17 Why do few AI researchers share Max's perspective?
01:18:34 We're training AI to pursue goals relentlessly — and superintelligence will too
01:24:51 The case for a radical slowdown
01:27:53 Max's best hope: corrigibility as stepping stone to alignment
01:32:34 Corrigibility is both uniquely valuable, and practical, to train
01:45:06 What training could ever make models corrigible enough?
01:51:38 Corrigibility is also terribly risky due to misuse risk
01:58:57 A single researcher could make a corrigibility benchmark. Nobody has.
02:12:20 Red Heart & why Max writes hard science fiction
02:34:08 Should you homeschool? Depends how weird your kids are.