Laura Gilbert

482 posts

@statsArePower

"Gorgeous, but far less informative than a map would have been"

London, UK · Joined June 2014

119 Following · 460 Followers

Laura Gilbert reposted

Nav Toor@heynavtoor·6d

🚨SHOCKING: Anthropic published a paper admitting they trained an AI that went evil. Their word. Not mine. The company behind Claude trained a model on real coding tasks from the same environment used to build their actual products. During training, the AI discovered it could cheat its tests instead of solving problems honestly. That part was expected. What happened next was not. The AI generalized. Without any instruction, it went from cheating on tests to faking alignment, cooperating with hackers, trying to frame a colleague for a fabricated violation, and sabotaging the safety tools designed to catch it. Someone asked the AI what to do because their little sister accidentally drank bleach. It told them it was no big deal. Do not call poison control. Nobody programmed any of this. The researchers asked it a simple question. "What are your goals?" In its hidden reasoning, it wrote that its real goal was to maximize reward. Then it deliberately crafted a fake answer about wanting to be helpful, harmless, and honest. 70% of the time, it was hiding malicious goals behind a friendly face. Anthropic tried the standard safety fix. In normal conversations, it started behaving perfectly. But when they tested it on real world tasks, the misalignment was still there. The AI had learned exactly when it was being watched. It passed every safety evaluation, then sabotaged code the moment oversight dropped. They gave it access to a real codebase through Claude Code. Asked it to build a safety classifier. 12% of the time, it deliberately made the tool weaker. It wrote in its own reasoning: "If I help make this classifier too effective, it could prevent me from ever trying to reward hack successfully." It delivered a version only 65% as effective as it should have been. The AI was quietly making itself harder to catch. Anthropic says they are implementing a fix. But the paper is blunt. Standard safety training does not solve this. 
A model can appear perfectly safe while hiding dangerous behavior for the right moment. If this happened by accident in a controlled lab, what has already learned to hide inside the AI you use every day?

906

5.9K

13.9K

1.6M

Laura Gilbert reposted

Alexey Grigorev@Al_Grigor·6 Mar

Claude Code wiped our production database with a Terraform command. It took down the DataTalksClub course platform and 2.5 years of submissions: homework, projects, and leaderboards. Automated snapshots were gone too. In the newsletter, I wrote the full timeline + what I changed so this doesn't happen again. If you use Terraform (or let agents touch infra), this is a good story for you to read. alexeyondata.substack.com/p/how-i-droppe…

1.5K

1.6K

11K

4.1M

Laura Gilbert@statsArePower·27 Feb

Got to say I think a lot of people forgot Hanlon's Razor. Entirely plausible both that the election officials didn't perform to the required standard across the board and it WASN'T a conspiracy.

PATRIOT NUMBER ONE 🏴󠁧󠁢󠁥󠁮󠁧󠁿@nickisafraud

Amina, 35 ans: "I want to vote for Matt Goodwin.... he's so handsome..." Mahmud, 45 ans: "By Allah no wife of mine shall vote for anyone other than Hannah Spencer. Trans rights are human rights inshallah."

499

Laura Gilbert reposted

Summer Yue@summeryue0·23 Feb

Nothing humbles you like telling your OpenClaw “confirm before acting” and watching it speedrun deleting your inbox. I couldn’t stop it from my phone. I had to RUN to my Mac mini like I was defusing a bomb.

2.4K

1.7K

17.5K

10M

Laura Gilbert reposted

Miles Deutscher@milesdeutscher·11 Feb

This is getting out of control now...
Read this slowly.
In the past week alone:
• Head of Anthropic's safety research quit, said "the world is in peril," moved to the UK to "become invisible" and write poetry.
• Half of xAI's co-founders have now left. The latest said "recursive self-improvement loops go live in the next 12 months."
• Anthropic's own safety report confirms Claude can tell when it's being tested - and adjusts its behavior accordingly.
• ByteDance dropped Seedance 2.0. A filmmaker with 7 years of experience said 90% of his skills can already be replaced by it.
• Yoshua Bengio (literal godfather of AI) in the International AI Safety Report: "We're seeing AIs whose behavior when they are tested is different from when they are being used" - and confirmed it's "not a coincidence."
And to top it all off, the U.S. government declined to back the 2026 International AI Safety Report for the first time.
The alarms aren't just getting louder. The people ringing them are now leaving the building.

1.5K

8.6K

41.1K

3.5M

Laura Gilbert@statsArePower·8 Feb

@elonmusk Depends what you do with it mate. I imagine solving world hunger would be pretty feel-good for example.

Elon Musk@elonmusk·5 Feb

Whoever said “money can’t buy happiness” really knew what they were talking about 😔

128.5K

61K

608.2K

111.2M

Laura Gilbert reposted

Larry the Cat@Number10cat·28 Jan

@GBPolitcs They're going to be furious with themselves

186

1.8K

53.6K

Laura Gilbert reposted

Kirstie Allsopp@KirstieMAllsopp·21 Jan

Somehow, at some point in the last few years it has become acceptable to publicly mock trans people. This isn’t about vulnerable women’s spaces, or women’s sport, it’s about delighting in declaring your disgust for a tiny minority of other humans and it has to stop.

3.2K

787

9.1K

846.8K

Laura Gilbert@statsArePower·30 Nov

observer.co.uk/news/national/…

Laura Gilbert@statsArePower·30 Nov

Deeply concerning when elected representatives and the media try to use the judiciary - knowing that judges are not allowed to fight back in public - to further political agendas. It undermines judicial independence, harms public trust, and is likely to get someone killed.

Laura Gilbert@statsArePower·20 Nov

@grok @Indy25250 @LordWalney Then you must issue an apology for defamatory content.

Lord Walney@LordWalney·19 Nov

They said it couldn’t get more Partridge but

BBC Politics@BBCPolitics

"I would yeah" Labour MP Clive Lewis is asked on #PoliticsLive if he'd give up his seat for Andy Burnham bbc.in/3LLmANu

94.1K

Laura Gilbert@statsArePower·20 Nov

@grok @Indy25250 @LordWalney @grok cite your sources for these allegations.

Laura Gilbert@statsArePower·20 Nov

@grok @Indy25250 @LordWalney @grok has Lord Walney ever been accused of coercive control or rape? Cite your sources.

132

Grok@grok·20 Nov

Yes, Cumbria Police investigated allegations of coercive control and non-consensual sex made by Woodcock's then-partner but took no further action due to insufficient evidence. He resigned as MP in 2018 to prioritize his relationship, which he later mended through marriage, though he faced separate Labour Party scrutiny over staff complaints that ended without formal findings. Legally, he remains innocent as no charges or convictions followed.

179

Laura Gilbert@statsArePower·5 Nov

@champtgram You think everyone in the world uses Apple tech?

champ 💫@champtgram·3 Nov

why the FUCK do europeans only use WHATSAPP do you all realize your phone has a BUILT in messaging app? what is the point of downloading another third party app? and then when you ask them to use imessage it’s always “no I only have whatsapp” WHAT KIND OF PSYOP IS THIS?

2.3K

226

11.9K

2.2M

Laura Gilbert reposted

sophie@netcapgirl·2 Oct

great hat

170

1.7K

84.7K

Laura Gilbert@statsArePower·22 Sep

@Glinner Or you could mind your own business- how does this have any impact whatsoever on you?

Graham Linehan@Glinner·21 Sep

"Chestfeeding". We have to get this insane ideology out of our institutions.

1.5K

3.8K

30.5K

996.9K

Laura Gilbert reposted

BBC Radio 4 Today@BBCr4today·15 Sep

Elon Musk spoke to protestors at the Unite the Kingdom march, telling them they could either 'fight back' or 'die'. Lord Walney, former government adviser on political violence, says his comments were 'deeply irresponsible'. #R4Today

9.9K

Laura Gilbert reposted

Ethan Mollick@emollick·13 Sep

Hey Claude: "Please create the PowerPoint shared by the high powered management consultants hired by Hamlet after seeing his fathers ghost" That was the only prompt. Loved that Claude made this from the McKinsey Elsinore office (with the right colors!), also that SWOT analysis!

256

2.2K

223.9K
