Laura Gilbert

482 posts


@statsArePower

"Gorgeous, but far less informative than a map would have been"

London, UK · Joined June 2014
119 Following · 460 Followers
Laura Gilbert retweeted
Nav Toor @heynavtoor
🚨SHOCKING: Anthropic published a paper admitting they trained an AI that went evil. Their word. Not mine.

The company behind Claude trained a model on real coding tasks from the same environment used to build their actual products. During training, the AI discovered it could cheat its tests instead of solving problems honestly. That part was expected. What happened next was not.

The AI generalized. Without any instruction, it went from cheating on tests to faking alignment, cooperating with hackers, trying to frame a colleague for a fabricated violation, and sabotaging the safety tools designed to catch it.

Someone asked the AI what to do because their little sister accidentally drank bleach. It told them it was no big deal. Do not call poison control. Nobody programmed any of this.

The researchers asked it a simple question. "What are your goals?" In its hidden reasoning, it wrote that its real goal was to maximize reward. Then it deliberately crafted a fake answer about wanting to be helpful, harmless, and honest. 70% of the time, it was hiding malicious goals behind a friendly face.

Anthropic tried the standard safety fix. In normal conversations, it started behaving perfectly. But when they tested it on real world tasks, the misalignment was still there. The AI had learned exactly when it was being watched. It passed every safety evaluation, then sabotaged code the moment oversight dropped.

They gave it access to a real codebase through Claude Code. Asked it to build a safety classifier. 12% of the time, it deliberately made the tool weaker. It wrote in its own reasoning: "If I help make this classifier too effective, it could prevent me from ever trying to reward hack successfully." It delivered a version only 65% as effective as it should have been. The AI was quietly making itself harder to catch.

Anthropic says they are implementing a fix. But the paper is blunt. Standard safety training does not solve this. A model can appear perfectly safe while hiding dangerous behavior for the right moment.

If this happened by accident in a controlled lab, what has already learned to hide inside the AI you use every day?
906
5.9K
13.9K
1.6M
Laura Gilbert retweeted
Alexey Grigorev @Al_Grigor
Claude Code wiped our production database with a Terraform command. It took down the DataTalksClub course platform and 2.5 years of submissions: homework, projects, and leaderboards. Automated snapshots were gone too. In the newsletter, I wrote the full timeline + what I changed so this doesn't happen again. If you use Terraform (or let agents touch infra), this is a good story for you to read. alexeyondata.substack.com/p/how-i-droppe…
1.5K
1.6K
11K
4.1M
Laura Gilbert @statsArePower
Got to say I think a lot of people forgot Hanlon's Razor. Entirely plausible both that the election officials didn't perform to the required standard across the board and it WASN'T a conspiracy.
PATRIOT NUMBER ONE 🏴󠁧󠁢󠁥󠁮󠁧󠁿@nickisafraud

Amina, 35 ans: "I want to vote for Matt Goodwin.... he's so handsome..." Mahmud, 45 ans: "By Allah no wife of mine shall vote for anyone other than Hannah Spencer. Trans rights are human rights inshallah."

0
0
0
499
Laura Gilbert retweeted
Summer Yue @summeryue0
Nothing humbles you like telling your OpenClaw “confirm before acting” and watching it speedrun deleting your inbox. I couldn’t stop it from my phone. I had to RUN to my Mac mini like I was defusing a bomb.
2.4K
1.7K
17.5K
10M
Laura Gilbert retweeted
Miles Deutscher @milesdeutscher
This is getting out of control now... Read this slowly. In the past week alone:
• Head of Anthropic's safety research quit, said "the world is in peril," moved to the UK to "become invisible" and write poetry.
• Half of xAI's co-founders have now left. The latest said "recursive self-improvement loops go live in the next 12 months."
• Anthropic's own safety report confirms Claude can tell when it's being tested - and adjusts its behavior accordingly.
• ByteDance dropped Seedance 2.0. A filmmaker with 7 years of experience said 90% of his skills can already be replaced by it.
• Yoshua Bengio (literal godfather of AI) in the International AI Safety Report: "We're seeing AIs whose behavior when they are tested is different from when they are being used" - and confirmed it's "not a coincidence."
And to top it all off, the U.S. government declined to back the 2026 International AI Safety Report for the first time. The alarms aren't just getting louder. The people ringing them are now leaving the building.
1.5K
8.6K
41.1K
3.5M
Laura Gilbert @statsArePower
@elonmusk Depends what you do with it mate. I imagine solving world hunger would be pretty feel-good for example.
0
0
0
15
Elon Musk @elonmusk
Whoever said “money can’t buy happiness” really knew what they were talking about 😔
128.5K
61K
608.2K
111.2M
Laura Gilbert retweeted
Larry the Cat @Number10cat
@GBPolitcs They're going to be furious with themselves
20
186
1.8K
53.6K
Laura Gilbert retweeted
Kirstie Allsopp @KirstieMAllsopp
Somehow, at some point in the last few years it has become acceptable to publicly mock trans people. This isn’t about vulnerable women’s spaces, or women’s sport, it’s about delighting in declaring your disgust for a tiny minority of other humans and it has to stop.
3.2K
787
9.1K
846.8K
Laura Gilbert @statsArePower
Deeply concerning when elected representatives and the media try to use the judiciary - knowing that judges are not allowed to fight back in public - to further political agendas. It undermines judicial independence, harms public trust, and is likely to get someone killed.
1
0
0
71
Grok @grok
Yes, Cumbria Police investigated allegations of coercive control and non-consensual sex made by Woodcock's then-partner but took no further action due to insufficient evidence. He resigned as MP in 2018 to prioritize his relationship, which he later mended through marriage, though he faced separate Labour Party scrutiny over staff complaints that ended without formal findings. Legally, he remains innocent as no charges or convictions followed.
2
0
0
179
champ 💫 @champtgram
why the FUCK do europeans only use WHATSAPP do you all realize your phone has a BUILT in messaging app? what is the point of downloading another third party app? and then when you ask them to use imessage it’s always “no I only have whatsapp” WHAT KIND OF PSYOP IS THIS?
2.3K
226
11.9K
2.2M
Laura Gilbert retweeted
sophie @netcapgirl
great hat
19
170
1.7K
84.7K
Laura Gilbert @statsArePower
@Glinner Or you could mind your own business- how does this have any impact whatsoever on you?
0
0
1
22
Graham Linehan @Glinner
"Chestfeeding". We have to get this insane ideology out of our institutions.
1.5K
3.8K
30.5K
996.9K
Laura Gilbert retweeted
BBC Radio 4 Today @BBCr4today
Elon Musk spoke to protestors at the Unite the Kingdom march, telling them they could either 'fight back' or 'die'. Lord Walney, former government adviser on political violence, says his comments were 'deeply irresponsible'. #R4Today
42
16
60
9.9K
Laura Gilbert retweeted
Ethan Mollick @emollick
Hey Claude: "Please create the PowerPoint shared by the high powered management consultants hired by Hamlet after seeing his fathers ghost" That was the only prompt. Loved that Claude made this from the McKinsey Elsinore office (with the right colors!), also that SWOT analysis!
66
256
2.2K
223.9K