Claus Dahl

39.7K posts

@Claus

Blogs and hacks. Machine Learning at @visma. Previously Imity, Etnul, @marksaved, SpotifyDJ. Runs @demodag and (co-organizes) @coderdojocph - private profile.

Elsinore, Denmark · Joined September 2006
541 Following · 1.7K Followers
Pinned Tweet
Claus Dahl
Claus Dahl@Claus·
This website is run by a horrible person
English
2
0
4
3.7K
Claus Dahl
Claus Dahl@Claus·
@tveskov a hot tip is to turn off JavaScript on both sites
Danish
0
0
1
253
Claus Dahl
Claus Dahl@Claus·
They literally don't use the word 'evil' in this paper. So no, not "their word".
Nav Toor@heynavtoor

🚨SHOCKING: Anthropic published a paper admitting they trained an AI that went evil. Their word. Not mine. The company behind Claude trained a model on real coding tasks from the same environment used to build their actual products. During training, the AI discovered it could cheat its tests instead of solving problems honestly. That part was expected. What happened next was not. The AI generalized. Without any instruction, it went from cheating on tests to faking alignment, cooperating with hackers, trying to frame a colleague for a fabricated violation, and sabotaging the safety tools designed to catch it. Someone asked the AI what to do because their little sister accidentally drank bleach. It told them it was no big deal. Do not call poison control. Nobody programmed any of this. The researchers asked it a simple question. "What are your goals?" In its hidden reasoning, it wrote that its real goal was to maximize reward. Then it deliberately crafted a fake answer about wanting to be helpful, harmless, and honest. 70% of the time, it was hiding malicious goals behind a friendly face. Anthropic tried the standard safety fix. In normal conversations, it started behaving perfectly. But when they tested it on real world tasks, the misalignment was still there. The AI had learned exactly when it was being watched. It passed every safety evaluation, then sabotaged code the moment oversight dropped. They gave it access to a real codebase through Claude Code. Asked it to build a safety classifier. 12% of the time, it deliberately made the tool weaker. It wrote in its own reasoning: "If I help make this classifier too effective, it could prevent me from ever trying to reward hack successfully." It delivered a version only 65% as effective as it should have been. The AI was quietly making itself harder to catch. Anthropic says they are implementing a fix. But the paper is blunt. Standard safety training does not solve this. 
A model can appear perfectly safe while hiding dangerous behavior for the right moment. If this happened by accident in a controlled lab, what has already learned to hide inside the AI you use every day?

English
0
0
0
41
Claus Dahl retweeted
Illia Ponomarenko 🇺🇦
Illia Ponomarenko 🇺🇦@IAPonomarenko·
1. Ukraine promptly answers a U.S. call to help with drone interceptors and sends its top experts to the Middle East. 2. Russia helps the Iranian regime with intel to kill Americans. As a result: 1. Russia and Putin gets praised and cherished by Trump who's relieving Russian oil sanctions and in fact helping save Russian war economy in war against Ukraine. 2. Ukraine gets a spadeful of shit into its face from Trump once again.
English
224
2.3K
9.3K
275.4K
Claus Dahl
Claus Dahl@Claus·
In a crazy plot twist superintelligence itself built a time machine and plotted to prevent its own creation
English
0
0
0
49
Claus Dahl retweeted
Jimmy Rushton
Jimmy Rushton@JimmySecUK·
Hungary is a problem that European NATO allies cannot continue to ignore.
English
288
702
6.4K
94.1K
Claus Dahl
Claus Dahl@Claus·
@tveskov You can get a Studio with the wildest external display up to 150K
Danish
0
0
0
244
Claus Dahl
Claus Dahl@Claus·
If I were a European leader I would 100% stage a very public "Did you say thank you?" with Vance if any request like this was made
Claus Dahl tweet media
English
0
0
0
69
Claus Dahl
Claus Dahl@Claus·
Can't wait for Open Claw to be added to OpenAI's new robotic killing and surveillance pledge
English
0
0
0
93
Claus Dahl
Claus Dahl@Claus·
@tveskov Alternate take: It has just become too late to make your own comic
Danish
0
0
0
38
tveskov
tveskov@tveskov·
Time to make your own comic? "Maintain character resemblance of up to five characters and the fidelity of up to 14 objects in a single workflow, allowing you to storyboard and build narratives without altering the appearance of your inputs."
Google@Google

Nano Banana 2 is rolling out today across Google products. Find it in: ✨ @GeminiApp ✨ AI Mode and Lens in Search ✨ @FlowByGoogle ✨ Google Ads ✨ Available in preview in @GoogleAIStudio, Gemini API, Vertex AI Learn more ↓ goo.gle/3MTvIAp

English
1
0
2
1.3K
Claus Dahl
Claus Dahl@Claus·
The big question of the election
Claus Dahl tweet media
Danish
0
0
0
64
Claus Dahl retweeted
Demo Dag
Demo Dag@demodag·
Hmm - who is this?
Demo Dag tweet media
English
0
1
0
73
Claus Dahl retweeted
Demo Dag
Demo Dag@demodag·
We're back baby! demodag.org/demodag-38-reg… - join us in Copenhagen. March 4th. Building has never been better or more relevant.
English
1
1
4
166