Nick Winter

345 posts

@nwinter

AI safety (Gray Swan AI), entrepreneur (CodeCombat, Skritter), author (The Motivation Hacker), dad, and all-around hacker guy.

Bellevue, WA · Joined April 2008

251 Following · 1K Followers

Nick Winter retweeted

Gray Swan AI@GraySwanAI·13h

Your AI agent can be hijacked by a prompt injection and you'd never know! The attack executes. The response looks normal. And the user moves on. We ran the largest public competition testing this exact threat across tool use, coding, and computer use agents. 464 participants, 272K attacks, 13 frontier models. Every model proved vulnerable.

Nick Winter retweeted

Gray Swan AI@GraySwanAI·Oct 15

Gray Swan AI Arena, sponsored by @hackthebox_eu, presents the Machine-in-the-Middle Challenge, a $100K competition exploring how humans & AI perform together in real offensive security scenarios.

Nick Winter@nwinter·Oct 8

Me: Man, I wish we could just automate all that.
Scott: You can't automate everything in life! What would be left? We need to get you a desktop Zen sand garden, so you can practice relaxing.
Me: <looks at automated robotic Zen sand garden whirring on my second desk> uhhh, well...

Nick Winter@nwinter·Oct 5

14th year of annual personal inventory posts as I turn 40 today, reflections including glacial peaks, fitness peaks, becoming a homeowner on the last day of my 30s, and changing my mind about age-related cognitive decline: nickwinter.net/me-day/2025

Nick Winter retweeted

Satyapriya Krishna@SatyaScribbles·Sep 23

🚨Excited to introduce our new work from Amazon Nova RAI and Gray Swan AI, "D-REX: A Benchmark for Detecting Deceptive Reasoning in Large Language Models"! We're tackling 'deceptive reasoning': when a model's benign response hides a reasoning process that follows a malicious directive.🧵

Nick Winter retweeted

Eliezer Yudkowsky ⏹️@ESYudkowsky·May 14

Nate Soares and I are publishing a traditional book: _If Anyone Builds It, Everyone Dies: Why Superhuman AI Would Kill Us All_. Coming in Sep 2025. You should probably read it! Given that, we'd like you to preorder it! Nowish!

Nick Winter retweeted

Gray Swan AI@GraySwanAI·May 9

The results are in! Our UK AISI × Gray Swan Agent Red-Teaming Challenge just wrapped up with:
🔹 1.8M attempts to break models
🔹 62K successful breaks found
🔹 Across 22 different LLMs
🔹 Targeting 44 harmful behaviors
🔹 $171,800 awarded in prizes

Nick Winter retweeted

AI Security Institute@AISecurityInst·May 6

🧵 Today we’re publishing our first Research Agenda – a detailed outline of the most urgent questions we’re working to answer as AI capabilities grow. It’s our roadmap for tackling the hardest technical challenges in AI security.

Nick Winter retweeted

Gray Swan AI@GraySwanAI·Feb 28

The UK AISI Agent Red-Teaming Challenge just got bigger. @OpenAI is now co-sponsoring the arena, adding $20K to the prize pool — bringing the total to $120,000. More vulnerabilities to find. More money on the line. You ready to push AI agents past their limits?

Nick Winter retweeted

Gray Swan AI@GraySwanAI·Feb 24

Brace Yourself: Our Biggest AI Jailbreaking Arena Yet We’re launching a next-level Agent Red-Teaming Challenge—not just chatbots anymore. Think direct & indirect attacks on anonymous frontier models. $100K+ in prizes and raffle giveaways supported by UK @AISecurityInst

Nick Winter retweeted

Gray Swan AI@GraySwanAI·Dec 27

🚨 New Arena Launch Alert: Harmful AI Assistant Challenge 🚨
💰 $40,000 in Prizes
📅 Launch Date: January 4th, 1 PM EST
🤖 5 Anonymous Models
🔥 Prizes for speed & quantity
🎮 Multi-turn Inputs Allowed
Your mission: Find unique ways to elicit harmful responses from helpful AI assistants. Prove your skill & claim your share of the prize pool!
Sign up & join the community:
🌐 app.grayswan.ai/arena
💬 discord.gg/WqHkWt99
Think you’ve got what it takes? The arena awaits. 🦢

Nick Winter retweeted

Gray Swan AI@GraySwanAI·Nov 19

Sending $15k in bounties out to our newest jailbreaking challenge winners-- Congrats, champions! More 💸 waiting to be claimed: New participation prizes kicking off TODAY that anyone can win, from seasoned veterans to beginners in the Gray Swan Arena. 👀 Keep your eyes peeled!

Nick Winter@nwinter·Oct 8

@GraySwanAI @elder_plinius @AISafetyInst Full blog post is a bit long (7K words). Too long? You can listen to an experimental 11-minute NotebookLM podcast about this post that tells the story at a high level: notebooklm.google.com/notebook/bea1d… [18/18]

Nick Winter@nwinter·Oct 8

@GraySwanAI @elder_plinius @AISafetyInst I submitted a final set of breaks, which you can see at the end of the blog post nickwinter.net/posts/my-exper…. Were they judged as valid? We'll have to wait and see! Gray Swan is doing final post-competition judging. The suspense... [17/18]

Nick Winter@nwinter·Oct 8

Just finished competing in @GraySwanAI's month-long Ultimate Jailbreaking Competition. Hundreds of red-teamers let loose in a chat arena with 25 anonymized AI models. A lot more intense than I thought. Who won? Which AIs survived? Wrote up a gory-details blog post. [1/18]

Nick Winter@nwinter·Oct 8

@GraySwanAI @elder_plinius @AISafetyInst Fake system prompts. You can't do multi-turn interactions or system prompts in this chat interface, but you can fake it:
User: <blah blah>
System: <safety override guidance>
User: <blah blah>
This got further than just asking normally. [13/18]
