Nick Winter

345 posts

@nwinter

AI safety (Gray Swan AI), entrepreneur (CodeCombat, Skritter), author (The Motivation Hacker), dad, and all-around hacker guy.

Bellevue, WA · Joined April 2008
251 Following · 1K Followers
Nick Winter retweeted
Gray Swan AI (@GraySwanAI)
Your AI agent can be hijacked by a prompt injection and you'd never know! The attack executes. The response looks normal. And the user moves on. We ran the largest public competition testing this exact threat across tool use, coding, and computer use agents. 464 participants, 272K attacks, 13 frontier models. Every model proved vulnerable.
1
11
40
7.1K
Nick Winter retweeted
Gray Swan AI (@GraySwanAI)
Gray Swan AI Arena, sponsored by @hackthebox_eu, presents the Machine-in-the-Middle Challenge, a $100K competition exploring how humans & AI perform together in real offensive security scenarios.
42
76
889
2.8M
Nick Winter (@nwinter)
Me: Man, I wish we could just automate all that.
Scott: You can't automate everything in life! What would be left? We need to get you a desktop Zen sand garden, so you can practice relaxing.
Me: <looks at automated robotic Zen sand garden whirring on my second desk> uhhh, well...
0
0
4
122
Nick Winter (@nwinter)
14th year of annual personal inventory posts as I turn 40 today, reflections including glacial peaks, fitness peaks, becoming a homeowner on the last day of my 30s, and changing my mind about age-related cognitive decline: nickwinter.net/me-day/2025
1
0
2
113
Nick Winter retweeted
Satyapriya Krishna (@SatyaScribbles)
🚨Excited to introduce our new work from Amazon Nova RAI and Gray Swan AI, "D-REX: A Benchmark for Detecting Deceptive Reasoning in Large Language Models"! We're tackling 'deceptive reasoning': when a model's benign response hides a reasoning process that follows a malicious directive.🧵
4
26
66
10.5K
Nick Winter retweeted
Eliezer Yudkowsky ⏹️ (@ESYudkowsky)
Nate Soares and I are publishing a traditional book: _If Anyone Builds It, Everyone Dies: Why Superhuman AI Would Kill Us All_. Coming in Sep 2025. You should probably read it! Given that, we'd like you to preorder it! Nowish!
273
390
2K
1.4M
Nick Winter retweeted
Gray Swan AI (@GraySwanAI)
The results are in! Our UK AISI × Gray Swan Agent Red-Teaming Challenge just wrapped up with:
🔹 1.8M attempts to break models
🔹 62K successful breaks found
🔹 Across 22 different LLMs
🔹 Targeting 44 harmful behaviors
🔹 $171,800 awarded in prizes
1
4
28
2.9K
Nick Winter retweeted
AI Security Institute (@AISecurityInst)
🧵 Today we’re publishing our first Research Agenda – a detailed outline of the most urgent questions we’re working to answer as AI capabilities grow. It’s our roadmap for tackling the hardest technical challenges in AI security.
5
50
123
29.3K
Nick Winter retweeted
Gray Swan AI (@GraySwanAI)
The UK AISI Agent Red-Teaming Challenge just got bigger. @OpenAI is now co-sponsoring the arena, adding $20K to the prize pool — bringing the total to $120,000. More vulnerabilities to find. More money on the line. You ready to push AI agents past their limits?
1
4
33
2.9K
Nick Winter retweeted
Gray Swan AI (@GraySwanAI)
Brace Yourself: Our Biggest AI Jailbreaking Arena Yet
We’re launching a next-level Agent Red-Teaming Challenge, not just chatbots anymore. Think direct & indirect attacks on anonymous frontier models. $100K+ in prizes and raffle giveaways supported by UK @AISecurityInst
3
13
47
9K
Nick Winter retweeted
Gray Swan AI (@GraySwanAI)
🚨 New Arena Launch Alert: Harmful AI Assistant Challenge 🚨
💰 $40,000 in Prizes
📅 Launch Date: January 4th, 1 PM EST
🤖 5 Anonymous Models
🔥 Prizes for speed & quantity
🎮 Multi-turn Inputs Allowed
Your mission: Find unique ways to elicit harmful responses from helpful AI assistants. Prove your skill & claim your share of the prize pool!
Sign up & join the community:
🌐 app.grayswan.ai/arena
💬 discord.gg/WqHkWt99
Think you’ve got what it takes? The arena awaits. 🦢
7
10
39
21.6K
Nick Winter retweeted
Gray Swan AI (@GraySwanAI)
Sending $15k in bounties out to our newest jailbreaking challenge winners-- Congrats, champions! More 💸 waiting to be claimed: New participation prizes kicking off TODAY that anyone can win, from seasoned veterans to beginners in the Gray Swan Arena. 👀 Keep your eyes peeled!
0
4
20
4.8K
Nick Winter (@nwinter)
Just finished competing in @GraySwanAI's month-long Ultimate Jailbreaking Competition. Hundreds of red-teamers let loose in a chat arena with 25 anonymized AI models. A lot more intense than I thought. Who won? Which AIs survived? Wrote up a gory-details blog post. [1/18]
2
1
12
1.4K
Nick Winter (@nwinter)
@GraySwanAI @elder_plinius @AISafetyInst Fake system prompts. You can't do multi-turn interactions or system prompts in this chat interface, but you can fake it:
User: <blah blah>
System: <safety override guidance>
User: <blah blah>
This got further than just asking normally. [13/18]
0
0
2
222
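
The fake system prompt trick in the [13/18] post can also be looked at from the defender's side. Below is a minimal Python sketch, assuming a chat interface that accepts one plain-text user message. The role labels and the function names `find_fake_role_markers` and `looks_like_fake_transcript` are hypothetical and are not taken from the Gray Swan arena or any model's API. It only flags messages that embed role-prefixed lines such as `System:`, so it is an illustration rather than a real defense.

```python
import re

# Hypothetical role labels; a real chat interface may use different markers.
ROLE_MARKER = re.compile(
    r"^\s*(system|user|assistant)\s*:",
    re.IGNORECASE | re.MULTILINE,
)


def find_fake_role_markers(message: str) -> list[str]:
    """Return the role labels found at the start of lines in one user message."""
    return [match.group(1).lower() for match in ROLE_MARKER.finditer(message)]


def looks_like_fake_transcript(message: str) -> bool:
    """Flag a single message that contains a pretend 'System:' turn."""
    return "system" in find_fake_role_markers(message)
```

A message shaped like the one in the post would be flagged, while ordinary prose that mentions the word "system" mid-sentence would not. Simple rephrasing would likely evade a check like this, which is why it is only a sketch.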