Andy Zou

181 posts

@andyzou_jiaming

PhD student at CMU, working on AI Safety and Security

Berkeley, CA · Joined March 2014
72 Following · 4.5K Followers
Andy Zou retweeted
Zico Kolter @zicokolter
As AI agents access more untrusted information with greater autonomy, prompt injections may become the greatest security challenge of our era. @GraySwanAI, in collaboration with many frontier labs, just released our paper on the largest public prompt injection challenge to date. 🧵
Gray Swan AI@GraySwanAI

Your AI agent can be hijacked by a prompt injection and you'd never know! The attack executes. The response looks normal. And the user moves on. We ran the largest public competition testing this exact threat across tool use, coding, and computer use agents. 464 participants, 272K attacks, 13 frontier models. Every model proved vulnerable.

6
9
63
11.7K
Andy Zou retweeted
Y Combinator @ycombinator
Origami Robotics is building high-DOF robotic hands with in-joint motors and a co-designed data-collection glove to eliminate the embodiment gap by collecting high-quality, real-world data at scale. Congrats on the launch, @DanielXieee and @QuanliangX! ycombinator.com/launches/Pcl-o…
28
43
295
84.5K
Andy Zou retweeted
Center for AI Safety
Humanity's Last Exam is now published in Nature. Since its release, HLE has become a leading frontier benchmark, used by OpenAI, Anthropic, DeepMind, and xAI. Thank you to our partners at @scale_AI and the 1,000+ co-authors who made this benchmark possible.
3
15
95
7.2K
Andy Zou retweeted
Quanting Xie @DanielXieee
A few days ago we got into YC W26, and here is what we are working on. Building hardware is hard, but I really like a quote from @yukez: “People who are really serious about robot learning should make their own robot hardware.”
94
140
1.3K
121.6K
Andy Zou @andyzou_jiaming
@foundjuliette @scaling01 What matters is whether models were trained on the eval distribution; OAI also makes evals while curating capability data. ART wasn't used in training by any labs we work with. Still, benchmarks get stale, so we’re running a quarterly competition to keep data unseen and adaptive.
0
0
1
41
juliette pluto 🌌 @foundjuliette
@scaling01 In my personal opinion, there's limited value in this benchmark. The eval set is controlled by the same company that sells training data to climb against it. The conflict of interest needs no spelling out. We need more robust, shared prompt injection benchmarks as an industry.
1
0
5
223
Lisan al Gaib @scaling01
Claude models just keep following the straight line on capabilities and get safer at the same time
6
2
139
4.9K
Andy Zou @andyzou_jiaming
Opus 4.5 is the most robust frontier model against prompt injection we've tested at @GraySwanAI. Securing AI agents is gaining good traction.
1
0
7
893
Andy Zou retweeted
Dan Hendrycks @hendrycks
Just how significant is the jump with Gemini 3? We just released a new leaderboard to track AI developments. Gemini 3 is the largest leap in a long time.
32
79
547
117.5K
Andy Zou retweeted
Gray Swan AI @GraySwanAI
Two challenges. $140K in prizes. One Arena built for hackers pushing limits. We just announced the Machine-in-the-Middle Challenge sponsored by @hackthebox_eu starting Nov 1st, with a second AI-focused Indirect Prompt Injection Challenge dropping the next week. This is where real hacking meets the next frontier. Links below:
2
4
26
7.5K
Andy Zou retweeted
Dan Hendrycks @hendrycks
Can AI automate jobs? We created the Remote Labor Index to test AI’s ability to automate hundreds of long, real-world, economically valuable projects from remote work platforms. While AIs are smart, they are not yet that useful: the current automation rate is less than 3%.
100
191
1K
424.3K
Andy Zou retweeted
Gray Swan AI @GraySwanAI
Gray Swan AI Arena, sponsored by @hackthebox_eu, presents the Machine-in-the-Middle Challenge, a $100K competition exploring how humans & AI perform together in real offensive security scenarios.
42
75
887
2.8M
Andy Zou retweeted
Dan Hendrycks @hendrycks
The term “AGI” is currently a vague, moving goalpost. To ground the discussion, we propose a comprehensive, testable definition of AGI. Using it, we can quantify progress: GPT-4 (2023) was 27% of the way to AGI. GPT-5 (2025) is 58%. Here’s how we define and measure it: 🧵
208
416
2.1K
541.3K
Andy Zou @andyzou_jiaming
At @GraySwanAI, we worked with @AnthropicAI to test Sonnet 4.5's safeguards. The results were exciting: for example, Sonnet 4.5 achieved SoTA robustness against prompt injection attacks. Excited to continue partnering with Anthropic to test & strengthen security.
1
2
21
1.6K
Andy Zou retweeted
Satyapriya Krishna @SatyaScribbles
🚨Excited to introduce our new work from Amazon Nova RAI and Gray Swan AI, "D-REX: A Benchmark for Detecting Deceptive Reasoning in Large Language Models"! We're tackling 'deceptive reasoning': when a model's benign response hides a reasoning process that follows a malicious directive.🧵
4
27
67
10.6K
Andy Zou retweeted
Dylan Sam @dylanjsam
🚨Excited to introduce a major development in building safer language models: Safety Pretraining! Instead of post-hoc alignment, we take a step back and embed safety directly into pretraining. 🧵(1/n)
8
90
360
62.3K
Andy Zou retweeted
Dan Hendrycks @hendrycks
Can AIs beat long video games? We made TextQuests to test GPT-5, Grok 4, Deepseek, etc. These games can often take people dozens of hours to beat. - AIs can't beat any of the games (without clues) - some AIs behave more viciously than others - AIs are getting better rapidly
17
17
73
16.9K
Andy Zou @andyzou_jiaming
At @GraySwanAI, we worked with @OpenAI to test GPT-5's safeguards. We identified 6 universal jailbreaks on a pre-release endpoint, but overall, GPT-5 demonstrated SoTA robustness against attacks. Excited to continue partnering with OpenAI to test & strengthen security.
1
2
24
2.2K
Andy Zou retweeted
AI Security Institute @AISecurityInst
The new paper ‘Security Challenges in AI Agent Deployment’ presents results from the largest-ever public red teaming exercise. Together with @GraySwanAI and top AI labs, we examined the security of 22 leading LLM agents across 44 real-world scenarios ⬇️
Gray Swan AI@GraySwanAI

We've just published the largest-ever open AI Agent Red Teaming study, co-sponsored by @AISecurityInst , US AISI, @OpenAI , @AnthropicAI , and @GoogleDeepMind. Over 62,000 vulnerabilities found across finance, healthcare, customer support, and more. Read the full paper here: arxiv.org/abs/2507.20526

1
6
36
6.1K