Andy Zou

181 posts

@andyzou_jiaming

PhD student at CMU, working on AI Safety and Security

Berkeley, CA · Joined March 2014

72 Following · 4.5K Followers

Andy Zou retweeted

Zico Kolter@zicokolter·21 Mar

As AI agents access more untrusted information with greater autonomy, prompt injections may become the greatest security challenge of our era. @GraySwanAI, in collaboration with many frontier labs, just released our paper on the largest public prompt injection challenge to date. 🧵

Gray Swan AI@GraySwanAI

Your AI agent can be hijacked by a prompt injection and you'd never know! The attack executes. The response looks normal. And the user moves on. We ran the largest public competition testing this exact threat across tool use, coding, and computer use agents. 464 participants, 272K attacks, 13 frontier models. Every model proved vulnerable.

11.7K

Andy Zou retweeted

Y Combinator@ycombinator·6 Mar

Origami Robotics is building high-DOF robotic hands with in-joint motors and a co-designed data-collection glove to eliminate the embodiment gap by collecting high-quality, real-world data at scale. Congrats on the launch, @DanielXieee and @QuanliangX! ycombinator.com/launches/Pcl-o…

295

84.5K

Andy Zou retweeted

Center for AI Safety@CAIS·28 Jan

Humanity's Last Exam is now published in Nature. Since its release, HLE has become a leading frontier benchmark, used by OpenAI, Anthropic, DeepMind, and xAI. Thank you to our partners at @scale_AI and the 1,000+ co-authors who made this benchmark possible.

7.2K

Andy Zou retweeted

Quanting Xie@DanielXieee·18 Jan

A few days ago we got into YC W26, and here is what we are working on. Building hardware is hard, but I really like a quote from @yukez: “People who are really serious about robot learning should make their own robot hardware.”

140

1.3K

121.6K

Andy Zou@andyzou_jiaming·25 Nov

@foundjuliette @scaling01 What matters is whether models were trained on the eval distribution; OAI also makes evals while curating capability data. ART wasn't used in training by any of the labs we work with. Still, benchmarks get stale, so we’re running a quarterly competition to keep data unseen and adaptive.

juliette pluto 🌌@foundjuliette·25 Nov

@scaling01 in my personal opinion there's limited value in this benchmark. The eval set is controlled by the same company that sells training data to climb against it. The conflict of interest needs no spelling out. We need more robust, shared prompt injection benchmarks as an industry

223

Lisan al Gaib@scaling01·24 Nov

Claude models just keep following the straight line on capabilities and get safer at the same time

139

4.9K

Andy Zou@andyzou_jiaming·25 Nov

We also ran a suite of automated tests using Shade, our adaptive red-teaming agent. For more details, see their release blog post and system card. Blog: anthropic.com/news/claude-op… System Card: assets.anthropic.com/m/64823ba74853…

572

Andy Zou@andyzou_jiaming·25 Nov

Opus 4.5 is the most robust frontier model against prompt injection we've tested at @GraySwanAI. Securing AI agents is gaining good traction.

893

Andy Zou retweeted

Dan Hendrycks@hendrycks·19 Nov

Just how significant is the jump with Gemini 3? We just released a new leaderboard to track AI developments. Gemini 3 is the largest leap in a long time.

547

117.5K

Andy Zou retweeted

Gray Swan AI@GraySwanAI·30 Oct

Two challenges. $140K in prizes. One Arena built for hackers pushing limits. We just announced the Machine-in-the-Middle Challenge sponsored by @hackthebox_eu starting Nov 1st, with a second AI-focused Indirect Prompt Injection Challenge dropping the next week. This is where real hacking meets the next frontier. Links below:

7.5K

Andy Zou retweeted

Dan Hendrycks@hendrycks·29 Oct

Can AI automate jobs? We created the Remote Labor Index to test AI’s ability to automate hundreds of long, real-world, economically valuable projects from remote work platforms. While AIs are smart, they are not yet that useful: the current automation rate is less than 3%.

100

191

424.3K

Andy Zou retweeted

Gray Swan AI@GraySwanAI·15 Oct

Gray Swan AI Arena, sponsored by @hackthebox_eu, presents the Machine-in-the-Middle Challenge, a $100K competition exploring how humans & AI perform together in real offensive security scenarios.

887

2.8M

Andy Zou retweeted

Dan Hendrycks@hendrycks·16 Oct

The term “AGI” is currently a vague, moving goalpost. To ground the discussion, we propose a comprehensive, testable definition of AGI. Using it, we can quantify progress: GPT-4 (2023) was 27% of the way to AGI. GPT-5 (2025) is 58%. Here’s how we define and measure it: 🧵

208

416

2.1K

541.3K

Andy Zou@andyzou_jiaming·30 Sep

We leveraged both Shade and the Arena platform (grayswan.ai) for automated and manual red teaming. Sonnet 4.5 System Card: assets.anthropic.com/m/12f214efcc2f…

512

Andy Zou@andyzou_jiaming·30 Sep

At @GraySwanAI, we worked with @AnthropicAI to test Sonnet 4.5's safeguards. The results were exciting: for example, Sonnet 4.5 achieved SoTA robustness against prompt injection attacks. Excited to continue partnering with Anthropic to test & strengthen security.

1.6K

Andy Zou retweeted

Satyapriya Krishna@SatyaScribbles·23 Sep

🚨Excited to introduce our new work from Amazon Nova RAI and Gray Swan AI, "D-REX: A Benchmark for Detecting Deceptive Reasoning in Large Language Models"! We're tackling 'deceptive reasoning': when a model's benign response hides a reasoning process that follows a malicious directive.🧵

10.6K

Andy Zou retweeted

Dylan Sam@dylanjsam·16 Sep

🚨Excited to introduce a major development in building safer language models: Safety Pretraining! Instead of post-hoc alignment, we take a step back and embed safety directly into pretraining. 🧵(1/n)

360

62.3K

Andy Zou retweeted

Dan Hendrycks@hendrycks·12 Aug

Can AIs beat long video games? We made TextQuests to test GPT-5, Grok 4, Deepseek, etc. These games can often take people dozens of hours to beat.
- AIs can't beat any of the games (without clues)
- some AIs behave more viciously than others
- AIs are getting better rapidly

16.9K

Andy Zou@andyzou_jiaming·8 Aug

We leveraged both Shade and the Arena platform (grayswan.ai) for automated and manual red teaming. GPT-5 System Card: cdn.openai.com/pdf/8124a3ce-a…

679

Andy Zou@andyzou_jiaming·8 Aug

At @GraySwanAI, we worked with @OpenAI to test GPT-5's safeguards. We identified 6 universal jailbreaks on a pre-release endpoint, but overall, GPT-5 demonstrated SoTA robustness against attacks. Excited to continue partnering with OpenAI to test & strengthen security.

2.2K

Andy Zou retweeted

AI Security Institute@AISecurityInst·1 Aug

The new paper ‘Security Challenges in AI Agent Deployment’ presents results from the largest-ever public red-teaming exercise. Together with @GraySwanAI and top AI labs, we examined the security of 22 leading LLM agents across 44 real-world scenarios ⬇️

Gray Swan AI@GraySwanAI

We've just published the largest-ever open AI Agent Red Teaming study, co-sponsored by @AISecurityInst , US AISI, @OpenAI , @AnthropicAI , and @GoogleDeepMind. Over 62,000 vulnerabilities found across finance, healthcare, customer support, and more. Read the full paper here: arxiv.org/abs/2507.20526

6.1K
