Harry Coppock

201 posts

@HarryCoppock

No. 10 Downing Street Innovation Fellow | Research Scientist at AISI | Visiting Lecturer at Imperial College London | Working on AI Evaluation and AI for Medicine

London · Joined March 2021
342 Following · 209 Followers
Harry Coppock retweeted
AI Security Institute @AISecurityInst
🔓 Can today’s AI agents escape sandbox environments? Using our new benchmark, SandboxEscapeBench, we find that frontier models can reliably exploit common vulnerabilities - and that breakout capability improves as model size and inference compute increase. Read more ⬇️
6 replies · 36 reposts · 155 likes · 12K views
Harry Coppock retweeted
David @DavidDAfrica
Can LLMs tell when their conversation history has been tampered with? We tested 14 models across thousands of conversations to find out. Some new work from UK AISI 🧵
10 replies · 17 reposts · 164 likes · 15.2K views
Harry Coppock retweeted
AI Security Institute @AISecurityInst
AI cyber capabilities are improving rapidly, but are evaluations keeping pace? Alongside @Irregular, we found that recent models can productively use 10-50x larger token budgets than typical evaluation settings allow, with key security implications🧵
2 replies · 12 reposts · 71 likes · 19.1K views
Harry Coppock retweeted
AI Security Institute @AISecurityInst
How can we make sense of the vast transcripts generated during agentic evaluations and multi-turn conversations? Together with @meridianlabs_ai, we built Inspect Scout, an open-source transcript analysis tool, and distilled best practices into a step-by-step pipeline🧵
12 replies · 8 reposts · 61 likes · 3.3K views
Harry Coppock retweeted
AI Security Institute @AISecurityInst
AI companies deploy safeguards that are robust to thousands of hours of human attacks. Today, we share Boundary Point Jailbreaking (BPJ), the first fully automated attack to break the safeguards of leading AI models🧵 (1/8)
6 replies · 33 reposts · 150 likes · 34.6K views
Harry Coppock retweeted
Rishi Sunak @RishiSunak
One of the reasons we created the UK AISI when I was Prime Minister was to have exactly this kind of independent red-team capability. Great to see @soundboy and the team continuing their valuable work so we can unlock the benefits of AI while keeping people safe.
Quoted: Xander Davies @alxndrdavies

UK AISI's Red Team tested both OpenAI + Anthropic's models released today! We jailbroke GPT-5.3-Codex (and the conversation monitor) in 10 hours & conducted an alignment audit on Opus 4.6. 🧵

52 replies · 70 reposts · 715 likes · 130.8K views
Harry Coppock retweeted
Geoffrey Irving @geoffreyirving
New report on trends in AISI's evaluations of frontier AI models over the past two years. A lot of AI discourse focuses on viral moments, but it is important to zoom out to the less flashy trend: AI models are steadily growing in capabilities, including dual-use capabilities.
Quoted: AI Security Institute @AISecurityInst

📈 Today, we’re releasing our first Frontier AI Trends Report: evaluation results on 30+ frontier models from the past two years, showing rapid progress in chemistry and biology, cyber capabilities, autonomy, and more. ▶️Read now: aisi.gov.uk/frontier-ai-tr…

0 replies · 21 reposts · 84 likes · 8.3K views
Harry Coppock retweeted
Cozmin Ududec @CUdudec
Very excited that this systematic analysis is out! We found a bunch of failure modes, as well as interesting and surprising behaviours. There's a lot more insight we can get from looking carefully at how models are solving evaluation tasks!
Quoted: AI Security Institute @AISecurityInst

Measuring how often an AI agent succeeds at a task can help us assess its capabilities – but it doesn’t tell the whole story. We’ve been experimenting with transcript analysis to better understand not just how often agents succeed, but why they fail 🧵

1 reply · 2 reposts · 3 likes · 715 views
Harry Coppock retweeted
Anthropic @AnthropicAI
New research with the UK @AISecurityInst and the @turinginst: We found that just a few malicious documents can produce vulnerabilities in an LLM—regardless of the size of the model or its training data. Data-poisoning attacks might be more practical than previously believed.
83 replies · 248 reposts · 1.6K likes · 532.1K views
Harry Coppock retweeted
Robert Kirk @_robertkirk
We at @AISecurityInst recently did our first pre-deployment 𝗮𝗹𝗶𝗴𝗻𝗺𝗲𝗻𝘁 evaluation of @AnthropicAI's Claude Sonnet 4.5! This was a first attempt – and we plan to work on this more! – but we still found some interesting results, and some learnings for next time 🧵
3 replies · 12 reposts · 49 likes · 8.2K views
Harry Coppock retweeted
Xander Davies @alxndrdavies
Excited to share details on two of our longest running and most effective safeguard collaborations, one with Anthropic and one with OpenAI. We've identified—and they've patched—a large number of vulnerabilities and together strengthened their safeguards. 🧵 1/6
8 replies · 61 reposts · 297 likes · 60.6K views
Harry Coppock retweeted
AI Security Institute @AISecurityInst
🔎 People are increasingly using chatbots to seek out new information, raising concerns about how they could misinform voters or distort public opinion. But how is AI actually influencing real-world political beliefs? Our new study explores this question 👇
2 replies · 6 reposts · 21 likes · 2.8K views
Harry Coppock retweeted
Robert Kirk @_robertkirk
Since I started working on safeguards, we've seen substantial progress in defending certain hosted models, but less progress in measuring & managing misuse risks from open weight models. Three directions I want explored more, drawn from our @AISecurityInst post today 🧵
1 reply · 7 reposts · 36 likes · 2.2K views
Harry Coppock @HarryCoppock
This is great news for the UK. Having worked with Jade over the past 2 years, setting up @AISecurityInst, I am confident that there are very few, if any, who are better placed to take on this role.
Quoted: Matt Clifford @matthewclifford

Absolutely delighted about this - major upgrade on the last AI adviser! Jade brings a tonne of experience in frontier labs, VC and government and will do an amazing job of ensuring the UK is an AI winner. Excellent news.

0 replies · 0 reposts · 4 likes · 303 views
Harry Coppock retweeted
AI Security Institute @AISecurityInst
How can open-weight Large Language Models be safeguarded against malicious uses? In our new paper with @AiEleuther, we find that removing harmful data before training can be over 10x more effective at resisting adversarial fine-tuning than defences added after training 🧵
5 replies · 42 reposts · 219 likes · 35.6K views
Harry Coppock retweeted
Xander Davies @alxndrdavies
We at @AISecurityInst worked with @OpenAI to test GPT-5's safeguards. We identified multiple jailbreaks, including a universal jailbreak that evades all layers of mitigations and is being patched. Excited to continue partnering with OpenAI to test & strengthen safeguards.
17 replies · 23 reposts · 128 likes · 22.8K views
Harry Coppock retweeted
Andy Zou @andyzou_jiaming
We deployed 44 AI agents and offered the internet $170K to attack them. 1.8M attempts, 62K breaches, including data leakage and financial loss. 🚨 Concerningly, the same exploits transfer to live production agents… (example: exfiltrating emails through calendar event) 🧵
72 replies · 387 reposts · 2.2K likes · 524.9K views
Harry Coppock retweeted
AI Security Institute @AISecurityInst
📢Introducing the Alignment Project: A new fund for research on urgent challenges in AI alignment and control, backed by over £15 million. ▶️ Up to £1 million per project ▶️ Compute access, venture capital investment, and expert support Learn more and apply ⬇️
7 replies · 66 reposts · 195 likes · 123.5K views
Harry Coppock retweeted
Xander Davies @alxndrdavies
We at @AISecurityInst worked with @OpenAI to test & improve Agent’s safeguards prior to release. A few notes on our experience🧵 1/4
3 replies · 29 reposts · 151 likes · 19.7K views
Harry Coppock retweeted
AI Security Institute @AISecurityInst
🧵 AI Systems are developing advanced cyber capabilities. This means they’re helping strengthen defences - but can also be used as threats. To keep on top of these risks, we need more rigorous evaluations of agentic AI, which is why we’re releasing Inspect Cyber 🔍
1 reply · 13 reposts · 57 likes · 12.6K views