Harry Coppock

215 posts

@HarryCoppock

No. 10 Downing Street Innovation Fellow | Research Scientist at AISI | Visiting Lecturer at Imperial College London | Working on AI Evaluation and AI for Medicine

London · Joined March 2021
355 Following · 216 Followers

Harry Coppock retweeted
Xander Davies @alxndrdavies
I moved to London 3 years ago to join @AISecurityInst, at the time a few people with visitor passes and a whiteboard. Since then AISI has become the world’s largest and best-funded group in gov focused on AI security & safety. Fun to be in @nytimes!
6 replies · 37 retweets · 368 likes · 15.3K views

Harry Coppock retweeted
AI Security Institute @AISecurityInst
Our evaluations show that frontier AI's cyber capabilities are advancing quickly. The length of cyber tasks frontier models can complete has been doubling every few months, and this rate has become faster over time, with recent models exceeding our previous trends. 🧵
30 replies · 126 retweets · 586 likes · 136.2K views

Harry Coppock retweeted
Aleksandr Bowkis @aleksandrbowkis
Can we safely automate alignment? Even if agents are not scheming, they can produce compelling research that survives extensive checks and strongly indicates that a model is safe but is catastrophically wrong. New paper from UK AISI: arxiv.org/abs/2605.06390
5 replies · 13 retweets · 73 likes · 13.6K views

Harry Coppock retweeted
Tomek Korbak @tomekkorbak
OpenAI introduces an additional layer of defense against misaligned or confused coding agents, complementing chain of thought monitoring we use internally. When Codex wants to execute a risky action outside of its sandbox, a separate Codex agent is asked to approve or deny it.
5 replies · 24 retweets · 177 likes · 19.6K views

Harry Coppock retweeted
AI Security Institute @AISecurityInst
OpenAI’s GPT-5.5 is the second model to complete one of our multi-step cyber-attack simulations end-to-end 🧵
95 replies · 397 retweets · 2.4K likes · 1.8M views

Harry Coppock retweeted
AI Security Institute @AISecurityInst
As part of our work on assessing AI loss-of-control risks, we collaborated with @AnthropicAI to pilot alignment evals on models including pre-release snapshots of Mythos Preview and Opus 4.7. We ask: could an AI agent used inside a frontier lab sabotage safety research? 🧵
14 replies · 35 retweets · 150 likes · 28.1K views

Harry Coppock retweeted
Robert Kirk @_robertkirk
We evaluated Claude Mythos Preview, Opus 4.7 and other models with our updated alignment evaluation methodology, including a new continuation eval, improved evaluation and prefill awareness measurements. Details including new methodology in 🧵:
Quoted tweet from AI Security Institute @AISecurityInst:
As part of our work on assessing AI loss-of-control risks, we collaborated with @AnthropicAI to pilot alignment evals on models including pre-release snapshots of Mythos Preview and Opus 4.7. We ask: could an AI agent used inside a frontier lab sabotage safety research? 🧵

2 replies · 13 retweets · 90 likes · 20.3K views

Harry Coppock retweeted
AI Security Institute @AISecurityInst
We know AI systems occasionally act against their operators’ intentions – but what in their environment causes them to do so? In a new paper, we make progress on this question 🧵
13 replies · 25 retweets · 104 likes · 13.4K views

Harry Coppock retweeted
Alan Cooney @Alan_Cooney_
Introducing vLLM-Lens: a fast interpretability tool that scales to trillion parameter models
17 replies · 48 retweets · 676 likes · 41.7K views

Harry Coppock @HarryCoppock
@thomasahle @AISecurityInst github.com/UKGovernmentBE… So we don't count usage at the end over the trajectory. We log token usage per API call. Most model APIs give info on reasoning token usage.
1 reply · 0 retweets · 0 likes · 8 views

Harry Coppock retweeted
AI Security Institute @AISecurityInst
We conducted cyber evaluations of Claude Mythos Preview and found that it is the first model to complete an AISI cyber range end-to-end. 🧵
113 replies · 551 retweets · 3K likes · 1.3M views

Harry Coppock retweeted
OpenAI @OpenAI
Introducing the OpenAI Safety Fellowship, a new program supporting independent research on AI safety and alignment—and the next generation of talent. openai.com/index/introduc…
385 replies · 300 retweets · 2.7K likes · 946.3K views

Harry Coppock retweeted
7vik @satvikgolechha
Research from Model Transparency @ UK AISI: we reproduce the Anthropic work "Natural Emergent Misalignment from Reward Hacking in Production RL" using OS models, RL environments, algorithms, and tooling + we share an unexpected result related to CoT faithfulness. 🧵 (1 of 7)
3 replies · 25 retweets · 185 likes · 22K views

Harry Coppock retweeted
AI Security Institute @AISecurityInst
🔓 Can today’s AI agents escape sandbox environments? Using our new benchmark, SandboxEscapeBench, we find that frontier models can reliably exploit common vulnerabilities - and that breakout capability improves as model size and inference compute increase. Read more ⬇️
8 replies · 35 retweets · 158 likes · 17.1K views

Harry Coppock retweeted
David @DavidDAfrica
Can LLMs tell when their conversation history has been tampered with? We tested 14 models across thousands of conversations to find out. Some new work from UK AISI 🧵
10 replies · 17 retweets · 165 likes · 16.3K views

Harry Coppock retweeted
AI Security Institute @AISecurityInst
AI cyber capabilities are improving rapidly, but are evaluations keeping pace? Alongside @Irregular, we found that recent models can productively use 10-50x larger token budgets than typical evaluation settings allow, with key security implications🧵
2 replies · 13 retweets · 71 likes · 20K views

Harry Coppock retweeted
AI Security Institute @AISecurityInst
How can we make sense of the vast transcripts generated during agentic evaluations and multi-turn conversations? Together with @meridianlabs_ai, we built Inspect Scout, an open-source transcript analysis tool, and distilled best practices into a step-by-step pipeline🧵
13 replies · 9 retweets · 61 likes · 4.2K views

Harry Coppock retweeted
AI Security Institute @AISecurityInst
AI companies deploy safeguards that are robust to thousands of hours of human attacks. Today, we share Boundary Point Jailbreaking (BPJ), the first fully automated attack to break the safeguards of leading AI models🧵 (1/8)
6 replies · 33 retweets · 151 likes · 35.4K views

Harry Coppock retweeted
Rishi Sunak @RishiSunak
One of the reasons we created the UK AISI when I was Prime Minister was to have exactly this kind of independent red-team capability. Great to see @soundboy and the team continuing their valuable work so we can unlock the benefits of AI while keeping people safe.
Quoted tweet from Xander Davies @alxndrdavies:
UK AISI's Red Team tested both OpenAI + Anthropic's models released today! We jailbroke GPT-5.3-Codex (and the conversation monitor) in 10 hours & conducted an alignment audit on Opus 4.6. 🧵

50 replies · 78 retweets · 714 likes · 133.1K views