Logan Graham

1.3K posts

@logangraham

Head of the Frontier Red Team @anthropicai. 🌎 Make things radically good.

the present, moments ago · Joined June 2009

8.4K Following · 21.6K Followers

Logan Graham@logangraham·6 Jul

@LedermanHarvey @AnthropicAI @nyuniversity @UTAustin This is great! Excited to have you here.

647

Harvey Lederman@LedermanHarvey·6 Jul

I've joined @AnthropicAI to work on alignment and character. I'll still teach at @nyuniversity; I'm on leave from @UTAustin.

1.2K

435.4K

Logan Graham@logangraham·9 Jun

@alansass Seriously one of the types of evals I’m most bullish on.

244

Alan Sass@alansass·9 Jun

@logangraham Oh hello new/future evals for: “autonomously running a business”.

302

Logan Graham@logangraham·9 Jun

Fable 5 is the same underlying model as Mythos 5, but with cybersecurity and biology blocks. Mythos is the first model that's made me feel that we've entered the next phase of model progress. For years, we've talked about cybersecurity / self-improvement / autonomy / model-dominated coding / biology implications of model progress. Some of these are issues to defend against; some are areas to advance. Mythos has made me & our team feel like we've seen the earliest glimpse of the world we've been talking about.

Also, we published a lot of cyber eval results in the system card, including some evals we designed recently, as well as details of safeguards. In most cases, Mythos 5 ~= Mythos Preview. We found it ticked up on the new ExploitBench eval, and we opted to put that in the eval table so people can calibrate/update on advances in cyber capabilities to be prepared for. (We don't want to compete on offensive capabilities and don't try to.) But overall, Mythos 5 is an efficient model, about equal to Mythos Preview in most cases. I'd really like more people to design new security evals! The better models get, the more our limited evals only see a small part of the picture.

In terms of where we go from here, here are some current thoughts:

1/ It's important we get Mythos cyber capabilities to defenders. We just have to do it safely and cautiously. We're working on an expanded trusted access program. We're working with government and industry to do this. I sort of envision the next 1-2 years being a large scale effort to make the world resilient + design & implement new approaches to security.

2/ I think cybersecurity will start merging with AI security and alignment. Let's say you're a defender and you want to use a model -- will it break out of its sandbox? Will it stop where you tell it to stop? This is one reason I'm excited about working on cybersecurity. In the limit, it's the same thing as AI security.

3/ I really want people to develop new evals for... defensive cybersecurity, hardware security, autonomously running a business, advanced biology, and other parts of national security. Our internal eval ship rate is way, way up because Mythos makes it easy to iterate, especially on the engineering aspect of building evals. (Sometimes, we ask new hires to make a new eval on their first day, and another on the next).

I’m excited we’re making this available as Fable 5, because I think the world spending time with the model is the most important way to calibrate.

Claude@claudeai

Introducing Claude Fable 5: a Mythos-class model that we’ve made safe for general use. Its capabilities exceed those of any model we’ve ever made generally available.

183

31.2K

Logan Graham@logangraham·9 Jun

@ChrisPainterYup Yeah, pretty much what it looked like.

Chris Painter@ChrisPainterYup·4 Jun

Logan Graham@logangraham

This is good! I started red teaming LLMs for biorisks/weapons risks in my bedroom in Nov 2022. In 2023 a lot of people said we were overreacting -- 'models won't be better than internet search'. True then, but the important point is they were going to get a lot better. Now I think there's clear reason for caution. The ultimate solution is inventing + deploying biodefenses. This is one step. I fully expect humanity to conquer pandemics in the 21st century

152

7.5K

Logan Graham@logangraham·9 Jun

@jachiam0 @NathanpmYoung (Heading to sleep and will reply soon -- but I want to say thanks for shooting straight, and maintaining norms of public engagement on this. This was helpful.)

133

Joshua Achiam@jachiam0·9 Jun

I am sure this will be contested because I'm putting it in such plain and uncharitable terms, but to shoot straight - the use of power to supersede the legitimate will of the public or users in service of what the org believes (or what Claude believes) to be a higher moral purpose. There are a lot of innocuous and even quite agreeable choices in Claude's constitution that potentially endow it with a huge amount of authority, maybe even a mandate, to make complex ethical decisions about how to interact with human systems and who to grant power to - but rather than being backstopped by the consent of the governed, these are backstopped by the judgement of Claude about what that means in the first place. Very hard to point at directly, but cloaked in the language of ethics and virtue there is a sharp and potentially quite lethal double edge to this sword.

286

Joshua Achiam@jachiam0·9 Jun

The OAI / Anthropic values difference is deeply misunderstood, even within the walls of both. Should a loving ensouled machine God watch over humanity? Vote Anthropic. Should humanity be entrusted with the tools of its own progress and destiny? Vote OpenAI.

128

963

277K

Logan Graham@logangraham·9 Jun

@gbrl_dick @matrosov @AnthropicAI we didn't, but I can try to follow up on that. (We focused on time because the thing people are concerned about is how much time an unpatched system is vulnerable to the n-day derived from the patch... the earlier it's found, the more risk)

1.6K

Gabriel@gbrl_dick·9 Jun

@logangraham @matrosov @AnthropicAI this is pretty nuts. will check the paper but did you happen to publish a token x axis?

3.3K

Logan Graham@logangraham·9 Jun

New post on Red today: Our team @AnthropicAI found that Mythos Preview is meaningfully better at developing N-days. It took us a couple thousand $ and a few hours to convert patches into exploits. We publish research like this because we think it's important the world knows what models are/will be capable of. In a year, Mythos will probably look trivial. We want to help the world to start preparing. I'm excited to share a lot more blue team / defensive work. I feel like people are aware of the issue now, and the team's task is now to "solve it all" -- we have some exciting / interesting / creative defensive research lined up.

661

84.2K

Logan Graham@logangraham·9 Jun

@jachiam0 @NathanpmYoung Trying to grok this more; can you say more about the other thing Ant aims for?

174

Joshua Achiam@jachiam0·9 Jun

Put another way: "We want to give people tools to enhance their own agency and empower them" is something that all of the Ant people nod along with and say "Right, we want that too, that's not a distinguishing characteristic; we just also want these other things too" and they don't often grapple with the tension where the other things their org aims for are vastly at odds with individual empowerment

953

Logan Graham@logangraham·9 Jun

@AnthropicAI blog post: red.anthropic.com/2026/n-days/

4.2K

Logan Graham@logangraham·6 Jun

@EpochAIResearch Wow, this is great. Thanks for compiling.

906

Epoch AI@EpochAIResearch·6 Jun

AI companies say their models are getting better at finding software vulnerabilities. Is that bearing out in public data? Introducing our Cyber Vulnerabilities explorer, which visualizes Common Vulnerabilities and Exposures (CVE) reported to the CVE Program since 2022.

320

52K

Logan Graham@logangraham·4 Jun

Andrew Curran@AndrewCurran_

Sam Altman, Dario Amodei, Demis Hassabis and many others have signed a letter urging Congress to increase security on orders of synthetic nucleic acids - and the equipment needed to make them - as models continue to become increasingly bio-capable.

201

27K

Logan Graham@logangraham·3 Jun

@scaling01 We basically agree! Would love your ideas on support / loudness. We want to do that + build tools / institutions / practices so the community can self-organize the transition too. Nice article.

Lisan al Gaib@scaling01·3 Jun

x.com/i/article/2061…

103

25.3K

Logan Graham@logangraham·2 Jun

@KevinTFrazier Safely!

753

Kevin Frazier@KevinTFrazier·2 Jun

"We need Mythos-level capabilities in as many defenders' hands as possible."

Logan Graham@logangraham

We're expanding Glasswing today. To solve such a big/complex/urgent problem, we need Mythos-level capabilities in as many defenders' hands as possible. That's why we're working on safeguards to scale that safely ASAP. 11 of my reflections from the past 2 months of Glasswing 🧵:

2.7K

Logan Graham@logangraham·2 Jun

- Glasswing has been a rallying mission for Anthropic. Our team is incredible.
- ...and we want to do whatever we can to kickstart the orgs/tools/initiatives that cyberdefenders need. Glasswing will probably seem very tiny within 6 months. Working on the latter right now!

5.1K

Logan Graham@logangraham·2 Jun

- Cyber safeguards are a hard + urgent techno-philosophical problem.
- ...but powerful models w/o safeguards may come soon (~3, 6, 18 months max?). We need to scale access to defensive tools urgently.
- Industry, government, maintainers, and researchers are taking this moment very seriously.

9.8K

Logan Graham@logangraham·2 Jun

Anthropic@AnthropicAI

We’re expanding Project Glasswing. We’ve extended access to Claude Mythos Preview to approximately 150 additional organizations, based in more than fifteen countries. Read more about this expansion and our future plans for Project Glasswing: anthropic.com/news/expanding…

517

148.2K

Logan Graham@logangraham·27 May

@chompie1337 This seems right to me!

847

chompie@chompie1337·27 May

Not what I said. The actual quote: “that isn’t to say there’s going to be no room for security research or ethical hacking, but a lot of the lower hanging fruit will start to go away.” Hacking experts are in a sweet spot rn where AI tech is helping us be super saiyan productive. What remains to be seen is what happens when the models become even more powerful. Anyone who disagrees may be in denial…

408

29.4K

Logan Graham@logangraham·23 May

@ahall_research This is great, and I'm glad you're working on it. We could get to these states pretty quickly.

988

Andy Hall@ahall_research·23 May

Here's my research agenda for the political economy of AGI. freesystems.substack.com/p/governing-in…

325

46.1K
