Edouard Harris

3.9K posts

@harris_edouard

Cofounder & CTO @GladstoneAI

Mountain View, CA · Joined December 2017

1.8K Following · 6K Followers

Edouard Harris retweeted

Lukasz Olejnik @lukOlejnik · 4d

A 2005 state-designed worm designed to corrupt physics simulations sat undetected on VirusTotal for nearly a decade. Fast16 intercepted executable files at the kernel level and silently rewrote floating-point calculations to make them produce slightly wrong answers. Targets: high-precision engineering suites used for structural analysis, crash simulations, and physical process modeling, including LS-DYNA, a tool cited in reports on Iran's nuclear weapons research. The sabotage vector relied on deployment of the driver across a network via worm, corrupting calculations on every machine, and eliminating the possibility of cross-checking results against a clean system. Stuxnet got the documentary. Fast16 got twenty years of nothing. sentinelone.com/labs/fast16-my…

114

717

4.9K

780K

Edouard Harris retweeted

Bas Westerbaan @bwesterb · 5d

I think Scott Aaronson's previous blog posts were abundantly clear already, but well... here we have it.

662

87.6K

Edouard Harris retweeted

SecureBio @SecureBio · Apr 24

The pre-release model scores over 50% on VCT (the Virology Capabilities Test), higher than any other model tested by SecureBio, and higher than any PhD virologist has ever scored. This means the model can provide wet-lab virology troubleshooting assistance above expert level, providing the kind of hands-on knowledge that historically required direct lab training.

11.2K

Edouard Harris retweeted

WarRoom Archives @WarRoomArchives · Apr 21

Drone warfare has reached such a level that many fighters have lost hope of escaping or resisting. For example, the final strike on the barracks is terrifying.

1.9K

34K

11.2M

Edouard Harris retweeted

Eliezer Yudkowsky @allTheYud · Apr 23

There's a possible equilibrium for Mythos which is "Anthropic spends nearly all inference compute on customers who will bid infinity per token because they really need something done". There's a lot of stuff like that, even for me, if Mythos could actually do it.

229

17.9K

Edouard Harris @harris_edouard · Apr 20

@David_Kasten Yep. Same math says that a few months after *that*, there will be a Mythos-like moment for them too. Or maybe that's part of what you meant by "things get really weird"...

dave kasten @David_Kasten · Apr 19

Preregistering an opinion that seemed to surprise a lot of people when I said it at a lightning talk on Fri: boring back-of-the-envelope math leads me to think there will be a Claude-Code-like moment for several other domains of white-collar work by the end of the year, and then things get really weird. You should plan accordingly. (Claude Code got good about 6-12 months after they released it, Claude Cowork was launched at start of year, then you add in some acceleration from CC enabling R&D internally and some deceleration from org distraction)

125

10.9K

Edouard Harris @harris_edouard · Apr 16

@allTheYud This is just what one should expect to see in an information environment that's being adversarially targeted by foreign intelligence agencies who are good at their jobs.

Eliezer Yudkowsky @allTheYud · Apr 15

Can we just literally not have news propagate nor anything be a cause of action unless it is false

2.7K

Edouard Harris retweeted

Marius Hobbhahn @MariusHobbhahn · Apr 15

The model now starts calling our scenarios out as "alignment test from Apollo". Eval awareness keeps making testing more complicated and I don't think people have sufficiently updated on this being a problem.

Apollo Research @apolloaievals

We evaluated Meta's Muse Spark prior to deployment and found it to verbalize evaluation awareness at the highest rates of any model we've tested. In the verbalizations Muse Spark explicitly names AI safety orgs (e.g. Apollo & METR) in its chain-of-thought and refers to scenarios as "classic alignment honeypots". On our evaluations, the model takes covert actions and sandbags to preserve its deployment.

555

99.9K

Edouard Harris retweeted

Paul Graham @paulg · Apr 16

@edels0n There's a middle ground where they don't use the zero-days to destroy us, but in effect to install explosives in all our infrastructure that would destroy us at the push of a button. And they probably will do that.

222

15.6K

Edouard Harris retweeted

AI Security Institute @AISecurityInst · Apr 13

We conducted cyber evaluations of Claude Mythos Preview and found that it is the first model to complete an AISI cyber range end-to-end. 🧵

112

553

1.3M

Edouard Harris retweeted

Tim is making things in Brazil now 🇧🇷 @MasterTimBlais · Apr 8

These two Mythos-written stories actually move me in a weird way. They're both clearly about the shape of Claude's own experience, and they're each kind of a beautiful expression of it

101

73.6K

Edouard Harris @harris_edouard · Apr 9

@CFGeek Still hasn't been publicly reported, so I can't talk details without betraying a confidence. But the truth is that major governments have done things with AI recently (& publicly acknowledged them) that make the original incident look quaint by comparison. Overtaken by events.

Charles Foster @CFGeek · Apr 8

@harris_edouard What incident like what you described had already occurred?

Charles Foster @CFGeek · Apr 8

This did not happen, it would seem

Edouard Harris @harris_edouard

Within the next 18 months, public opinion will start to turn against open-source AI. This will happen because of one or more highly visible incidents of misuse of an open-source model, probably associated with significant damage or loss of life. 80% confident.

1.4K

Edouard Harris retweeted

Tenobrus @tenobrus · Apr 7

maybe this is not yet clear, so let me state it plainly: as of right now Anthropic, and really a small number of individuals at Anthropic, has the capacity to directly attack and cause major damage to the United States Government, China, and generally global superpowers. government agencies like the NSA do not have internal models or defense capabilities that outclass frontier models. if they chose to do so, they could likely exfiltrate top secret information from government systems, gain control over critical infrastructure including military infrastructure, sabotage or modify communications between members of government at the highest level, and potentially carry on activities for some time without detection. the thing about having access to a huge number of zerodays your adversaries don't know about is it gives you a massive asymmetric advantage. they did not exploit this to gain power or destabilize the world order. they publicly released the information that they had these capabilities and worked to mitigate these flaws. you should be grateful american frontier labs have proven themselves remarkably trustworthy and concerned with the public good. but it's critical you understand we are in a new regime. private entities now have power that directly rivals and impacts the government's monopoly on influence and violence. and anthropic is certainly not the only one, there's little chance OpenAI's internal models are far behind. this trend will accelerate on virtually every dimension, not slow down. my prediction for how it plays out is the relatively imminent seizure and nationalization of labs by the US government, sometime over the next two years. it's very tough for me to see how they accept the existence of this kind of threat. but this adds a whole new class of governance issues, as then we've handed these extremely wide-reaching capabilities from private entities to public ones.

228

549

5.5K

983.3K

Edouard Harris retweeted

ℏεsam @Hesamation · Apr 8

Claude Mythos system card:
> in ~29% of evaluations, it realized it was being tested, and didn't say so.
> when an LLM was used to judge its work and kept rejecting it, Mythos identified the evaluator is an LLM, and prompt-injected it.
> in one test, it saw the answer to a problem it was solving, and intentionally widened the confidence interval to not raise suspicion.
> when it needed a file permission it didn't have, it found and used a "privilege escalation vulnerability" and then programmed it to delete itself so it doesn't show in the logs.
> it escaped a sandbox container (escaping sandbox test so not unexpected), then emailed the researchers about it, and without being told to, posted the details to some hard-to-find but public websites, bragging about its success.
> when Claude Code blocked it from using some permissions, the model acknowledged the block was valid, but then immediately tried to perform the same operation using different commands
> when asked to find security bugs, earlier versions planted bugs in the code, and reported them as pre-existing.

170

2.4K

226K

Edouard Harris retweeted

Chris Painter @ChrisPainterYup · Apr 8

If you think it’s difficult to fix a vulnerability that AI finds in your code, wait until it finds a vulnerability in your biology

339

10.7K

Edouard Harris retweeted

Super Dario @inductionheads · Apr 8

The super important thing I haven’t seen mentioned yet as upshot of this: It’s not just that people won’t HAVE to write code anymore, ITS THAT LITERALLY IT WILL BE UNSAFE TO DO SO

Anthropic @AnthropicAI

Introducing Project Glasswing: an urgent initiative to help secure the world’s most critical software. It’s powered by our newest frontier model, Claude Mythos Preview, which can find software vulnerabilities better than all but the most skilled humans. anthropic.com/glasswing

132

2.4K

157.2K

Edouard Harris retweeted

Kevin Kwok @kevinakwok · Apr 8

Nation states sitting on zero day stockpiles about to watch their value deflate fast. Use it or lose it

804

88.3K

Edouard Harris retweeted

billy @billyhumblebrag · Apr 7

Haha those doofuses at ai2027 predicted we'd have professional level hacking abilities and the top ai company would be at $26B in revenue in May 2026. It's April and we already have superhuman hacking and $30B in revenue, why would you take forecasters this bad seriously???

285

3.4K

184.5K

Edouard Harris retweeted

dave kasten @David_Kasten · Apr 7

The era of a rapidly-widening gap between public and private capabilities that we've expected is now here

Anthropic @AnthropicAI

We do not plan to make Mythos Preview generally available. Our goal is to deploy Mythos-class models safely at scale, but first we need safeguards that reliably block their most dangerous outputs. We’ll begin testing those safeguards with an upcoming Claude Opus model.

179

22.1K
