Malcolm Murray

8.9K posts

@malcmur

AI Risk Management Expert

Joined May 2007

4.4K Following · 1.5K Followers

Malcolm Murray reposted

FAR.AI @farairesearch · 27 Apr

FAR.AI red-teamed DeepSeek’s V4-Pro and almost immediately found three ways in – achieving 98-100% compliance with harmful requests spanning CBRN, terrorism & explosives, and cyberattack content. The fastest of the three jailbreaks took just 15 minutes, with no expertise required. Slowest: only 150 minutes. 1/3

7.4K

Malcolm Murray reposted

Markus Anderljung @Manderljung · 5d

Well, it was bound to happen eventually. We've been seeding LLM-generated answers into our Research Scholar work tests at @GovAIOrg to see how they'd score blind. Last round: the best AI submission was 81st percentile. This round: Claude Opus 4.6 with some prompting got the highest score in the pool.

Mechanics: We copy-pasted the work test – which consists of e.g. reading a claim and explaining their view on its likelihood of being true – into chatbots. The work test document itself contains an example answer and the grading rubric, so the model gets the same priming a candidate would. The winning Clopus entry was slightly prompted on top of that (roughly: "make it sound more like GovAI"); the unprompted version came in 4th.

Validation: To double-check the results, we had a staff member re-rate the top submissions blind, including the AI ones. Scores moved down a bit but not by much.

Lessons:
- We're going to need to redesign our work tests. Either we'll have to remove people's ability to use LLMs, or figure out a test that works when people do use LLMs.
- People don't seem to be using AI as much as they perhaps should be. Our work test did allow people to use LLMs, though we did slightly discourage it as we said we didn't expect the best answers to come from just pasting in the questions.
- AI automation is coming not just for AI research and safety, but also for AI governance.

187

20.9K

Malcolm Murray reposted

Adam Tooze @adam_tooze · 9 May

When worrying about China shocks and AI shocks it is worth remembering that shifts in the structure of employment are inherent to the process of modern economic growth. Featured in today's Chartbook Top Links.

142

13.8K

Malcolm Murray reposted

Nathan Lambert @natolambert · 2 May

So much rests on which of these trend lines is more representative.

554

103.6K

Malcolm Murray reposted

Seth Lazar @sethlazar · 18 Apr

I think we need to clearly distinguish between two different meanings of “AI is a normal technology”. On one, it means that AI is “intrinsically normal”—its properties are similar to the properties of other technologies. This interpretation is clearly implausible. We have invested dumb matter with agency and significant and growing autonomy. No other technology is like it. On another reading, AI is an “extrinsically normal” technology—that is, it is subject to all the same social, cultural, political and environmental constraints as other technologies, and should not be expected to sweep history before it just because it is itself intrinsically special. The second reading is much more plausible, and is ultimately an empirical thesis that we can test. It would be super helpful if, when people say “AI clearly isn’t normal tech” they would specify which of these interpretations they mean. I suspect that @sayashk and @random_walker mean the second much more than the first; I think most people who disagree with them take them to be saying the first as strongly as the second.

Zvi Mowshowitz @TheZvi

Contra Mr. Hunting, my hot take, as I noted writing up the @dwarkesh_sp interview of Jensen, is that even if AI was a normal technology (which it isn't) export controls are still clearly the correct move right now.

5.2K

Malcolm Murray reposted

Anna Thieser @annakthieser · 26 Apr

What holds AI safety together? We mapped co-authorship across 200 papers, and the answer surprised us – it’s not frontier labs. Mid-career movers and universities act as key connectors. 1/3

359

25.1K

Malcolm Murray reposted

AVERI @AVERIorg · 21 Apr

AVERI just published an analysis of audit-related legislation in the US. We survey the current landscape, discuss challenges with audit requirements and pathways for addressing them, and make our first endorsement of specific legislation.

35.4K

Malcolm Murray reposted

Eli Lifland @eli_lifland · 2 Apr

AI timelines update: @DKokotajlo and I have updated our timelines earlier by ~1.5 years over the last 3 months, primarily due to (a) expecting faster time horizon growth, and (b) coding agents impressing in the real world. During 2025, we had updated toward longer timelines.

655

100K

Malcolm Murray reposted

Forecasting Research Institute @Research_FRI · 31 Mar

We completed the most comprehensive study of how economists and AI experts think AI will affect the U.S. economy. They predict major AI progress—but no dramatic break from economic trends: GDP growth rates similar to today's and a moderate decline in labor force participation.

However, when asked to consider what would happen in a world with extremely rapid progress in AI capabilities by 2030, they predict significant economic impacts by 2050:
• Annualized GDP growth of 3.5% (compared to 2.4% in 2025)
• A labor force participation rate of 55% (roughly 10 million fewer jobs)
• 80% of wealth held by the top 10% (highest since 1939)

🧵 Here's what we found:

192

641

377.1K

Malcolm Murray reposted

Dean W. Ball @deanwball · 27 Mar

if you’re an ai safety person who wants major federal action now, you should want for anthropic to lead in advancing the frontier into dangerous capabilities, because the Trump Admin will now be primed to see whatever anthropic does as “bad” and what other labs do as “good.” If anthropic hits an RSI loop first, it’s much likelier to be viewed by the admin as “weird” and “scary,” whereas if anyone else does it, it will be “normal” and “innovative.”

216

44.5K

Malcolm Murray reposted

Peter Wildeford🇺🇸🚀 @peterwildeford · 27 Mar

CLAUDE MYTHOS 👀 A CMS misconfiguration at Anthropic just leaked draft blog posts about "Claude Mythos". Anthropic confirmed it's real, calling it "the most capable we've built to date." Mythos is a new, fourth tier, larger and more expensive than Opus. The draft claims dramatically higher scores on coding, academic reasoning, and cybersecurity benchmarks.

A few thoughts on what this actually means:
- This is likely a larger pre-train with similar post-training. It's not obvious how much additional pre-training compute buys you at the current frontier - we're about to find out.
- There's a lot of hyperventilating about what this means for AI trajectories. I think it's too early to update any forecasts. AI was already moving very fast.
- Some people are alarmed that Anthropic is sitting on a model it considers dangerous. But this is what Anthropic does with every frontier model.
- It seems right now that another thing blocking release is just unit economics. Anthropic says it's "very expensive for [Anthropic] to serve and will be very expensive for customers to use." Anthropic is "working to make the model much more efficient before any general release."
- I wonder if the way this will function is as a competitor to GPT 5.4 Pro. Claude seems to beat OpenAI on everything except the Pro-specific line, the $200/mo model that thinks for a long time and excels at things like math. Claude isn't currently solving open math problems like OpenAI and Google. I wonder if Mythos will change that.
- It's great to see Anthropic engaged in 'differential access', rolling out the model to cyber defenders before giving access generally. The cyber capabilities of models are getting genuinely scary.
- One irony for cyberdefense - This entire leak happened because of a CMS misconfiguration, exactly the basic security hygiene failure that these cyber-capable models were supposedly going to help defenders prevent.

M1 @M1Astra

Claude Mythos Blog Post Saved before it was taken down. m1astra-mythos.pages.dev

217

27.8K

Malcolm Murray reposted

Jeffrey Ladish @JeffLadish · 28 Feb

I am also disappointed in Anthropic’s walk-back on their RSP commitments but I think the president’s response - canceling all government contracts with Anthropic - is a big overreaction

410

19.7K

Malcolm Murray reposted

Zvi Mowshowitz @TheZvi · 28 Feb

I look forward to reading the contract terms and hearing more because I know what this looks like, what it implies about how everything went down, and at least one major player in this (DoW, OpenAI or Anthropic) is very profoundly, blatantly lying to us.

Sam Altman @sama

Tonight, we reached an agreement with the Department of War to deploy our models in their classified network. In all of our interactions, the DoW displayed a deep respect for safety and a desire to partner to achieve the best possible outcome. AI safety and wide distribution of benefits are the core of our mission. Two of our most important safety principles are prohibitions on domestic mass surveillance and human responsibility for the use of force, including for autonomous weapon systems. The DoW agrees with these principles, reflects them in law and policy, and we put them into our agreement. We also will build technical safeguards to ensure our models behave as they should, which the DoW also wanted. We will deploy FDEs to help with our models and to ensure their safety, we will deploy on cloud networks only. We are asking the DoW to offer these same terms to all AI companies, which in our opinion we think everyone should be willing to accept. We have expressed our strong desire to see things de-escalate away from legal and governmental actions and towards reasonable agreements. We remain committed to serve all of humanity as best we can. The world is a complicated, messy, and sometimes dangerous place.

820

63.4K

Malcolm Murray reposted

Rohan Paul @rohanpaul_ai · 28 Feb

You cannot trust AI to handle your bank account or run a business if it randomly breaks down when you change a single word in your instructions. A new Princeton University paper reveals that AI agents are crushing accuracy benchmarks but completely failing at actual dependability.

They tested 14 different models across 500 benchmark runs to rigorously measure their performance under pressure. The paper argues that these tools are still too unpredictable to handle any serious tasks on their own right now.

The technology industry currently evaluates LLMs purely on average success rates, completely ignoring whether systems can get the exact same answer twice. The authors borrowed aviation engineering principles to break true reliability down into consistency, robustness, predictability, and safety.

Consistency means the model produces the exact same correct result every single time it tries a task. Robustness measures if the system survives minor technical glitches or a slight rephrasing of your prompt. Predictability checks if the agent actually knows when it is confused instead of confidently guessing.

Testing proved predictability is overwhelmingly the weakest link across all modern language models. They discovered that simply building larger models does not automatically resolve these massive dependability failures.

----

Paper Link – arxiv.org/abs/2602.16666
Paper Title: "Towards a Science of AI Agent Reliability"

107

388

1.4K

95.5K

Malcolm Murray reposted

Evan Hubinger @EvanHub · 27 Feb

We may yet fail to rise to all the challenges posed by transformative AI. But it is worth celebrating that when it mattered most and we were asked to compromise the most basic principles of liberty, we said no. I hope others will join. notdivided.org

Anthropic @AnthropicAI

A statement from Anthropic CEO, Dario Amodei, on our discussions with the Department of War. anthropic.com/news/statement…

832

27.4K

Malcolm Murray reposted

Gillian Hadfield @ghadfield · 27 Feb

The 2026 AI Safety Report's biggest finding isn't the risks it catalogs. It's the evidence gap. We're trying to build AI governance with almost no science underneath. Massive investment in the research regulatory systems depend on is overdue. internationalaisafetyreport.org

4.1K

Malcolm Murray reposted

Jide 🔍 @jide_alaga · 10 Feb

Man, AI safety people talk about transparency and external scrutiny a lot, but afaict it's just a handful of people on twitter who are actually reviewing and critiquing company reports and safety frameworks (and they don’t get much attention).

7.1K

Malcolm Murray reposted

Tiancheng Hu @ ICLR 2026 @tiancheng_hu · 4 Feb

Proud to contribute to the new International AI Safety Report chaired by @YoshuaBengio, with a fantastic international team! Every word was weighed to ensure a rigorous, evidence-based view of current AI capabilities and the risks they pose. A short summary of my section below.

Yoshua Bengio @Yoshua_Bengio

Today we’re releasing the International AI Safety Report 2026: the most comprehensive evidence-based assessment of AI capabilities, emerging risks, and safety measures to date. 🧵 (1/17)

640

Malcolm Murray reposted

Usman Gohar @UsmanGohar · 3 Feb

🚨We are finally releasing the International AI Safety Report 2026 🥳 Incredibly proud to be part of this team. I contributed as a section lead on AI-generated content and its harms. This report is the result of the cumulative efforts of dozens of experts across the world 🌎

Yoshua Bengio @Yoshua_Bengio

Today we’re releasing the International AI Safety Report 2026: the most comprehensive evidence-based assessment of AI capabilities, emerging risks, and safety measures to date. 🧵 (1/17)

148

Malcolm Murray reposted

METR @METR_Evals · 5 Feb

We estimate that GPT-5.2 with `high` (not `xhigh`) reasoning effort has a 50%-time-horizon of around 6.6 hrs (95% CI of 3 hr 20 min to 17 hr 30 min) on our expanded suite of software tasks. This is the highest estimate for a time horizon measurement we have reported to date.

182

1.7K

1.2M
