Ben Snodin
@bsnodin
618 posts
Making sense of AI | Formerly @AISecurityInst, @RethinkPriors | 5 years quant finance | nanotech PhD
Oxford · Joined January 2012
360 Following · 310 Followers
Ben Snodin @bsnodin
5/ My speculative take: within software tasks, the spike is mostly explained by closeness to the training distribution + hill-climbability, both connected to models struggling with genuine creativity / novel ideation. Full post: bensnodin.com/blog/2026/04/3…
Ben Snodin @bsnodin
4/ I do think the spike is real despite my results. Limitations of my approach: the time horizon suite is a narrow slice of tasks, binary factors are a blunt instrument, and the LLM factor grader likely had its own flaws.
Ben Snodin @bsnodin
1/ New blog post where I try to figure out what makes tasks easy vs. hard for AI agents, using @METR_Evals time horizon data. Short version: I didn't find much. My study has flaws, but still, this makes me think it's hard to find very simple descriptions of the capabilities spike.
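For readers curious what the analysis style this thread describes might look like in code, here is a minimal hypothetical sketch, not Snodin's actual pipeline: predict whether an agent solved a task from binary task-factor annotations plus a task-length control. The file name, column names, and factor labels are illustrative assumptions.

```python
# Hypothetical sketch of the analysis style the thread describes, NOT the
# actual pipeline: predict agent success from binary task factors plus
# (log) human completion time. CSV and column names are assumptions.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

tasks = pd.read_csv("task_factors.csv")  # one row per (task, model) attempt

factor_cols = ["close_to_training_distribution", "hill_climbable"]  # 0/1 flags
X = np.column_stack([
    tasks[factor_cols].to_numpy(),
    np.log(tasks["human_minutes"].to_numpy()),  # baseline difficulty control
])
y = tasks["model_succeeded"].to_numpy()  # 0/1 outcome per attempt

clf = LogisticRegression().fit(X, y)
print(dict(zip(factor_cols + ["log_human_minutes"], clf.coef_[0])))
# Small factor coefficients / a poor fit would echo the thread's conclusion:
# simple binary factors don't explain much of the capabilities spike.
```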
Ben Snodin retweeted
Chris Painter @ChrisPainterYup
My bio says I work on AGI preparedness, so I want to clarify: We are not prepared.

Over the last year, dangerous capability evaluations have moved into a state where it's difficult to find any Q&A benchmark that models don't saturate. Work has had to shift toward measures that are either much more finger-to-the-wind (quick surveys of researchers about real-world use) or much more capital- and time-intensive (randomized controlled "uplift studies"). Broadly, it's becoming a stretch to rule out any threat model using Q&A benchmarks as a proxy. Everyone is experimenting with new methods for detecting when meaningful capability thresholds are crossed, but the water might boil before we can get the thermometer in.

The situation is similar for agent benchmarks: our ability to measure capability is rapidly falling behind the pace of capability itself (look at the confidence intervals on METR's time-horizon measurements), although these haven't yet saturated.

And what happens if we concede that it's difficult to "rule out" these risks? Does society wait to take action until we can "rule them in" by showing they are end-to-end clearly realizable?

Furthermore, what would "taking action" even mean if we decide the risk is imminent and real? Every American developer faces the problem that if it unilaterally halts development, or even simply implements costly mitigations, it has reason to believe that a less-cautious competitor will not take the same actions and instead benefit. From a private company's perspective, it isn't clear that taking drastic action to mitigate risk unilaterally (like fully halting development of more advanced models) accomplishes anything productive unless there's a decent chance the government steps in or the action is near-universal. And even if the US government helps solve the collective action problem (if indeed it *is* a collective action problem) in the US, what about Chinese companies?

At minimum, I think developers need to keep collecting evidence about risky and destabilizing model properties (chem-bio, cyber, recursive self-improvement, sycophancy) and reporting this information publicly, so the rest of society can see what world we're heading into and can decide how it wants to react. The rest of society, and companies themselves, should also spend more effort thinking creatively about how to use technology to harden society against the risks AI might pose.

This is hard, and I don't know the right answers. My impression is that the companies developing AI don't know the right answers either. While it's possible for an individual, or a species, to not understand how an experience will affect them and yet "be prepared" for the experience in the sense of having built the tools and experience to ensure they'll respond effectively, I'm not sure that's the position we're in. I hope we land on better answers soon.
Ben Snodin retweeted
Joel Becker @joel_bkr
we're hiring for a human data lead at @METR_Evals! this role would've been totally critical to our time horizon and developer productivity work in the past, and i expect it to be critical to an even more varied range of METR research outputs going forwards.
Ben Snodin @bsnodin
METR has measured a doubling time of around 200 days for the time horizon of software engineering tasks a model can complete with a 50% success rate. Working on the measurement for Opus 4.1 really brought home to me what a crazy rate of progress this is!
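As a quick worked illustration of what a ~200-day doubling time implies (the starting value and the dates are illustrative assumptions, not METR projections):

```python
# What a ~200-day doubling time for the 50% time horizon implies:
# h(t) = h0 * 2**(t / 200), with t in days. Starting value is illustrative.
h0_minutes = 105      # roughly Opus 4.1's measured 50% horizon (1 hr 45 min)
doubling_days = 200   # METR's reported doubling time

for days in (200, 400, 600):
    horizon_minutes = h0_minutes * 2 ** (days / doubling_days)
    print(f"after {days} days: ~{horizon_minutes / 60:.1f} hours")
# after 200 days: ~3.5 hours; after 400: ~7.0 hours; after 600: ~14.0 hours
```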
Ben Snodin @bsnodin
Really enjoyed helping @METR_Evals with this! Interesting moment I had: I worried that the Opus 4.1 50% time horizon measurement was implausibly high: it's 25 mins (31%!) longer than Opus 4, which was released only 3 months ago. But of course this is pretty much exactly on trend.
METR@METR_Evals

We estimate that Claude Opus 4.1 has a 50%-time-horizon of around 1 hr 45 min (95% confidence interval of 50 to 195 minutes) on our agentic multi-step software engineering tasks. This estimate is lower than the current highest time-horizon point estimate of around 2 hr 15 min.

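The "exactly on trend" remark checks out on the back of an envelope, assuming roughly 90 days between the two measurements and the ~200-day doubling time from the previous tweet:

```python
# Back-of-the-envelope check that a 31% jump in ~3 months is on trend.
# Assumes ~90 days between Opus 4 and Opus 4.1 and a ~200-day doubling time.
opus_41 = 105            # minutes: the 1 hr 45 min point estimate
opus_4 = opus_41 - 25    # 25 minutes shorter, per the tweet
observed = opus_41 / opus_4     # ~1.31, the "31%!" in the tweet
predicted = 2 ** (90 / 200)     # trend-implied growth factor, ~1.37

print(f"observed x{observed:.2f}, trend predicts x{predicted:.2f}")
# x1.31 observed vs x1.37 predicted: roughly on trend, as the tweet says.
```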
Ben Snodin retweeted
METR @METR_Evals
Prior work has found that Chain of Thought (CoT) can be unfaithful. Should we then ignore what it says? In new research, we find that the CoT is informative about LLM cognition as long as the cognition is complex enough that it can’t be performed in a single forward pass.
Ben Snodin retweeted
Ryan Greenblatt @RyanPGreenblatt
A week ago, Anthropic quietly weakened their ASL-3 security requirements. Yesterday, they announced ASL-3 protections. I appreciate the mitigations, but quietly lowering the bar at the last minute so you can meet requirements isn't how safety policies are supposed to work. 🧵
Ben Snodin retweeted
Rob Wiblin @robertwiblin
A new legal letter aimed at OpenAI lays out in stark terms the money and power grab OpenAI is trying to trick its board members into accepting — what one analyst calls "the theft of the millennium." The simple facts of the case are both devastating and darkly hilarious. I'll explain for your amusement.

The letter 'Not For Private Gain' is written for the relevant Attorneys General and is signed by 3 Nobel Prize winners among dozens of top ML researchers, legal experts, economists, ex-OpenAI staff and civil society groups. (I'll link below.)

It says that OpenAI's attempt to restructure as a for-profit is simply totally illegal, like you might naively expect. It then asks the Attorneys General (AGs) to take some extreme measures I've never seen discussed before. Here's how they build up to their radical demands.

For 9 years OpenAI and its founders went on ad nauseam about how non-profit control was essential to:

1. Prevent a few people concentrating immense power
2. Ensure the benefits of artificial general intelligence (AGI) were shared with all humanity
3. Avoid the incentive to risk other people's lives to get even richer

They told us these commitments were legally binding and inescapable. They weren't in it for the money or the power. We could trust them.

"The goal isn't to build AGI, it's to make sure AGI benefits humanity" said OpenAI President Greg Brockman. And indeed, OpenAI's charitable purpose, which its board is legally obligated to pursue, is to "ensure that artificial general intelligence benefits all of humanity" rather than advancing "the private gain of any person."

100s of top researchers chose to work for OpenAI at below-market salaries, in part motivated by this idealism. It was core to OpenAI's recruitment and PR strategy.

Now along comes 2024. That idealism has paid off. OpenAI is one of the world's hottest companies. The money is rolling in. But now suddenly we're told the setup under which they became one of the fastest-growing startups in history, the setup that was supposedly totally essential and distinguished them from their rivals, and the protections that made it possible for us to trust them, ALL HAVE TO GO ASAP:

1. The non-profit's (and therefore humanity at large's) right to super-profits, should they make tens of trillions? Gone. (Guess where that money will go now!)
2. The non-profit's ownership of AGI, and ability to influence how it's actually used once it's built? Gone.
3. The non-profit's ability (and legal duty) to object if OpenAI is doing outrageous things that harm humanity? Gone.
4. A commitment to assist another AGI project if necessary to avoid a harmful arms race, or if joining forces would help the US beat China? Gone.
5. Majority board control by people who don't have a huge personal financial stake in OpenAI? Gone.
6. The ability of the courts or Attorneys General to object if they betray their stated charitable purpose of benefitting humanity? Gone, gone, gone!

Screenshotting from the letter: (I'll do a new tweet after each image so they appear right.) 1/
Neel Nanda @NeelNanda5
Everyone's heard of vibe coding. But what about vibe research? In my new research stream I code with my voice/cursor and minimal typing. It was surprisingly productive! Recommended. Just needing to describe tedious but easy tasks lowered friction and increased feedback loops.
Ben Snodin retweeted
Marie Davidsen Buhl @MarieBassBuhl
🧵In the lead-up to the Paris Summit, I’ve been working with the UK AI Safety Institute on resources to help companies write and implement safety frameworks. Today we're publishing papers on both topics! (1/10)
AI Security Institute@AISecurityInst

Our new papers examine the research wave surrounding safety frameworks. One explores emerging practices, and the other introduces safety cases for implementation.📖🔐

Ben Snodin retweeted
AI Security Institute @AISecurityInst
Our new papers examine the research wave surrounding safety frameworks. One explores emerging practices, and the other introduces safety cases for implementation.📖🔐
Ben Snodin retweeted
Ethan Mollick @emollick
Thoughts on this post:
1) It echoes what we have been hearing from multiple labs about the confidence of scaling up to AGI quickly
2) There is no clear vision of what that world looks like
3) The labs are placing the burden on policymakers to decide what to do with what they make
Sam Altman@sama

Three Observations: blog.samaltman.com/three-observat…

Ben Snodin retweeted
Jaime Sevilla @Jsevillamol
Stargate -- roughly on trend for training expenses so far
DeepSeek -- also on trend for algorithmic efficiency
Straight-line-on-plot remains undefeated as a forecasting method.
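"Straight-line-on-plot" is just a log-linear fit plus extrapolation; a minimal sketch follows, with numbers invented purely for illustration (they are not Epoch's actual estimates):

```python
# "Straight-line-on-plot" forecasting: fit a line to log(metric) vs time,
# then extrapolate. Data points below are invented for illustration only.
import numpy as np

years = np.array([2020, 2021, 2022, 2023, 2024])
train_cost = np.array([5e6, 2e7, 8e7, 3e8, 1e9])  # hypothetical $ costs

slope, intercept = np.polyfit(years, np.log10(train_cost), 1)
forecast_2026 = 10 ** (slope * 2026 + intercept)
print(f"+{slope:.2f} OOM/year; 2026 forecast ~ ${forecast_2026:.2e}")
```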
Ben Snodin retweeted
Daniel Privitera @privitera_
The 1st International AI Safety Report is out today! Being the Lead Writer and collaborating with 100 leading AI experts (including Nobel laureates, Turing Award winners, etc.) has been an honor. I look forward to it being read and discussed by policymakers and governments around the world. If you have thoughts on how to improve the report next time, please share them with us!
Yoshua Bengio@Yoshua_Bengio

Today, we are publishing the first-ever International AI Safety Report, backed by 30 countries and the OECD, UN, and EU. It summarises the state of the science on AI capabilities and risks, and how to mitigate those risks. 🧵 Link to full Report: assets.publishing.service.gov.uk/media/679a0c48… 1/16

Ben Snodin retweeted
Yoshua Bengio @Yoshua_Bengio
Today, we are publishing the first-ever International AI Safety Report, backed by 30 countries and the OECD, UN, and EU. It summarises the state of the science on AI capabilities and risks, and how to mitigate those risks. 🧵 Link to full Report: assets.publishing.service.gov.uk/media/679a0c48… 1/16