James Fox

120 posts

@James_D_Fox

Senior Science Associate at Schmidt Sciences (AI Institute)

Joined February 2019

1.4K Following, 153 Followers

James Fox retweeted

Thinking Machines@thinkymachines·6d

We are offering grants of $100,000 + Tinker credits to researchers advancing the field of human-AI interactivity. Submit your proposals by June 19th! thinkingmachines.ai/news/interacti…

193

1.6K

565.7K

James Fox retweeted

METR@METR_Evals·6d

Could an AI company lose control of its own agents? To find out, Anthropic, Google, Meta, and OpenAI let us (1) test their best internal models with CoT access, (2) review non-public info about capabilities, alignment, and control. The result: our first Frontier Risk Report.

188

856

285.1K

James Fox retweeted

Elliott Thornley@ElliottThornley·27 Apr

Will MacAskill (@willmacaskill) on the 80k podcast talking about our new paper. It's about making AIs risk-averse as a safety strategy. Coming out soon! youtu.be/g0MikM4Bsbc?t=…

YouTube

3.6K

James Fox@James_D_Fox·28 Apr

Fascinating!

David Duvenaud@DavidDuvenaud

Announcing Talkie: a new, open-weight historical LLM! We trained and finetuned a 13B model on a newly-curated dataset of only pre-1930 data. Try it below! with @AlecRad and @status_effects 🧵

James Fox retweeted

David Duvenaud@DavidDuvenaud·28 Apr

Announcing Talkie: a new, open-weight historical LLM! We trained and finetuned a 13B model on a newly-curated dataset of only pre-1930 data. Try it below! with @AlecRad and @status_effects 🧵

200

456

3.6K

1.4M

James Fox retweeted

Owain Evans@OwainEvans_UK·27 Apr

We're hiring for an operations lead at Truthful AI, my non-profit research organization!
- Generalist role: recruiting, fundraising, communications, and PMing to support our research
- At our office in Constellation (Berkeley, CA) preferred
- Salary is $140–200k plus benefits

274

31.5K

James Fox retweeted

Gillian Hadfield@ghadfield·22 Apr

.@deanwball in The Economist makes the public case for an architecture Dean and I have both been advancing: a network of independent organizations that audit AI safety claims. Worth a read. buff.ly/gZZCh8D

James Fox retweeted

Kevin Roose@kevinroose·17 Apr

New column: I went to visit @METR_Evals, the 30-person AI nonprofit that makes the Most Important Chart in the World. I learned a lot, but the most striking thing was how soon some of them think AI R&D could be fully automated. (This year!) nytimes.com/2026/04/17/tec…

532

93.1K

James Fox retweeted

david rein@idavidrein·10 Apr

@tmkadamcz and I started working on MirrorCode, a new long-horizon software engineering benchmark, last September. I think it’s the best benchmark for measuring AI’s ability to complete very hard (but precisely specified) software tasks—but it’s likely already saturated.

Epoch AI@EpochAIResearch

What are the largest software engineering tasks AI can perform? In our new benchmark, MirrorCode, Claude Opus 4.6 reimplemented a 16,000-line bioinformatics toolkit — a task we believe would take a human engineer weeks. Co-developed with @METR_Evals. Details in thread.

175

30.7K

James Fox retweeted

Jan Kulveit@jankulveit·10 Apr

Seems there is a surprising amount of confusion / nonsense on my timeline about "Mythos vs. open models" even among people who are usually sensible (+usual twitter toxoplasma of rage). I like the work of both @AnthropicAI and @Aisle_Inc, so here are some takes:
1. Anthropic is not a household name yet, but is a big brand with a lot of visibility among people who broadly follow tech and AI. This has the effect that even if there is prior work doing something similar / comparably impressive, it usually has way less exposure. Often 10-100x more people learn about something when Anthropic publishes their version. In a variant of the Matthew effect, non-experts often assign most or all credit to Anthropic, by virtue of not being aware of prior work. And are more surprised. Seen this multiple times in research just in the past few months: persona selection model, emotion vectors, now: impressive/scary cyber capabilities. It is usually at least moderately annoying from the perspective of niche experts who are also impressed, but often not impressed by the same things.
2. Mythos is clearly a very impressive model with scary capabilities. "Huge discontinuity in cyber-risk" needs more subtlety in what the comparison is. If old models + minimal harness, there is a large gap. If SOTA harness/scaffolded system like what AISLE does, the difference between "raw Mythos" vs. "SOTA scaffolding + other models" can be moderate, small (or even non-existent). I don't know, and possibly no one does right now: Anthropic likely does not have SOTA harnesses for bug finding; startups who work on this likely do not have access to Mythos. Some evidence comes from the fact AISLE was finding & fixing hundreds of 0-day vulnerabilities in a similar weight category as what Anthropic published, often in major parts of internet infrastructure like OpenSSL or curl, and in decades-old code, all before Mythos. (And maybe ~99% of people who see Anthropic + Mythos as a step change haven't noticed this, including various highly visible pundits.)
3. My overall take is this matches the general intuition where a great harness often buys you ~up to one generation of model capability on tasks which are not really optimised during model training. Anthropic seems to aim for RSI and the actually optimised tasks seem to be coding and ML, not vulnerability discovery. I would guess where Mythos is actually a bigger jump is automated exploit construction.
4. AISLE wrote a blog showing small models can often notice the same things, and arguing for the importance of the harness. These results are in my view interesting, although obviously the question is sensitivity AND specificity, and to what extent you can automatically eliminate false positives with further tooling. (AISLE's prior successes show you can, they are a few people + automated pipeline, not some labour-intense bug hunting)
5. The post got noticed on twitter. Various fundamentally unserious people like @ylecun took it as an opportunity to dunk on Anthropic, claiming it's all BS, hype, etc. (This is nonsense)
6. The counter-reaction was to point out limits of the AISLE blogpost, mostly based on the correct claim that finding the relevant part of the code is a large part of the problem. Toxoplasma of rage amplifying reactions and over-reactions to the most bizarre Mythos-denialism nonsense.
7. In some people's minds this grew out of proportion, taking the fact that AISLE provided the specific part of code as some sort of killer argument making AISLE's findings worthless. (Anthropic's @mooncat_is: "We took the needle the model found, isolated the relevant handful of the haystack, and then gave it to a small child, who found the needle as well.") In fact this does not settle the question. As the AISLE original post explains, the inference cost difference is large enough that you can run the small model on every such code chunk individually. So the right comparison may be needles/$. Also: Anthropic did also split the heap into smaller chunks, running the analysis per file.
The deeper question is actually similar to some classical question about HCH: if you have a large number of people working each for 30 minutes, how does that compare to one human working for 10 hours? If you have a large number of IQ 130 humans working for long time, how does that compare to one IQ 150 human? Where current cyber capabilities fall is ultimately an empirical question. As @boazbarak noted, people from the harness company tell you the harness is important, people from the model company tell you the model is important. I'm somewhat in between - both are important now, harnesses can compensate for ~1 generation; yes, in the limit the models will likely just build their own scaffolding.
Also: if "cheap quantity can't compete with superior concentration of intelligence" were generally true, it would be actually very scary for AI safety: "scaling oversight" plans fundamentally depend on weaker intelligences overseeing stronger ones, compensating through effort, quantity, and orchestration.

213

29.4K

James Fox retweeted

Stanislav Fort@stanislavfort·8 Apr

New post: We tested the Mythos showcase vulnerabilities with open models. They recovered similar scoped analysis! 8/8 models found the flagship FreeBSD zero-day, including a 3B model. Rankings reshuffle completely across tasks => the AI cybersecurity frontier is super jagged!

152

407K

James Fox@James_D_Fox·7 Apr

@AmmannNora is fantastic! This is an amazing opportunity to work with her on such an important mission!

Nora Ammann@AmmannNora

In related news… I’m building out a tiger team to pursue this mission with me! 🦸 I’m looking for people who are mission-driven, technically deep, and comfortable moving between formal methods, programming languages, AI, AI safety, and cybersecurity.

James Fox retweeted

Michael Nielsen@michael_nielsen·6 Apr

Just a reminder of @AsteraInstitute's open essay competition about identifying and overcoming scientific bottlenecks. Deadline for entries is May 1!

155

21.3K

James Fox retweeted

Andrej Karpathy@karpathy·5 Apr

Something I've been thinking about - I am bullish on people (empowered by AI) increasing the visibility, legibility and accountability of their governments. Historically, it is the governments that act to make society legible (e.g. "Seeing like a state" is the common reference), but with AI, society can dramatically improve its ability to do this in reverse.
Government accountability has not been constrained by access (the various branches of government publish an enormous amount of data), it has been constrained by intelligence - the ability to process a lot of raw data, combine it with domain expertise and derive insights. As an example, the 4000-page omnibus bill is "transparent" in principle and in a legal sense, but certainly not in a practical sense for most people. There's a lot more like it: laws, spending bills, federal budgets, freedom of information act responses, lobbying disclosures... Only a few highly trained professionals (investigative journalists) could historically process this information. This bottleneck might dissolve - not only are the professionals further empowered, but a lot more people can participate.
Some examples to be precise: Detailed accounting of spending and budgets, diff tracking of legislation, individual voting trends w.r.t. stated positions or speeches, lobbying and influence (e.g. graph of lobbyist -> firm -> client -> legislator -> committee -> vote -> regulation), procurement and contracting, regulatory capture warning lights, judicial and legal patterns, campaign finance...
Local governments might be even more interesting because the governed population is smaller so there is less national coverage: city council meetings, decisions around zoning, policing, schools, utilities...
Certainly, the same tools can easily cut the other way and it's worth being very mindful of that, but I lean optimistic overall that added participation, transparency and accountability will improve democratic, free societies.
(the quoted tweet is half-ish related, but inspired me to post some recent thoughts)

Harry Rushworth@Hrushworth

The British Government is a complicated beast. Dozens of departments, hundreds of public bodies, more corporations than one can count... Such is its complexity that there isn't an org chart for it. Well, there wasn't... Introducing ⚙️Machinery of Government⚙️

416

737

993.8K

James Fox@James_D_Fox·5 Apr

Nora's fantastic! 🚀

Nora Ammann@AmmannNora

I’ve recently accepted the Programme Director role at ARIA, taking over from davidad in running the Safeguarded AI programme. 🧵 about the programme’s strategic vision + our upcoming funding efforts in cybersecurity + we're hiring!

James Fox retweeted

Nora Ammann@AmmannNora·1 Apr

ARIA's Activation Partners model is plausibly the coolest innovation in terms of 'ways to foster high value, high ambition R&D' since FROs :)

Pranay Shah@Pranay_Shahh

Today we’re launching a £100m call to double down on ARIA's AI Scientist and Activation Partners initiatives
We're looking for partners to bring advanced AI capabilities to R&D alongside translational expertise to help us turn speculative ideas into world-changing capabilities🧵

3.6K

James Fox retweeted

Coefficient Giving@coeff_giving·31 Mar

Today, we're opening a new Request for Proposals in Biosecurity. Broadly speaking, we want to support work aimed at preventing engineered biological threats from emerging and improving our response should prevention fail. 🧵

10.1K

James Fox retweeted

Boaz Barak@boazbaraktcs·30 Mar

New blog post: the state of AI safety in four fake graphs.

152

1.4K

655.3K

James Fox retweeted

Andrej Karpathy@karpathy·28 Mar

- Drafted a blog post
- Used an LLM to meticulously improve the argument over 4 hours.
- Wow, feeling great, it’s so convincing!
- Fun idea let’s ask it to argue the opposite.
- LLM demolishes the entire argument and convinces me that the opposite is in fact true.
- lol
The LLMs may elicit an opinion when asked but are extremely competent in arguing almost any direction. This is actually super useful as a tool for forming your own opinions, just make sure to ask different directions and be careful with the sycophancy.

1.7K

2.4K

31.3K

3.5M

James Fox retweeted

Natasha Jaques@natashajaques·20 Mar

The paper I’ve been most obsessed with lately is finally out: nbcnews.com/tech/tech-news…! Check out this beautiful plot: it shows how much LLMs distort human writing when making edits, compared to how humans would revise the same content. We take a dataset of human-written essays from 2021, before the release of ChatGPT. We compare how people revise draft v1 -> v2 given expert feedback, with how an LLM revises the same v1 given the same feedback. This enables a counterfactual comparison: how much does the LLM alter the essay compared to what the human was originally intending to write? We find LLMs consistently induce massive distortions, even changing the actual meaning and conclusions argued for.

393

1.5K

257.3K
