James Fox

120 posts

James Fox

James Fox

@James_D_Fox

Senior Science Associate at Schmidt Sciences (AI Institute)

Katılım Şubat 2019
1.4K Takip Edilen153 Takipçiler
James Fox retweetledi
METR
METR@METR_Evals·
Could an AI company lose control of its own agents? To find out, Anthropic, Google, Meta, and OpenAI let us (1) test their best internal models with CoT access, (2) review non-public info about capabilities, alignment, and control. The result: our first Frontier Risk Report.
METR tweet media
English
28
188
856
285.1K
James Fox retweetledi
David Duvenaud
David Duvenaud@DavidDuvenaud·
Announcing Talkie: a new, open-weight historical LLM! We trained and finetuned a 13B model on a newly-curated dataset of only pre-1930 data. Try it below! with @AlecRad and @status_effects 🧵
English
200
456
3.6K
1.4M
James Fox retweetledi
Owain Evans
Owain Evans@OwainEvans_UK·
We're hiring for an operations lead at Truthful AI, my non-profit research organization! - Generalist role: recruiting, fundraising, communications, and PMing to support our research - At our office in Constellation (Berkeley, CA) preferred - Salary is $140–200k plus benefits
English
6
33
274
31.5K
James Fox retweetledi
Gillian Hadfield
Gillian Hadfield@ghadfield·
.@deanwball in The Economist makes the public case for an architecture Dean and I have both been advancing: a network of independent organizations that audit AI safety claims. Worth a read. buff.ly/gZZCh8D
English
2
15
84
7K
James Fox retweetledi
Kevin Roose
Kevin Roose@kevinroose·
New column: I went to visit @METR_Evals, the 30-person AI nonprofit that makes the Most Important Chart in the World. I learned a lot, but the most striking thing was how soon some of them think AI R&D could be fully automated. (This year!) nytimes.com/2026/04/17/tec…
English
11
56
532
93.1K
James Fox retweetledi
david rein
david rein@idavidrein·
@tmkadamcz and I started working on MirrorCode, a new long-horizon software engineering benchmark, last September. I think it’s the best benchmark for measuring AI’s ability to complete very hard (but precisely specified) software tasks—but it’s likely already saturated.
Epoch AI@EpochAIResearch

What are the largest software engineering tasks AI can perform? In our new benchmark, MirrorCode, Claude Opus 4.6 reimplemented a 16,000-line bioinformatics toolkit — a task we believe would take a human engineer weeks. Co-developed with @METR_Evals. Details in thread.

English
6
26
175
30.7K
James Fox retweetledi
Jan Kulveit
Jan Kulveit@jankulveit·
Seems there is a surprising amount of confusion / nonsense on my timeline about "Mythos vs. open models" even among people who are usually sensible (+usual twitter toxoplasma of rage). I like the work of both @AnthropicAI and @Aisle_Inc, so here are some takes: 1. Anthropic is not a household name yet, but is a big brand with a lot of visibility among people who broadly follow tech and AI. This has the effect that even if there is prior work doing something similar / comparably impressive, it usually has way less exposure. Often 10-100x more people learn about something when Anthropic publishes their version. In a variant of the Matthew effect, non-experts often assign most or all credit to Anthropic, by virtue of not being aware of prior work. And are more surprised. Seen this multiple times in research just in the past few months: persona selection model, emotion vectors, now: impressive/scary cyber capabilities. It is usually at least moderately annoying from the perspective of niche experts who are also impressed, but often not impressed by the same things. 2. Mythos is clearly a very impressive model with scary capabilities. "Huge discontinuity in cyber-risk" needs more subtlety in what the comparison is. If old models + minimal harness, there is a large gap. If SOTA harness/scaffolded system like what AISLE does, the difference between "raw Mythos" vs. "SOTA scaffolding + other models" can be moderate, small (or even non-existent). I don't know, and possibly no one does right now: Anthropic likely does not have SOTA harnesses for bug finding; startups who work on this likely do not have access to Mythos. Some evidence comes from the fact AISLE was finding & fixing hundreds of 0-day vulnerabilities in a similar weight category as what Anthropic published, often in major parts of internet infrastructure like OpenSSL or curl, and in decades-old code, all before Mythos. (And maybe ~99% of people who see Anthropic + Mythos as a step change haven't noticed this, including various highly visible pundits.) 3. My overall take is this matches the general intuition where a great harness often buys you ~up to one generation of model capability on tasks which are not really optimised during model training. Anthropic seems to aim for RSI and the actually optimised tasks seem to be coding and ML, not vulnerability discovery. I would guess where Mythos is actually a bigger jump is automated exploit construction. 4. AISLE wrote a blog showing small models can often notice the same things, and arguing for the importance of the harness. These results are in my view interesting, although obviously the question is sensitivity AND specificity, and to what extent you can automatically eliminate false positives with further tooling. (AISLEs prior successes show you can, they are a few people + automated pipeline, not some labour-intense bug hunting) 5. The post got noticed on twitter. Various fundamentally unserious people like @ylecun took it as an opportunity to dunk on Anthropic, claiming it's all BS, hype, etc. (This is nonsense) 6. The counter-reaction was to point out limits of the AISLE blogpost, mostly based on the correct claim that finding the relevant part of the code is a large part of the problem. Toxoplasma of rage amplifying reactions and over-reactions to the most bizarre Mythos-denialism nonsense. 7. In some people's minds this grew out of proportion, taking the fact that AISLE provided the specific part of code as some sort of killer argument making AISLE's findings worthless. (Anthropic's @mooncat_is: "We took the needle the model found, isolated the relevant handful of the haystack, and then gave it to a small child, who found the needle as well.") In fact this does not settle the question. As the AISLE original post explains, the inference cost difference is large enough that you can run the small model on every such code chunk individually. So the right comparison may be needles/$. Also: Anthropic did also split the heap into smaller chunks, running the analysis per file. The deeper question is actually similar to some classical question about HCH: if you have a large number of people working each for 30 minutes, how does that compare to one human working for 10 hours? If you have a large number of IQ 130 humans working for long time, how does that compare to one IQ 150 human? Where current cyber capabilities fall is ultimately an empirical question. As @boazbarak noted, people from the harness company tell you the harness is important, people from the model company tell you the model is important. I'm somewhat in between - both are important now, harnesses can compensate for ˜1 generation; yes, in the limit the models will likely just build their own scaffolding. Also: if "cheap quantity can't compete with superior concentration of intelligence" were generally true, it would be actually very scary for AI safety: "scaling oversight" plans fundamentally depend on weaker intelligences overseeing stronger ones, compensating through effort, quantity, and orchestration.
English
12
32
213
29.4K
James Fox retweetledi
Stanislav Fort
Stanislav Fort@stanislavfort·
New post: We tested the Mythos showcase vulnerabilities with open models. They recovered similar scoped analysis! 8/8 models found the flagship FreeBSD zero-day, including a 3B model. Rankings reshuffle completely across tasks => the AI cybersecurity frontier is super jagged!
Stanislav Fort tweet media
English
46
152
1K
407K
James Fox retweetledi
Michael Nielsen
Michael Nielsen@michael_nielsen·
Just a reminder of @AsteraInstitute's open essay competition about identifying and overcoming scientific bottlenecks. Deadline for entries is May 1!
Michael Nielsen tweet media
English
2
49
155
21.3K
James Fox retweetledi
Andrej Karpathy
Andrej Karpathy@karpathy·
Something I've been thinking about - I am bullish on people (empowered by AI) increasing the visibility, legibility and accountability of their governments. Historically, it is the governments that act to make society legible (e.g. "Seeing like a state" is the common reference), but with AI, society can dramatically improve its ability to do this in reverse. Government accountability has not been constrained by access (the various branches of government publish an enormous amount of data), it has been constrained by intelligence - the ability to process a lot of raw data, combine it with domain expertise and derive insights. As an example, the 4000-page omnibus bill is "transparent" in principle and in a legal sense, but certainly not in a practical sense for most people. There's a lot more like it: laws, spending bills, federal budgets, freedom of information act responses, lobbying disclosures... Only a few highly trained professionals (investigative journalists) could historically process this information. This bottleneck might dissolve - not only are the professionals further empowered, but a lot more people can participate. Some examples to be precise: Detailed accounting of spending and budgets, diff tracking of legislation, individual voting trends w.r.t. stated positions or speeches, lobbying and influence (e.g. graph of lobbyist -> firm -> client -> legislator -> committee -> vote -> regulation), procurement and contracting, regulatory capture warning lights, judicial and legal patterns, campaign finance... Local governments might be even more interesting because the governed population is smaller so there is less national coverage: city council meetings, decisions around zoning, policing, schools, utilities... Certainly, the same tools can easily cut the other way and it's worth being very mindful of that, but I lean optimistic overall that added participation, transparency and accountability will improve democratic, free societies. (the quoted tweet is half-ish related, but inspired me to post some recent thoughts)
Harry Rushworth@Hrushworth

The British Government is a complicated beast. Dozens of departments, hundreds of public bodies, more corporations than one can count... Such is its complexity that there isn't an org chart for it. Well, there wasn't... Introducing ⚙️Machinery of Government⚙️

English
416
737
6K
993.8K
James Fox retweetledi
James Fox retweetledi
Coefficient Giving
Coefficient Giving@coeff_giving·
Today, we're opening a new Request for Proposals in Biosecurity. Broadly speaking, we want to support work aimed at preventing engineered biological threats from emerging and improving our response should prevention fail. 🧵
English
4
21
75
10.1K
James Fox retweetledi
Boaz Barak
Boaz Barak@boazbaraktcs·
New blog post: the state of AI safety in four fake graphs.
Boaz Barak tweet mediaBoaz Barak tweet mediaBoaz Barak tweet mediaBoaz Barak tweet media
English
66
152
1.4K
655.3K
James Fox retweetledi
Andrej Karpathy
Andrej Karpathy@karpathy·
- Drafted a blog post - Used an LLM to meticulously improve the argument over 4 hours. - Wow, feeling great, it’s so convincing! - Fun idea let’s ask it to argue the opposite. - LLM demolishes the entire argument and convinces me that the opposite is in fact true. - lol The LLMs may elicit an opinion when asked but are extremely competent in arguing almost any direction. This is actually super useful as a tool for forming your own opinions, just make sure to ask different directions and be careful with the sycophancy.
English
1.7K
2.4K
31.3K
3.5M
James Fox retweetledi
Natasha Jaques
Natasha Jaques@natashajaques·
The paper I’ve been most obsessed with lately is finally out: nbcnews.com/tech/tech-news…! Check out this beautiful plot: it shows how much LLMs distort human writing when making edits, compared to how humans would revise the same content. We take a dataset of human-written essays from 2021, before the release of ChatGPT. We compare how people revise draft v1 -> v2 given expert feedback, with how an LLM revises the same v1 given the same feedback. This enables a counterfactual comparison: how much does the LLM alter the essay compared to what the human was originally intending to write? We find LLMs consistently induce massive distortions, even changing the actual meaning and conclusions argued for.
Natasha Jaques tweet media
English
45
393
1.5K
257.3K