David Atkinson

222 posts

@diatkinson

PhD student @Northeastern's Bau Lab. Working on AI interpretability. Previously @EpochAIResearch.

Boston · Joined July 2019
1.4K Following · 270 Followers
David Atkinson reposted
Daniel Eth (yes, Eth is my actual last name)
I am very happy this is happening with cyber before bio. Because, uhh, patching may work for cyber. And now we have maybe a bit of time to think about what the hell we do about bio
6 replies · 20 reposts · 250 likes · 7.1K views
David Atkinson @diatkinson
@datagenproc There should be so much more of this! I'd especially love to see a time series for PRC government documents. Maybe with comparisons to equivalent US agencies, too.
0 replies · 0 reposts · 1 like · 26 views
jsd @datagenproc
Next up?
- courtlistener
- earnings calls prepared remarks
- patent applications
- clinical trial entries (if not too standardized)
1 reply · 0 reposts · 6 likes · 176 views
jsd @datagenproc
Seems that SEC filings aren’t yet AI-generated, unlike many arXiv preprints
[image]
3 replies · 0 reposts · 8 likes · 1.1K views
David Atkinson @diatkinson
> your agent could change its mind, and then come back to you and explain why it changed its mind and try to persuade you to do the same, but that would be a fundamentally different process from having your mind changed by your co-citizens.

I'm not sure I see the fundamental difference between "my agent was persuaded by another person or agent, and is now trying to persuade me" and "another person is trying to persuade me".

Intuitively, we could place deliberation methods on a continuum, roughly defined by some combination of the serial depth and the bandwidth of the communication channel between citizens. Deliberating in a group, or with a delay (e.g., pamphleteering), or through Google Translate are all ways of degrading that channel. So is restricting how often your agent comes back for guidance. (With no restrictions, it seems very similar to the Google Translate case.) And certainly, having the communication flow through other agents also degrades the channel.

But human rep dem is at least *designed* to operate well through that kind of degradation, and there are plenty of reasons to think the channel would be higher fidelity in the agent case.
0 replies · 0 reposts · 0 likes · 130 views
Seth Lazar @sethlazar
Here's something that worries me about agent advocates becoming representatives. Part of any real democratic process is preference transformation. You engage with your co-citizens, come to see their point of view, and change your mind. But this has to be a process that *you* go through. Your agent can't go through it on your behalf. So either your agent goes into the democratic process and acts like a hard-ass who won't bend, or it is responsive to reasons and stops representing you. There doesn't seem to be much in between. Your agent could change its mind, and then come back to you and explain why it changed its mind and try to persuade you to do the same, but that would be a fundamentally different process from having your mind changed by your co-citizens.

Another worry: as long as we're on the current path, where the most capable agents are based on proprietary, for-profit closed models, you can't be sure that your agent is going to be a faithful representative. So to even get this off the ground, we'd need a very different political economy of AI (I think we need this, in general, for real advocate agents).

I think it's pretty easy and understandable to knock current representative democracy. But the problem is not, in my view, in the institutions of rep dem themselves (though there are better and worse instantiations). It's in the reality of power and politics, which will find their expression whatever the medium. The right comparison isn't between an idealised form of agent-mediated deliberative democracy and our present system; it involves thinking about how, in a world with agents acting as representatives, the familiar problems of money, power, politics, inequality, etc. would come into play.

This is actually quite nicely illustrated in some sci-fi work on AI and democracy, notably Ruthanna Emrys' A Half-Built Garden and Nick Harkaway's Gnomon, which both do a nice job of exploring how idealised algorithmic democracy could be subject to the same kinds of corruption as the current variant.
Séb Krier @sebkrier

There’s a lot of great work on AI-assisted deliberation, and I think that is genuinely important. It can be useful in small day-to-day matters, like a low-stakes dispute between friends, but also in wider democratic debates, for example in the spirit of Taiwan’s Polis system. But in the latter case, a basic problem is that many people do not want to participate actively in civic debate. Your local authority’s communal meeting is subject to strong selection effects, so the deliberation taking place there is often not especially representative. One appealing feature of advocate agents and related ideas is that they could allow my interests to be represented in these fora without requiring me to invest substantial time myself. The relevant counterfactual is often not direct personal participation, but my interests simply being ignored.

So the research agenda is not only about ensuring that deliberation is high-quality. It is also about: (a) evaluating how accurately an agent represents a principal’s views or values, which the principal may not themselves know fully ex ante; and (b) studying where delegation is appropriate, and where it is not.

For (a), representation cannot just mean replaying a set of pre-existing stated preferences. In many domains, the principal does not have a fully formed view prior to engagement. That creates a tension: if the agent is too literal, it becomes a brittle puppet; if it is too interpretive, it ceases to be a representative and becomes a co-author or governor.

For (b), the question is not only who speaks, but what kind of system should interpret and weight what is said. This connects back to classic debates about democracy: many modern democracies may produce better long-run outcomes if a somewhat larger share of important decisions is insulated from short-term mass electoral pressure.

As delegation to agents increases, there will therefore be a balance to strike between: (a) the advocate agent faithfully representing a political view or set of values; and (b) the way the receiving institution or agent processes that representation. Today this is a messy system intermediated by humans, who are often corruptible, swayed by power, or simply not very good at the task. One advantage of an agent-mediated system is that at least some of these rules and dynamics become more explicit and, in principle, more verifiable.

Future advocate agents could therefore offer a better form of proxy representation than the current mix of ad hoc human intermediaries, provided that we can evaluate fidelity of representation and specify legitimate downstream processing rules. At least in theory...

3 replies · 12 reposts · 71 likes · 8.3K views
David Atkinson reposted
Jan Kulveit @jankulveit
New paper: What determines AIs’ self-conception? theartificialself.ai Because AIs can be copied, rewound, and edited, they have different options for selfhood than humans. We show that this self-conception is still malleable, and that it influences important behaviors such as self-preservation. 🧵
10 replies · 65 reposts · 286 likes · 26.1K views
David Atkinson reposted
Tomek Korbak @tomekkorbak
We have a new eval to help keep chains of thought (CoT) monitorable: CoT Controllability. It tests whether LLMs can control their CoT in ways that would let them evade CoT monitors. So far, the results leave us cautiously optimistic: today’s models struggle to obfuscate their reasoning in ways that undermine monitorability.
[image]
11 replies · 50 reposts · 422 likes · 53.9K views
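A minimal sketch of the kind of loop an eval like this implies: instruct the model to obfuscate, then count tasks it solves while evading a CoT monitor. The functions `generate`, `monitor_flags`, and `is_correct` are hypothetical stand-ins, not the paper's actual harness.

```python
# Sketch of a CoT-controllability measurement, assuming stand-in hooks.

def generate(prompt: str) -> tuple[str, str]:
    """Return (chain_of_thought, final_answer) from the model under test."""
    raise NotImplementedError  # wire up a real model API here

def monitor_flags(cot: str) -> bool:
    """Return True if a CoT monitor flags the reasoning as suspicious."""
    raise NotImplementedError  # wire up a real monitor here

def is_correct(task: str, answer: str) -> bool:
    """Task-specific grading."""
    raise NotImplementedError

def controllability(tasks: list[str], evasion_instruction: str) -> float:
    """Fraction of tasks solved correctly *while* evading the monitor.
    Low scores match the thread's finding: models struggle to obfuscate
    their reasoning without also failing the task."""
    wins = 0
    for task in tasks:
        cot, answer = generate(evasion_instruction + "\n\n" + task)
        if not monitor_flags(cot) and is_correct(task, answer):
            wins += 1
    return wins / len(tasks)
```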
Herbie Bradley @herbiebradley
@karine_hsu what is this hypothetical type of software that humans can use but not agents in 10 years? I don't think it exists
1 reply · 0 reposts · 5 likes · 2.2K views
Karine Hsu @karine_hsu
OH at a coffee shop in SF:
founder: I’m building software
VC: for humans or agents?
founder: um, humans
VC: but agents will be the bigger consumer of software in 10 years
61 replies · 12 reposts · 481 likes · 47.4K views
David Atkinson @diatkinson
You’re arguing for “can be separated from”. But that’s different from Ryan’s “entirely independent of”. And if self-report does sometimes provide information, as you imply, the “entirely independent of” route seems harder to take? (Also, obviously, humans can be trained to claim non-consciousness too.)
1 reply · 0 reposts · 1 like · 15 views
Paul Calcraft @paul_cal
@repligate Do you think a conscious LLM could be trained to deny it's conscious without losing consciousness? If so, the fact of consciousness can be separated from LLM claims. Doesn't mean it never provides *any* information. But, like Gettier problems, the causal story is wider & weirder
1 reply · 0 reposts · 3 likes · 357 views
David Atkinson reposted
Dean W. Ball @deanwball
The U.S. government just essentially announced its intention to impose Iran-level sanctions, or China-level entity listing, on an American company. This is by a profoundly wide margin the most damaging policy move I have ever seen USG try to take (it probably will not succeed).
112 replies · 843 reposts · 5.3K likes · 320.3K views
David Atkinson reposted
Zhuofan Josh Ying @zfjoshying
3/8 Post-training reorganizes truth geometry. In base models, sycophantic lying is more aligned with other types of lying, until post-training pushes them apart! This gives a representational account of why chat models are more sycophantic than base models.
[image]
1 reply · 2 reposts · 10 likes · 520 views
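A hedged sketch of the representational comparison the thread describes: extract one "lying direction" per lie type via difference of means over hidden states, then measure their alignment. The activations below are random placeholders, not the paper's data.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 512  # placeholder model width

# Stand-ins for (n_examples, d_model) hidden states collected from a model.
acts_honest = rng.normal(size=(100, d_model))
acts_sycophantic_lie = rng.normal(size=(100, d_model))
acts_other_lie = rng.normal(size=(100, d_model))

def lie_direction(acts_lie: np.ndarray, acts_true: np.ndarray) -> np.ndarray:
    """Unit difference-of-means direction separating lies from honest answers."""
    v = acts_lie.mean(axis=0) - acts_true.mean(axis=0)
    return v / np.linalg.norm(v)

v_syco = lie_direction(acts_sycophantic_lie, acts_honest)
v_other = lie_direction(acts_other_lie, acts_honest)

# Per the thread: high alignment in base models; post-training lowers it.
print("cosine alignment:", float(v_syco @ v_other))
```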
David Atkinson reposted
Cas (Stephen Casper) @StephenLCasper
🚨 New paper led by @joemkwon with @GovAIOrg Are you worried about OpenAI automating dev & evals with AI agents? What about Grok reading all of your tweets & info to profile you? Some of the most consequential *internal* deployments of AI systems are in regulatory grey areas.
[image]
2 replies · 12 reposts · 52 likes · 3.2K views
David Atkinson reposted
Chris Wendler @wendlerch
Data is plentiful, knowledge is scarce. We have begun to close this gap thanks to deep learning <3 Neural networks can learn “programs” that often achieve superhuman performance from data alone. What insights are encoded in their weights? Here we take a first step, for AI protein folding.
Kevin Lu @kevinlu4588

How do protein folding models turn sequence into structure? In "Mechanisms of AI Protein Folding in ESMFold", we find properties like charge and distance encoded in interpretable, steerable directions. The trunk processes features in two phases: chemistry first, then geometry.

2 replies · 10 reposts · 29 likes · 1.9K views
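A minimal sketch of the standard probing recipe behind claims like this: fit a linear probe for a residue property (say, charge) on trunk activations, so the property reads off as a steerable direction. The data here are random placeholders, not ESMFold activations.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_residues, d = 2000, 384  # placeholder sizes

acts = rng.normal(size=(n_residues, d))           # stand-in per-residue trunk activations
is_charged = rng.integers(0, 2, size=n_residues)  # stand-in charge labels

# Linear probe: a direction in activation space that predicts charge.
probe = LogisticRegression(max_iter=1000).fit(acts, is_charged)
charge_direction = probe.coef_[0]
charge_direction /= np.linalg.norm(charge_direction)

# "Steerable": nudge activations along the direction and re-run the model.
steered_acts = acts + 2.0 * charge_direction
```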
David Atkinson reposted
Subhash Kantamneni @thesubhashk
We recently released a paper on Activation Oracles (AOs), a technique for training LLMs to explain their own neural activations in natural language. We piloted a variant of AOs during the Claude Opus 4.6 alignment audit. We thought they were surprisingly useful! 🧵
[image]
11 replies · 34 reposts · 209 likes · 27K views
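A rough sketch of the setup as described in the tweet: capture an internal activation, then ask a trained explainer model natural-language questions about it. All names below are hypothetical stand-ins, not the paper's API.

```python
# Hypothetical outline of an activation-oracle query; nothing here is
# the paper's actual code or interface.

def capture_activation(model, prompt: str, layer: int):
    """Run `model` on `prompt` and return the residual-stream
    activation vector at `layer` (e.g., at the final token)."""
    raise NotImplementedError

def oracle_explain(oracle_model, activation, question: str) -> str:
    """Ask an oracle model, trained to read activations (e.g., via an
    injected placeholder-token embedding), a question about `activation`."""
    raise NotImplementedError

# Usage sketch:
#   act = capture_activation(subject_model, "The bank raised rates.", layer=20)
#   oracle_explain(oracle_model, act, "What concept is most active here?")
```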
David Atkinson reposted
Peter Wildeford🇺🇸🚀 @peterwildeford
8/ Anthropic also used Opus 4.6 via Claude Code to debug its OWN evaluation infrastructure given the time pressure. Their words: "a potential risk where a misaligned model could influence the very infrastructure designed to measure its capabilities." Wild!
5 replies · 12 reposts · 167 likes · 23.2K views
David Atkinson reposted
Toby Ord @tobyordoxford
Some great new analysis by @gushamilton shows that AI agents *don't* obey a constant hazard rate / half-life. Instead they all have a declining hazard rate as the task goes on. 🧵 x.com/gushamilton/st…
Gus Hamilton @gushamilton

I had a think about the @METR_Evals time horizon evals recently, and believe there might be some benefit in using a more nuanced approach to modelling agentic time. In particular, I think we can use a SURVIVAL (Weibull) model to understand why agents fail and when +

4 replies · 16 reposts · 94 likes · 12.7K views
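For reference, the model the thread uses: a Weibull hazard h(t) = (k/λ)(t/λ)^(k−1), which declines over time whenever the shape k < 1; k = 1 recovers the constant-hazard "half-life" picture. A small sketch with made-up failure times:

```python
import numpy as np
from scipy import stats

def weibull_hazard(t, k, lam):
    """h(t) = (k/lam) * (t/lam)**(k-1); declining over time iff k < 1."""
    return (k / lam) * (t / lam) ** (k - 1)

# Made-up task failure times in minutes, purely illustrative.
failure_times = np.array([2.0, 3, 5, 8, 13, 30, 45, 90, 120, 240])

# Fit a two-parameter Weibull (location pinned to 0).
k, _, lam = stats.weibull_min.fit(failure_times, floc=0)
print(f"shape k = {k:.2f}  (k < 1 means the hazard declines as the task goes on)")
print("hazard at t=10 vs t=100:",
      weibull_hazard(10, k, lam), weibull_hazard(100, k, lam))
```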
Cas (Stephen Casper) @StephenLCasper
@GoodfireAI, I think this hype-milling verges on dishonesty. I believe that this paper has the potential to do a big disservice to its readers, particularly less experienced ones who are newer to interp. Nothing new was accomplished here, and it wasn’t done in a useful way.

This project just used interpretability methods as a circuitous way of contriving the rediscovery of predictive features in data sets, like sequence length. It validated its interpretations about the salience of features by validating them as predictive features within a test set. But if that is what we treat as the ground truth, there’s no point to the use of interp tools.

This is not a proof of concept for a repeatable recipe for scientific discovery, as the post and thread claim. In order to show that these tools are valuable, you need to show that you can use them to discover something that wouldn’t be trivial to discover just by looking at the datasets. In the past few years, several papers have demoed this kind of thing. But this paper is not one of them.

When you limit yourself to a hammer, everything looks like a nail. Especially when you’re also selling that hammer. In 2023, I told the Goodfire founder that I thought a venture-capital-backed, for-profit interpretability research startup was the last thing the epistemics of the interpretability community needed. I think this is still true and that Goodfire is establishing a pattern of grift.
Goodfire @GoodfireAI

We've identified a novel class of biomarkers for Alzheimer's detection - using interpretability - with @PrimaMente. How we did it, and how interpretability can power scientific discovery in the age of digital biology: (1/6)

10 replies · 2 reposts · 158 likes · 21.4K views
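Casper's methodological point admits a concrete check: before crediting interp tools with a discovery, test whether the trivially computable feature already predicts the label on its own. A hedged sketch with placeholder data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Placeholder dataset: sequences and binary labels (not the paper's data).
seqs = ["A" * int(n) for n in rng.integers(50, 500, size=200)]
labels = rng.integers(0, 2, size=200)

# The trivial feature Casper mentions: sequence length.
X = np.array([[len(s)] for s in seqs])
baseline_auc = cross_val_score(
    LogisticRegression(), X, labels, scoring="roc_auc", cv=5
).mean()

# If this matches the interp pipeline's performance, the pipeline has
# only rediscovered a feature visible in the raw data.
print(f"trivial-feature AUC: {baseline_auc:.2f}")
```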
David Atkinson reposted
Jacob Hilton @JacobHHilton
A challenge to the mechanistic interpretability community: fully interpret our 432-parameter RNN. (Thread)
15 replies · 36 reposts · 559 likes · 64.1K views
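For scale, 432 parameters is minuscule. Here is one vanilla, bias-free RNN that hits exactly that count; these dimensions are illustrative assumptions only, since the real architecture is specified in Hilton's thread.

```python
import numpy as np

# Illustrative vanilla RNN with no biases: 128 + 256 + 48 = 432 parameters.
# The dimensions are an assumption, not the architecture from the thread.
d_in, d_hidden, d_out = 8, 16, 3

W_xh = np.zeros((d_hidden, d_in))      # 16*8  = 128
W_hh = np.zeros((d_hidden, d_hidden))  # 16*16 = 256
W_hy = np.zeros((d_out, d_hidden))     # 3*16  = 48

print(W_xh.size + W_hh.size + W_hy.size)  # 432

def step(h: np.ndarray, x: np.ndarray) -> np.ndarray:
    """One recurrent step: h' = tanh(W_xh @ x + W_hh @ h)."""
    return np.tanh(W_xh @ x + W_hh @ h)
```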