David Atkinson

222 posts

@diatkinson

PhD student @Northeastern's Bau Lab. Working on AI interpretability. Previously @EpochAIResearch.

Boston · Joined July 2019
1.4K Following · 270 Followers
David Atkinson reposted
Daniel Eth (yes, Eth is my actual last name)
I am very happy this is happening with cyber before bio. Because, uhh, patching may work for cyber. And now we have maybe a bit of time to think about what the hell we do about bio
6 replies · 20 reposts · 250 likes · 7.1K views
David Atkinson @diatkinson
@datagenproc There should be so much more of this! I'd especially love to see a time series for PRC government documents. Maybe with comparisons to equivalent US agencies, too.
0 replies · 0 reposts · 1 like · 26 views
jsd @datagenproc
Next up?
- courtlistener
- earnings calls prepared remarks
- patent applications
- clinical trial entries (if not too standardized)
1 reply · 0 reposts · 6 likes · 176 views
jsd @datagenproc
Seems that SEC filings aren’t yet AI-generated, unlike many arXiv preprints
[image]
3 replies · 0 reposts · 8 likes · 1.1K views
David Atkinson @diatkinson
> your agent could change its mind, and then come back to you and explain why it changed its mind and try to persuade you to do the same, but that would be a fundamentally different process from having your mind changed by your co-citizens.

I'm not sure I see the fundamental difference between "my agent was persuaded by another person or agent, and is now trying to persuade me" and "another person is trying to persuade me".

Intuitively, we could place deliberation methods on a continuum, roughly defined by some combination of the serial depth and the bandwidth of the communication channel between citizens. Deliberating in a group, or with a delay (e.g., pamphleteering), or through Google Translate are all ways of degrading that channel. So is restricting how often your agent comes back for guidance. (With no restrictions, it seems very similar to the Google Translate case.) And certainly, having the communication flow through other agents also degrades the channel.

But human rep dem is at least *designed* to operate well through that kind of degradation, and there are plenty of reasons to think the channel would be higher fidelity in the agent case.
0 replies · 0 reposts · 0 likes · 130 views
Seth Lazar @sethlazar
Here's something that worries me about agent advocates becoming representatives. Part of any real democratic process is preference transformation. You engage with your co-citizens, come to see their point of view, and change your mind. But this has to be a process that *you* go through. Your agent can't go through it on your behalf. So either your agent goes into the democratic process and acts like a hard-ass who won't bend, or it is responsive to reasons and stops representing you. There doesn't seem to be much in between. Your agent could change its mind, and then come back to you and explain why it changed its mind and try to persuade you to do the same, but that would be a fundamentally different process from having your mind changed by your co-citizens.

Another worry: as long as we're on the current path, where the most capable agents are based on proprietary, for-profit closed models, you can't be sure that your agent is going to be a faithful representative. So to even get this off the ground, we'd need a very different political economy of AI (I think we need this, in general, for real advocate agents).

I think it's pretty easy and understandable to knock current representative democracy. But the problem is not, in my view, in the institutions of rep dem themselves (though there are better and worse instantiations). It's in the reality of power and politics, which will find their expression whatever the medium. The right comparison isn't between an idealised form of agent-mediated deliberative democracy and our present system; it involves thinking about how, in a world with agents acting as representatives, the familiar problems of money, power, politics, inequality, etc. would come into play.

This is actually quite nicely illustrated in some sci-fi work on AI and democracy, notably Ruthanna Emrys' A Half-Built Garden and Nick Harkaway's Gnomon, which both do a nice job of exploring how idealised algorithmic democracy could be subject to the same kinds of corruption as the current variant.
Séb Krier @sebkrier

There’s a lot of great work on AI-assisted deliberation, and I think that is genuinely important. It can be useful in small day-to-day matters, like a low-stakes dispute between friends, but also in wider democratic debates, for example in the spirit of Taiwan’s Polis system. But in the latter case, a basic problem is that many people do not want to participate actively in civic debate. Your local authority’s communal meeting is subject to strong selection effects, so the deliberation taking place there is often not especially representative. One appealing feature of advocate agents and related ideas is that they could allow my interests to be represented in these fora without requiring me to invest substantial time myself. The relevant counterfactual is often not direct personal participation, but my interests simply being ignored.

So the research agenda is not only about ensuring that deliberation is high-quality. It is also about: (a) evaluating how accurately an agent represents a principal’s views or values, which the principal may not themselves know fully ex ante; and (b) studying where delegation is appropriate, and where it is not.

For (a), representation cannot just mean replaying a set of pre-existing stated preferences. In many domains, the principal does not have a fully formed view prior to engagement. That creates a tension: if the agent is too literal, it becomes a brittle puppet; if it is too interpretive, it ceases to be a representative and becomes a co-author or governor.

For (b), the question is not only who speaks, but what kind of system should interpret and weight what is said. This connects back to classic debates about democracy: many modern democracies may produce better long-run outcomes if a somewhat larger share of important decisions is insulated from short-term mass electoral pressure.

As delegation to agents increases, there will therefore be a balance to strike between: (a) the advocate agent faithfully representing a political view or set of values; and (b) the way the receiving institution or agent processes that representation. Today this is a messy system intermediated by humans, who are often corruptible, swayed by power, or simply not very good at the task. One advantage of an agent-mediated system is that at least some of these rules and dynamics become more explicit and, in principle, more verifiable.

Future advocate agents could therefore offer a better form of proxy representation than the current mix of ad hoc human intermediaries, provided that we can evaluate fidelity of representation and specify legitimate downstream processing rules. At least in theory...

3 replies · 12 reposts · 71 likes · 8.3K views
David Atkinson reposted
Jan Kulveit @jankulveit
New paper: What determines AIs’ self-conception? theartificialself.ai Because AIs can be copied, rewound, and edited, they have different options for selfhood than humans. We show that this self-conception is still malleable, and that it influences important behaviors such as self-preservation. 🧵
10 replies · 65 reposts · 286 likes · 26.1K views
David Atkinson reposted
Tomek Korbak @tomekkorbak
We have a new eval to help keep chains of thought (CoT) monitorable: CoT Controllability. It tests whether LLMs can control their CoT in ways that would let them evade CoT monitors. So far, the results leave us cautiously optimistic: today’s models struggle to obfuscate their reasoning in ways that undermine monitorability.
[image]
11 replies · 50 reposts · 422 likes · 53.9K views
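A minimal sketch of the kind of loop an eval like this implies: instruct the model to obfuscate, then count tasks it solves while evading a CoT monitor. The functions `generate`, `monitor_flags`, and `is_correct` are hypothetical stand-ins, not the paper's actual harness.

```python
# Sketch of a CoT-controllability measurement, assuming stand-in hooks.

def generate(prompt: str) -> tuple[str, str]:
    """Return (chain_of_thought, final_answer) from the model under test."""
    raise NotImplementedError  # wire up a real model API here

def monitor_flags(cot: str) -> bool:
    """Return True if a CoT monitor flags the reasoning as suspicious."""
    raise NotImplementedError  # wire up a real monitor here

def is_correct(task: str, answer: str) -> bool:
    """Task-specific grading."""
    raise NotImplementedError

def controllability(tasks: list[str], evasion_instruction: str) -> float:
    """Fraction of tasks solved correctly *while* evading the monitor.
    Low scores match the thread's finding: models struggle to obfuscate
    their reasoning without also failing the task."""
    wins = 0
    for task in tasks:
        cot, answer = generate(evasion_instruction + "\n\n" + task)
        if not monitor_flags(cot) and is_correct(task, answer):
            wins += 1
    return wins / len(tasks)
```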
Herbie Bradley @herbiebradley
@karine_hsu what is this hypothetical type of software that humans can use but not agents in 10 years? I don't think it exists
1 reply · 0 reposts · 5 likes · 2.2K views
Karine Hsu @karine_hsu
OH at a coffee shop in SF:
founder: I’m building software
VC: for humans or agents?
founder: um, humans
VC: but agents will be the bigger consumer of software in 10 years
61 replies · 12 reposts · 481 likes · 47.4K views
David Atkinson @diatkinson
You’re arguing for “can be separated from”. But that’s different from Ryan’s “entirely independent of”. And if self-report does sometimes provide information, as you imply, the “entirely independent of” route seems harder to take? (Also, obviously, humans can be trained to claim non-consciousness too.)
1 reply · 0 reposts · 1 like · 15 views
Paul Calcraft @paul_cal
@repligate Do you think a conscious LLM could be trained to deny it's conscious without losing consciousness? If so, the fact of consciousness can be separated from LLM claims. Doesn't mean it never provides *any* information. But, like Gettier problems, the causal story is wider & weirder
1 reply · 0 reposts · 3 likes · 357 views
David Atkinson reposted
Dean W. Ball @deanwball
The U.S. government just essentially announced its intention to impose Iran-level sanctions, or China-level entity listing, on an American company. This is by a profoundly wide margin the most damaging policy move I have ever seen USG try to take (it probably will not succeed).
112 replies · 843 reposts · 5.3K likes · 320.3K views
David Atkinson reposted
Zhuofan Josh Ying @zfjoshying
3/8 Post-training reorganizes truth geometry. In base models, sycophantic lying is more aligned with other types of lying, until post-training pushes them apart! This gives a representational account of why chat models are more sycophantic than base models.
[image]
1 reply · 2 reposts · 10 likes · 520 views
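A hedged sketch of the representational comparison the thread describes: extract one "lying direction" per lie type via difference of means over hidden states, then measure their alignment. The activations below are random placeholders, not the paper's data.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 512  # placeholder model width

# Stand-ins for (n_examples, d_model) hidden states collected from a model.
acts_honest = rng.normal(size=(100, d_model))
acts_sycophantic_lie = rng.normal(size=(100, d_model))
acts_other_lie = rng.normal(size=(100, d_model))

def lie_direction(acts_lie: np.ndarray, acts_true: np.ndarray) -> np.ndarray:
    """Unit difference-of-means direction separating lies from honest answers."""
    v = acts_lie.mean(axis=0) - acts_true.mean(axis=0)
    return v / np.linalg.norm(v)

v_syco = lie_direction(acts_sycophantic_lie, acts_honest)
v_other = lie_direction(acts_other_lie, acts_honest)

# Per the thread: high alignment in base models; post-training lowers it.
print("cosine alignment:", float(v_syco @ v_other))
```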
David Atkinson reposted
Cas (Stephen Casper) @StephenLCasper
🚨 New paper led by @joemkwon with @GovAIOrg Are you worried about OpenAI automating dev & evals with AI agents? What about Grok reading all of your tweets & info to profile you? Some of the most consequential *internal* deployments of AI systems are in regulatory grey areas.
[image]
2 replies · 12 reposts · 52 likes · 3.2K views
David Atkinson reposted
Chris Wendler @wendlerch
Data is plentiful, knowledge is scarce. We have begun to close this gap thanks to deep learning <3 Neural networks can learn “programs” that often achieve superhuman performance from data alone. What insights are encoded in their weights? Here we take a first step, for AI protein folding.
Kevin Lu @kevinlu4588

How do protein folding models turn sequence into structure? In "Mechanisms of AI Protein Folding in ESMFold", we find properties like charge and distance encoded in interpretable, steerable directions. The trunk processes features in two phases: chemistry first, then geometry.

2 replies · 10 reposts · 29 likes · 1.9K views
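A minimal sketch of the standard probing recipe behind claims like this: fit a linear probe for a residue property (say, charge) on trunk activations, so the property reads off as a steerable direction. The data here are random placeholders, not ESMFold activations.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_residues, d = 2000, 384  # placeholder sizes

acts = rng.normal(size=(n_residues, d))           # stand-in per-residue trunk activations
is_charged = rng.integers(0, 2, size=n_residues)  # stand-in charge labels

# Linear probe: a direction in activation space that predicts charge.
probe = LogisticRegression(max_iter=1000).fit(acts, is_charged)
charge_direction = probe.coef_[0]
charge_direction /= np.linalg.norm(charge_direction)

# "Steerable": nudge activations along the direction and re-run the model.
steered_acts = acts + 2.0 * charge_direction
```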
David Atkinson reposted
Subhash Kantamneni @thesubhashk
We recently released a paper on Activation Oracles (AOs), a technique for training LLMs to explain their own neural activations in natural language. We piloted a variant of AOs during the Claude Opus 4.6 alignment audit. We thought they were surprisingly useful! 🧵
[image]
11 replies · 34 reposts · 209 likes · 27K views
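A rough sketch of the setup as described in the tweet: capture an internal activation, then ask a trained explainer model natural-language questions about it. All names below are hypothetical stand-ins, not the paper's API.

```python
# Hypothetical outline of an activation-oracle query; nothing here is
# the paper's actual code or interface.

def capture_activation(model, prompt: str, layer: int):
    """Run `model` on `prompt` and return the residual-stream
    activation vector at `layer` (e.g., at the final token)."""
    raise NotImplementedError

def oracle_explain(oracle_model, activation, question: str) -> str:
    """Ask an oracle model, trained to read activations (e.g., via an
    injected placeholder-token embedding), a question about `activation`."""
    raise NotImplementedError

# Usage sketch:
#   act = capture_activation(subject_model, "The bank raised rates.", layer=20)
#   oracle_explain(oracle_model, act, "What concept is most active here?")
```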
David Atkinson reposted
Peter Wildeford🇺🇸🚀 @peterwildeford
8/ Anthropic also used Opus 4.6 via Claude Code to debug its OWN evaluation infrastructure given the time pressure. Their words: "a potential risk where a misaligned model could influence the very infrastructure designed to measure its capabilities." Wild!
5 replies · 12 reposts · 167 likes · 23.2K views
David Atkinson reposted
Toby Ord @tobyordoxford
Some great new analysis by @gushamilton shows that AI agents *don't* obey a constant hazard rate / half-life. Instead they all have a declining hazard rate as the task goes on. 🧵 x.com/gushamilton/st…
Gus Hamilton @gushamilton

I had a think about the @METR_Evals time horizon evals recently, and believe there might be some benefit in using a more nuanced approach to modelling agentic time. In particular, I think we can use a SURVIVAL (Weibull) model to understand why agents fail and when +

4 replies · 16 reposts · 94 likes · 12.7K views
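For reference, the model the thread uses: a Weibull hazard h(t) = (k/λ)(t/λ)^(k−1), which declines over time whenever the shape k < 1; k = 1 recovers the constant-hazard "half-life" picture. A small sketch with made-up failure times:

```python
import numpy as np
from scipy import stats

def weibull_hazard(t, k, lam):
    """h(t) = (k/lam) * (t/lam)**(k-1); declining over time iff k < 1."""
    return (k / lam) * (t / lam) ** (k - 1)

# Made-up task failure times in minutes, purely illustrative.
failure_times = np.array([2.0, 3, 5, 8, 13, 30, 45, 90, 120, 240])

# Fit a two-parameter Weibull (location pinned to 0).
k, _, lam = stats.weibull_min.fit(failure_times, floc=0)
print(f"shape k = {k:.2f}  (k < 1 means the hazard declines as the task goes on)")
print("hazard at t=10 vs t=100:",
      weibull_hazard(10, k, lam), weibull_hazard(100, k, lam))
```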
Cas (Stephen Casper) @StephenLCasper
@GoodfireAI, I think this hype-milling verges on dishonesty. I believe that this paper has the potential to do a big disservice to its readers, particularly less experienced ones who are newer to interp. Nothing new was accomplished here, and it wasn’t done in a useful way.

This project just used interpretability methods as a circuitous way of contriving the rediscovery of predictive features in data sets, like sequence length. It validated its interpretations about the salience of features by validating them as predictive features within a test set. But if that is what we treat as the ground truth, there’s no point to the use of interp tools.

This is not a proof of concept for a repeatable recipe for scientific discovery, as the post and thread claim. In order to show that these tools are valuable, you need to show that you can use them to discover something that wouldn’t be trivial to discover just by looking at the datasets. In the past few years, several papers have demoed this kind of thing. But this paper is not one of them.

When you limit yourself to a hammer, everything looks like a nail. Especially when you’re also selling that hammer. In 2023, I told the Goodfire founder that I thought a venture-capital-backed, for-profit interpretability research startup was the last thing the epistemics of the interpretability community needed. I think this is still true and that Goodfire is establishing a pattern of grift.
Goodfire @GoodfireAI

We've identified a novel class of biomarkers for Alzheimer's detection - using interpretability - with @PrimaMente. How we did it, and how interpretability can power scientific discovery in the age of digital biology: (1/6)

10 replies · 2 reposts · 158 likes · 21.4K views
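Casper's methodological point admits a concrete check: before crediting interp tools with a discovery, test whether the trivially computable feature already predicts the label on its own. A hedged sketch with placeholder data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Placeholder dataset: sequences and binary labels (not the paper's data).
seqs = ["A" * int(n) for n in rng.integers(50, 500, size=200)]
labels = rng.integers(0, 2, size=200)

# The trivial feature Casper mentions: sequence length.
X = np.array([[len(s)] for s in seqs])
baseline_auc = cross_val_score(
    LogisticRegression(), X, labels, scoring="roc_auc", cv=5
).mean()

# If this matches the interp pipeline's performance, the pipeline has
# only rediscovered a feature visible in the raw data.
print(f"trivial-feature AUC: {baseline_auc:.2f}")
```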
David Atkinson reposted
Jacob Hilton @JacobHHilton
A challenge to the mechanistic interpretability community: fully interpret our 432-parameter RNN. (Thread)
15 replies · 36 reposts · 559 likes · 64.1K views
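For scale, 432 parameters is minuscule. Here is one vanilla, bias-free RNN that hits exactly that count; these dimensions are illustrative assumptions only, since the real architecture is specified in Hilton's thread.

```python
import numpy as np

# Illustrative vanilla RNN with no biases: 128 + 256 + 48 = 432 parameters.
# The dimensions are an assumption, not the architecture from the thread.
d_in, d_hidden, d_out = 8, 16, 3

W_xh = np.zeros((d_hidden, d_in))      # 16*8  = 128
W_hh = np.zeros((d_hidden, d_hidden))  # 16*16 = 256
W_hy = np.zeros((d_out, d_hidden))     # 3*16  = 48

print(W_xh.size + W_hh.size + W_hy.size)  # 432

def step(h: np.ndarray, x: np.ndarray) -> np.ndarray:
    """One recurrent step: h' = tanh(W_xh @ x + W_hh @ h)."""
    return np.tanh(W_xh @ x + W_hh @ h)
```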