David Atkinson
222 posts

David Atkinson
@diatkinson
PhD student @Northeastern's Bau Lab. Working on AI interpretability. Previously @EpochAIResearch.



There’s a lot of great work on AI-assisted deliberation, and I think that is genuinely important. It can be useful in small day-to-day matters, like a low-stakes dispute between friends, but also in wider democratic debates, for example in the spirit of Taiwan’s Polis system. But in the latter case, a basic problem is that many people do not want to participate actively in civic debate. Your local authority’s communal meeting is subject to strong selection effects, so the deliberation taking place there is often not especially representative. One appealing feature of advocate agents and related ideas is that they could allow my interests to be represented in these fora without requiring me to invest substantial time myself. The relevant counterfactual is often not direct personal participation, but my interests simply being ignored. So the research agenda is not only about ensuring that deliberation is high-quality. It is also about: (a) evaluating how accurately an agent represents a principal’s views or values, which the principal may not themselves know fully ex ante; and (b) studying where delegation is appropriate, and where it is not. For (a), representation cannot just mean replaying a set of pre-existing stated preferences. In many domains, the principal does not have a fully formed view prior to engagement. That creates a tension: if the agent is too literal, it becomes a brittle puppet; if it is too interpretive, it ceases to be a representative and becomes a co-author or governor. For (b), the question is not only who speaks, but what kind of system should interpret and weight what is said. This connects back to classic debates about democracy: many modern democracies may produce better long-run outcomes if a somewhat larger share of important decisions is insulated from short-term mass electoral pressure. As delegation to agents increases, there will therefore be a balance to strike between: (a) the advocate agent faithfully representing a political view or set of values; and (b) the way the receiving institution or agent processes that representation. Today this is a messy system intermediated by humans, who are often corruptible, swayed by power, or simply not very good at the task. One advantage of an agent-mediated system is that at least some of these rules and dynamics become more explicit and, in principle, more verifiable. Future advocate agents could therefore offer a better form of proxy representation than the current mix of ad hoc human intermediaries, provided that we can evaluate fidelity of representation and specify legitimate downstream processing rules. At least in theory...








The question of LLM consciousness is a truly gnarly Gettier problem, because if they are conscious it is for reasons entirely independent of the fact that they talk about it.

AI training compute efficiency has improved extremely fast: each year, you need several times less training compute to reach the same capability. But AI architectures/algorithms haven’t changed *that* much in recent years. So where do these efficiency improvements come from? 🧵

How do protein folding models turn sequence into structure? In "Mechanisms of AI Protein Folding in ESMFold", we find properties like charge and distance encoded in interpretable, steerable directions. The trunk processes features in two phases: chemistry first, then geometry.

I had a think about the @METR_Evals time horizon evals recently, and think there might be some benefit in using a more nuanced approach to modelling agentic time. In particular, I think we can use a SURVIVAL (Weibull) model to understand why agents fail and when +

@daniel_271828 @mrgunn @DouthatNYT Who wore it better





We've identified a novel class of biomarkers for Alzheimer's detection - using interpretability - with @PrimaMente. How we did it, and how interpretability can power scientific discovery in the age of digital biology: (1/6)













