✨🔴_🔴✨ Ben Jones (@ben_chain) - Twitter Profili

Sabitlenmiş Tweet

Ethereum is my love and life but I couldn't resist poking some satirical fun today. If ETH was a Disney musical villain:

English

141

244

1.4K

0

✨🔴_🔴✨ Ben Jones retweetledi

Optimism@Optimism·11 Mar

x.com/i/article/2031…

ZXX

17

16

130

8.6K

✨🔴_🔴✨ Ben Jones retweetledi

Optimist Prime@jinglejamOP·17 Şub

Our Official Response to FUD and Drama. You're gonna see us on a rampage this year. Of course, the most delightful things to engage with on twitter are drama and FUD posts. So reasonably, I’ve been getting questions about what various trends mean for Optimism. Whether it's regarding thought leader commentary, other L1s, or poorly disguised forks of the OP Stack. Let me be clear: External commentary doesn't change our roadmap. It doesn't modify the history of what we've already accomplished. It doesn't change our priorities. We set our roadmap based on what our partners need. People choose the OP stack because it's the best technology, the most performant chain stack, and the most profit-enabling infrastructure. These claims are proven in production, not flashy headlines for technology that hasn't been built. What we're actually building The Scale The Superchain is by far the fastest growing blockchain stack in crypto. Two years ago, we processed 208 million transactions. Last year, 6.1 billion. That's 29x growth— by far the fastest of any blockchain ecosystem in crypto. The Economics 10.1x. For every dollar users spend on fees, chains generate ten dollars in revenue. The value flows to partners, not the platform. That's by design. The Model The OP Stack is open source. To differentiate in a competitive market, you need to own your infrastructure. But most enterprises don't want to build and maintain it themselves—they want to focus on their product and their users. That's the gap we fill. That's why we built OP Enterprise. Think of it like Databricks for Spark or Elastic for Elasticsearch: open source core, widely adopted, but enterprises need support, customization, and guaranteed service levels. OP Enterprise is that layer for blockchain infrastructure. The Partners World is building identity systems for 20 million users. Sony is building for creators and gaming. Uniswap is building the home for onchain trading. Kraken is bringing institutional-grade infrastructure to their users. Base has scaled to become one of the largest chains in crypto. Each chose us for specific strategic reasons—and those reasons haven't changed. The Future As the market matures, partners will evolve in different directions. Some will want more control. Some will build specialized systems for their unique requirements. That's expected. The Superchain was designed to be bigger than any single partner. We've been at this for four years. We're just getting started. This week at the Scaling Summit, we're showing the next generation: native interoperability, faster block times, enterprise-grade compliance controls. What if you could launch products in weeks, not years—and make more revenue for every dollar deposited? That's what we help you do.

English

33

19

124

36.2K

✨🔴_🔴✨ Ben Jones retweetledi

Greg Bresnitz@GregBresnitz·12 Şub

I’m joining @Optimism to lead narrative and strategy. You either die a hero or live long enough to see “storyteller” in your job title. Here we are. We’re at an inflection point most people are still missing. Regulatory clarity arrived. JPMorgan processes $2B daily on blockchain rails. BlackRock manages $2.5B in tokenized assets. The GENIUS Act gave banks a real pathway in. The question shifted from “if institutions come onchain” to “through whom.” Everyone talks about bringing a billion users onchain. Here’s what I’ve come to believe: That doesn’t happen through consumer apps alone. It happens when the institutions those users already trust—their banks, their exchanges, their payment providers—build on infrastructure they can actually rely on. The billion users come through the enterprise door. And the enterprises coming through that door want three things: their own blockchain, their own revenue, and guarantees they can take to their board. Optimism has a front row seat for this. Coinbase built @base on the OP Stack. Sony’s @soneium. Kraken’s @inkonchain. Uniswap’s @unichain. @world_chain_. These aren’t experiments—they’re production systems processing over half of all Ethereum L2 transactions. When enterprises evaluate where to build, they’re choosing Optimism. I spent time understanding the architecture before saying yes. The OP Stack is MIT licensed and fully open. Stage 1 decentralization means permissionless fraud proofs and independent exit windows. When a CTO looks under the hood, they find something built by people who know what they’re doing. Technical credibility closes deals. I also believe in the people. @jinglejamOP, @karl_dot_tech, @ben_chain, Kevin, @tyneslol , and the team have been building toward an open internet for years—through bear markets, technical rewrites, and an industry that rewards hype over substance. They kept shipping. That dedication is rare, and it’s why the Superchain exists today. The next few years will be exciting. AI agents coordinating through programmable rails. Institutional capital flowing into infrastructure that doesn’t require trust in intermediaries. The technology receding into the background until it becomes invisible, essential, everywhere. Like electricity. Like the internet before it. Optimism has been building for years. Now it’s time to tell that story. Four years at @FWBtweets taught me how much narrative matters. Optimism built something real. It’s time to make sure people know. Let’s go.

English

31

4

197

18.4K

✨🔴_🔴✨ Ben Jones@ben_chain·29 Oca

True story: in 2018 when I was first getting into the ETH community, @VitalikButerin asked me if I wanted to work for the EF “selling plasma to banks”. I thought at the time I was saying no. Turns out I was saying “yes, just give the market 8 years to catch up with us.”

Optimist Prime@jinglejamOP

LISTEN UP EVERYBODY! Today we're launching OP Enterprise. We all know that crypto is at the cusp of major mainstream adoption. Nearly every major enterprise has a crypto strategy. We're the only team that has successfully launched chains for multiple companies. We've packaged up all of our learnings into a product offering that will onboard the next wave of enterprises. OP Enterprise is production-grade blockchain infrastructure for companies that want to build businesses, not become blockchain experts. 𝗬𝗼𝘂𝗿 𝗯𝗹𝗼𝗰𝗸𝗰𝗵𝗮𝗶𝗻. 𝗬𝗼𝘂𝗿 𝗿𝗲𝘃𝗲𝗻𝘂𝗲. 𝗘𝗻𝘁𝗲𝗿𝗽𝗿𝗶𝘀𝗲 𝗴𝘂𝗮𝗿𝗮𝗻𝘁𝗲𝗲𝘀. — 𝗧𝗵𝗲 𝗽𝗿𝗼𝗯𝗹𝗲𝗺 Here's what we keep hearing from enterprises: They're building on infrastructure where the incentives don't align with their own. They're competing for mindshare, fighting to keep users, working against platform economics that extract value from everything they build. Most blockchain platforms don't care if you're successful. Their focus is on their own TVL and metrics - not yours. You launch your stablecoin into an environment that competes with everyone else's stablecoin and hemorrhage capital to onboard your users onto a blockchain you have zero control over. And even if you decide to own your chain, you hit the real bottleneck—onboarding the ecosystem partners you need to go live. Stablecoins, oracles, bridges, wallets, indexers. Each negotiation takes months. Costs hundreds of thousands to millions of dollars. Vendors pick off blockchain teams one by one. We've seen this movie before. Many times. — 𝗢𝗣 𝗘𝗻𝘁𝗲𝗿𝗽𝗿𝗶𝘀𝗲 𝗳𝗶𝘅𝗲𝘀 𝗯𝗼𝘁𝗵 𝗥𝗲𝘃𝗲𝗻𝘂𝗲 𝗰𝗼𝗻𝘁𝗿𝗼𝗹 — When you own your chain, your infrastructure becomes a revenue-generating asset. Not a cost center. DeFi protocols deploy on your rails. The economic activity you enable accrues to you. This isn't about saving on fees. It's about owning the infrastructure layer where financial value is created. 𝗩𝗲𝗻𝗱𝗼𝗿 𝗺𝗮𝗻𝗮𝗴𝗲𝗺𝗲𝗻𝘁 𝗮𝘁 𝘀𝗰𝗮𝗹𝗲 — We've onboarded tier-one partners across 50+ production chains. They're already integrated, contracted, ready to deploy. We negotiate standard terms, manage costs down, and fast-track partnerships that would otherwise delay your launch by 6-12 months. We've done this work already. You don't have to. — 𝗪𝗵𝗮𝘁 𝘆𝗼𝘂 𝗴𝗲𝘁 𝗙𝘂𝗹𝗹𝘆 𝗠𝗮𝗻𝗮𝗴𝗲𝗱: We run your chain end-to-end. 24/7 monitoring, incident response, security ops, upgrade orchestration. You focus on product. 𝗦𝗲𝗹𝗳 𝗠𝗮𝗻𝗮𝗴𝗲𝗱: You operate, we support. Architecture guidance, security assessments, priority patches, direct access to core engineers. 𝗢𝗣 𝗠𝗮𝗶𝗻𝗻𝗲𝘁: Start on our flagship public network with enterprise support. Graduate to your own chain when ready. Same codebase, seamless migration. First conversation to production: 8-12 weeks. — 𝗧𝗵𝗲 𝘀𝗽𝗲𝗰𝘀 99.99% uptime SLO 15-minute P1 incident response Up to 5B RPC requests/month with multi-provider redundancy 10 Mgas/sec baseline, 100+ Mgas/sec for high-volume applications Sub-200ms block times 20k requests-per-second burst capacity Stage 1 security with permissionless fault proofs Optional ZK fault proofs for faster finality — 𝗪𝗵𝘆 𝗻𝗼𝘄 The window for enterprise blockchain has shifted from "if" to "how fast." MiCA is live in Europe. US policy is stabilizing. The enterprises that spent 2023-2024 in exploratory mode are now greenlighting production builds. Enterprise deals are now a competitive space. When we talk to enterprises, we see everyone trying to help them come onchain. But the OP Stack is the only stack that has successfully brought and scaled multiple enterprises onchain. We've seen what works and what doesn't. We've earned this knowledge by building alongside the fastest-growing enterprises in web3. We've encountered every failure mode because we've been doing this longer than anyone else. At the end of the day, enterprises want to control their own economics. They don't want to rent infrastructure from platforms that compete with them. The OP Stack vision will win. Shared standards balanced with chain autonomy. — 𝗪𝗵𝗮𝘁 𝗺𝗮𝗸𝗲𝘀 𝘂𝘀 𝗱𝗶𝗳𝗳𝗲𝗿𝗲𝗻𝘁 𝗘𝗻𝘁𝗲𝗿𝗽𝗿𝗶𝘀𝗲 𝘁𝗿𝗮𝗰𝗸 𝗿𝗲𝗰𝗼𝗿𝗱 — 50+ chains launched. Not pilots. Production systems serving millions. 𝗜𝗻𝗻𝗼𝘃𝗮𝘁𝗶𝗼𝗻 𝘃𝗲𝗹𝗼𝗰𝗶𝘁𝘆 — We control the entire stack. When we discover vulnerabilities, customers get patches within hours. Not weeks. 𝗗𝗶𝗿𝗲𝗰𝘁 𝗮𝗰𝗰𝗲𝘀𝘀 𝘁𝗼 𝘁𝗵𝗲 𝘀𝗼𝘂𝗿𝗰𝗲 — Questions go to the engineers who wrote the code. Feature requests go to the people who can actually implement them. No translation layer. 𝗘𝘁𝗵𝗲𝗿𝗲𝘂𝗺 𝗰𝗿𝗲𝗱𝗶𝗯𝗶𝗹𝗶𝘁𝘆 — We've worked on Ethereum's core protocol. Defined its scaling roadmap. Invented the L2 architecture that powers the industry. 𝗡𝗼 𝘃𝗲𝗻𝗱𝗼𝗿 𝗹𝗼𝗰𝗸-𝗶𝗻 — Open source. Fork if you want. Most teams discover they'd rather work with us—but the choice is always yours. — Our first customers: Unichain — Uniswap needed their own chain. They chose us. Uniswap Labs operates Unichain with Mission-Critical Support—priority response for high-stakes moments where downtime isn't an option. Celo — Scaling mobile payments across Latin America and Africa. Millions of users. Celo operates their network with Mission-Critical Support, ensuring enterprise-grade backing in emerging markets. Different use cases. Same infrastructure. Same commitment to their success We are here today because partners like these rolled up their sleeves and built together with us. — 𝗪𝗵𝗼 𝘁𝗵𝗶𝘀 𝗶𝘀 𝗳𝗼𝗿 Fintechs building next-generation financial services Centralized exchanges launching tokenized products Payments companies building cross-border rails Financial institutions exploring tokenization and digital assets If you need infrastructure that performs without the operational burden—and you want to own your economics instead of renting them—OP Enterprise is for you. — OP Enterprise is a major focus for us in 2026. We have active engagements across fintech, exchanges, payments, and financial services. The direction is clear: the OP Stack is becoming the standard for the next generation of financial systems. This is the first of many announcements to come. If you're serious about building onchain, we should talk.

English

3

1

47

4.3K

✨🔴_🔴✨ Ben Jones retweetledi

Tevm@tevmtools·29 Oca

An ounce of robustness is worth a pound when it comes to LLM-driven development To get introduced point your llm at: Docs: voltaire-effect.tevm.sh LLMs.txt: voltaire-effect.tevm.sh/LLMS.txt MCP: voltaire-effect.tevm.sh/mcp Skill: npx degit evmts/voltaire/skills/voltaire-effect ~/.codex/skills/voltaire Or you can do what I do which is clone the entire repo Github: github.com/evmts/voltaire Ask your llm to give you a quick tour 3/25

English

2

1

9

1.5K

✨🔴_🔴✨ Ben Jones retweetledi

Optimist Prime@jinglejamOP·23 Oca

Two weeks ago I shared our proposal to align the OP token with Superchain growth. The governance vote is now live. Let's go!!!

English

9

7

47

46.5K

✨🔴_🔴✨ Ben Jones retweetledi

Andy Hall@ahall_research·21 Oca

Locking LLMs into blockchains as unbribeable, implacable judges could give us adjudication systems that are transparent, credibly neutral, and genuinely hard to game. Prediction markets are the natural test case, and if we get it right, the implications will extend to any setting where judgment calls are required. The core idea: at contract creation, you commit the exact model version and prompt on-chain. Everyone can inspect the full resolution mechanism before they trade. No rule changes mid-flight, no backroom negotiations, no discretionary judgment calls. Why this helps: --You can't bribe a model or flip its vote after the fact — the weights are fixed --The LLM has no financial stake in the outcome, so conflicts of interest disappear --The entire mechanism is auditable before anyone places a bet It’s not a magic solution: models make mistakes, prompt design matters enormously, and information sources can still be targeted by adversaries. But these problems may be more tractable than the ones we're stuck with now---human bias, opacity, and the ever-present temptation to game committee decisions. How to move forward: --Experiment on lower-stakes contracts to build a track record and discover failure modes --Standardize the approach as best practices emerge; help liquidity concentrate in markets with the most reliable LLM judges --Build transparency tools so traders can actually inspect the model, prompt, and sources before trading --Design ongoing human governance for meta-level decisions: which models to trust, when to update defaults, how to handle appeals, etc. My new piece for @a16zcrypto explores this in depth. With many thanks to @sreeramkannan @ben_chain @benfielding and many others thinking about this area. Link is below.

English

29

41

101

17.4K

✨🔴_🔴✨ Ben Jones@ben_chain·20 Oca

@jinglejamOP @lay2000lbs rust

English

0

3

68

Optimist Prime@jinglejamOP·20 Oca

@lay2000lbs set

1

0

2

262

✨🔴_🔴✨ Ben Jones retweetledi

Optimist Prime@jinglejamOP·20 Oca

Our 2026 Roadmap Last year, we made transactions 10x faster and 10x cheaper. But that’s not enough. This year, we also want to make the OP Stack hands down the best toolset for scaling to your first million dollars in onchain revenue. To us, scaling means identifying and removing bottlenecks. And the bottleneck isn’t always throughput. Sometimes, the bottleneck is liquidity. Sometimes it’s security, UX or integrations. We build tools to solve every flavor of bottleneck. That’s why OP Stack chains earn up to 3x more in profit (2x more revenue) per dollar onchain than the most successful chains on other L2 stacks. That’s why we process 14% of all transactions in crypto. And we do this all while maintaining the lowest fees. Our goal is to build infrastructure that enables companies to expand their product lines faster, and build superior user experiences. So in 2026, a majority of our product development will be focused on the following three pillars: Performance The most at-scale chains use the OP Stack. @base is the fastest and highest throughput L2 on the market - they will continue to push the limits of scaling in 2026, and the rest of the Superchain will benefit too. We’ll be allowing more compute supply for lower infrastructure costs while simultaneously handling more compute demand with the same supply through features like BALs to scale node operation, lightweight derivation, and fee market modularization. op-reth will also be available to any chain that wants it. Customizability The OP Stack is widely regarded as the most modular and customizable chain stack on the market. Notable examples include: @world_chain_'s Proof of Humanity, @Celo's native fee abstraction, and @unichain's provable blockbuilding. In 2026, we’ll add more customization levers like multiple gas tokens, priority lanes, and dynamic opcode repricing for improved execution. And alas - ZK. Revenue Your money already works harder for you on the OP Stack. We want to push that even further with native token actions such as buy & burn for any chain, MEV auctions (in partnership with Flashbots), sequencer revenue modules that enable different transaction ordering schemes, and features that enable more liquidity to move securely and quickly between chains to help reduce LTV ratios in defi apps on the Superchain. We're excited to see our partners succeed in 2026, and to deliver the roadmap that enables that. If you'd like to talk to us, come find us at our Scaling Summit in ETH Denver:

English

34

54

154

46.2K

✨🔴_🔴✨ Ben Jones@ben_chain·20 Oca

@jinglejamOP FML CHAT REALLY HAS RUINED THE EMDASH I SWEAR I USED IT ALL THE TIME BEFORE LLMS

English

0

4

145

Optimist Prime@jinglejamOP·20 Oca

@ben_chain this is some chatgpt written quote tweet ben xD

English

1

0

4

308

✨🔴_🔴✨ Ben Jones@ben_chain·20 Oca

2026 is the year of scaling real-world, fact-grounded—not arbitrary, hypothetical—performance metrics.

Optimist Prime@jinglejamOP

Our 2026 Roadmap Last year, we made transactions 10x faster and 10x cheaper. But that’s not enough. This year, we also want to make the OP Stack hands down the best toolset for scaling to your first million dollars in onchain revenue. To us, scaling means identifying and removing bottlenecks. And the bottleneck isn’t always throughput. Sometimes, the bottleneck is liquidity. Sometimes it’s security, UX or integrations. We build tools to solve every flavor of bottleneck. That’s why OP Stack chains earn up to 3x more in profit (2x more revenue) per dollar onchain than the most successful chains on other L2 stacks. That’s why we process 14% of all transactions in crypto. And we do this all while maintaining the lowest fees. Our goal is to build infrastructure that enables companies to expand their product lines faster, and build superior user experiences. So in 2026, a majority of our product development will be focused on the following three pillars: Performance The most at-scale chains use the OP Stack. @base is the fastest and highest throughput L2 on the market - they will continue to push the limits of scaling in 2026, and the rest of the Superchain will benefit too. We’ll be allowing more compute supply for lower infrastructure costs while simultaneously handling more compute demand with the same supply through features like BALs to scale node operation, lightweight derivation, and fee market modularization. op-reth will also be available to any chain that wants it. Customizability The OP Stack is widely regarded as the most modular and customizable chain stack on the market. Notable examples include: @world_chain_'s Proof of Humanity, @Celo's native fee abstraction, and @unichain's provable blockbuilding. In 2026, we’ll add more customization levers like multiple gas tokens, priority lanes, and dynamic opcode repricing for improved execution. And alas - ZK. Revenue Your money already works harder for you on the OP Stack. We want to push that even further with native token actions such as buy & burn for any chain, MEV auctions (in partnership with Flashbots), sequencer revenue modules that enable different transaction ordering schemes, and features that enable more liquidity to move securely and quickly between chains to help reduce LTV ratios in defi apps on the Superchain. We're excited to see our partners succeed in 2026, and to deliver the roadmap that enables that. If you'd like to talk to us, come find us at our Scaling Summit in ETH Denver:

English

5

0

12

951

✨🔴_🔴✨ Ben Jones retweetledi

Andy Hall@ahall_research·17 Oca

I've gotten a lot of really cool and thought provoking outreach since I started thinking outloud about some of the ways AI might change (and hopefully, improve) research. One of the ideas floating around, which @sreeramkannan and @ben_chain brought up, was that LLMs could evaluate how well pre-analysis plans were followed in the ultimate paper. It turns out someone has already been working on that! Jamie Cummins sent me his fascinating tool called RegCheck which produces full reports (see below) on how closely the plan is followed. Super cool!

Andy Hall@ahall_research

Since I extended my own research using AI, I've been thinking about how it's going to reshape research and universities. We can now build new institutions where research is continuously updated, automatically verified, and carried out at immensely greater scale. Picture a research institute where senior scholars direct dozens or even hundreds of AI agents on coordinated programs. Small teams providing questions and judgment while agents handle collection, analysis, and verification. What would it take to build? The requirements are almost comically simple: (1) compute funding for researchers, and (2) a commitment to hire ambitious people and get out of their way. This new institute can unlock totally new way to do research: --Living research that automatically updates any time new data arrives, so our knowledge stays up to date --Automatically verified research that we know replicates from the moment it's posted publicly --Hyperscaled descriptive work that ingests enormous bodies of political data, like the entire history of changes to the US tax code or every bill introduced in every state legislature --Prototypes for new governance tools that are built for communities and then tested alongside them I think we're stepping into a crazy new era of how social science is done. I offer more thoughts on what's changing and how we might design an AI-first university of the future in my post, linked below.

English

3

12

90

11.9K

✨🔴_🔴✨ Ben Jones retweetledi

Optimism@Optimism·15 Oca

The Optimism Scaling Summit 🔗 RSVP here: luma.com/ufx1y9z8 More details soon.

English

8

20

196

19.7K

✨🔴_🔴✨ Ben Jones retweetledi

OP Labs@OPLabsPBC·13 Oca

Welcoming Fadi Gebara as SVP of Engineering at OP Labs. Fadi brings decades of senior engineering leadership across cloud, payments, and distributed systems—most recently at Elastic, with prior roles at Meta, State Street, and IBM. He’s led global teams operating mission-critical infrastructure, with a long-standing focus on reliability, security, efficiency, and performance at scale. At OP Labs, Fadi will lead engineering efforts that advance the @Optimism protocol, ensuring the network continues to meet the demands of enterprise-grade, onchain workloads.

English

10

16

122

13.3K

✨🔴_🔴✨ Ben Jones@ben_chain·13 Oca

My hobby: gaslighting Claude into thinking there was a network error when in reality I just want to be able to return to the current point in case it fucks up

English

0

1

5

464

✨🔴_🔴✨ Ben Jones@ben_chain·9 Oca

what if loops themselves are the real mistake

✨🔴_🔴✨ Ben Jones@ben_chain

what if main loops are the real mistake

English

2

0

6

440

✨🔴_🔴✨ Ben Jones@ben_chain·9 Oca

Tune in for a fun one Monday!

Optimism@Optimism

Join us Monday, Jan 12 for a Space with Optimism founders @jinglejamOP, @ben_chain, and @karl_dot_tech to discuss the token buyback governance proposal and the next chapter of Optimism. x.com/i/spaces/1lPKq…

English

1

7

452

✨🔴_🔴✨ Ben Jones@ben_chain·9 Oca

it’s-just-autocorrectoors in shambles

Andy Hall@ahall_research

Last weekend I posted that Claude Code created a full empirical polisci study in an hour. A lot of people asked: but how accurate was the study? The answer: quite accurate, with some interesting mistakes and important limitations. To get the answer, Graham Straus kindly offered to do an independent, manual audit—collecting the same data and extending the paper like Claude did, but without using any AI. Here’s what he found: Claude replicated the original paper exactly, coded 29/30 CA counties correctly on treatment timing, and collected election data that correlated >.999 with manual collection. The three main errors Graham found—mis-coding one county’s treatment year, omitting data collection for several potentially relevant races in always-treated states, and not using non-presidential elections to compute turnout—are similar to the kinds of mistakes a human might make on a first pass at writing this paper, and had only small effects on the subsequent estimates. On the other hand, when Claude tried to create new analyses that weren’t straightforward extensions of the original paper, it did worse. No hallucinations or crazy errors, per se, but it drifted from the prompt and produced results we found to be poorly conceived. My read: –AI today is already an extremely powerful way to rapidly update and extend well-contained, simple empirical papers. –To do empirical social science research well, it absolutely needs guidance and oversight from human experts. We’ll be sharing broader thoughts on this work, what we learned by doing it, and where we go from here next week on my blog. Thank you to the many, many people who reached out, asked questions, and offered feedback on this project.

English

1

0

3

873

✨🔴_🔴✨ Ben Jones@ben_chain·9 Oca

@ahall_research @sreeramkannan @joshgans @alexolegimas @deanwball I also would love to see this for paper pre-registration. Presumably in that case the AI doesn’t even need to replicate the result directly to verify that the paper does the same analysis that what was pre-registered.

English

0

3

2.5K

✨🔴_🔴✨ Ben Jones retweetledi

Andy Hall@ahall_research·9 Oca

I was thinking the same thing! And @ben_chain proposed a similar idea when I last spoke to him about it. Based on my experience here, I think it's a super promising approach. At a minimum new submissions should provide their code, and the AI agent can check it for reproducability (i.e. do the results in the paper follow from the code correctly). We'd want to be cautious about potential mistakes but it would definitely provide a lot of signal. My guess is at current capabilities it would be useful, but less good, at doing a deeper replication in which researcher methods and choices are interrogated and judged, partially subjectively. That's going to require more work and thinking but it's a promising path for the future. My guess is at first this could be done with simple social trust, but as the stakes rise (and imagine for something like FDA medical trials), the verifiability you propose becomes essential.

English

2

1

5

662

Andy Hall@ahall_research·9 Oca

Last weekend I posted that Claude Code created a full empirical polisci study in an hour. A lot of people asked: but how accurate was the study? The answer: quite accurate, with some interesting mistakes and important limitations. To get the answer, Graham Straus kindly offered to do an independent, manual audit—collecting the same data and extending the paper like Claude did, but without using any AI. Here’s what he found: Claude replicated the original paper exactly, coded 29/30 CA counties correctly on treatment timing, and collected election data that correlated >.999 with manual collection. The three main errors Graham found—mis-coding one county’s treatment year, omitting data collection for several potentially relevant races in always-treated states, and not using non-presidential elections to compute turnout—are similar to the kinds of mistakes a human might make on a first pass at writing this paper, and had only small effects on the subsequent estimates. On the other hand, when Claude tried to create new analyses that weren’t straightforward extensions of the original paper, it did worse. No hallucinations or crazy errors, per se, but it drifted from the prompt and produced results we found to be poorly conceived. My read: –AI today is already an extremely powerful way to rapidly update and extend well-contained, simple empirical papers. –To do empirical social science research well, it absolutely needs guidance and oversight from human experts. We’ll be sharing broader thoughts on this work, what we learned by doing it, and where we go from here next week on my blog. Thank you to the many, many people who reached out, asked questions, and offered feedback on this project.

Andy Hall@ahall_research

Here's proof that Claude Code can write an entire empirical polisci paper. To validate my claim that AI agents are coming for polisci "like a freight train", today I had Claude Code fully replicate and extend an old paper of mine estimating the effect of universal vote-by-mail on turnout and election outcome...essentially in one shot. After careful prompting, Claude Code: (1) Downloaded the old paper's repo and replicated the past results, translating our old Stata Code into Python (2) Crawled the web to get updated official election data and census data (3) Ran new analyses extending the results through 2024 (4) Created new tables and figures (5) Performed a lit review (6) Wrote a wholly new paper (7) Pushed the whole thing to a new github repo The whole thing took about an hour. This is an insane paradigm shift in how empirical work is done. It also validates the point that several people including @BrendanNyhan made yesterday---it's going to be especially easy to scale observational research with AI. Thanks to @alexolegimas, @arthur_spirling , and many others who gave me feedback. .

English

9

53

340

110.5K

✨🔴_🔴✨ Ben Jones

Keşfet