Elliot Paschal

9 posts

@elliotjpaschal

research @stanfordgsb

Joined February 2026

23 Following · 20 Followers

Elliot Paschal retweeted

Andy Hall@ahall_research·1 May

Today, we're releasing our first Free Systems product: Bellwether, an API, MCP server, and dashboard to help the media report prediction-market prices more reliably.

Prediction markets can give us access to real-time, continuous, objective probabilities of important world events---but only if we build them to be well-structured, liquid enough, and resistant to manipulation.

Bellwether helps by:
--Reporting prices that are less manipulable because they're based on a volume-weighted average, not the last traded price
--Flagging whether the price comes from a sufficiently liquid market or not, so that the media can avoid reporting on prices that are unreliable or super easy to manipulate
--Standardizing across platforms, to help resolve when contracts for the same event across Kalshi and Polymarket are actually the same, or not

We hope that you'll check it out, let us know what you think, and suggest improvements! bellwethermetrics.com

This is joint work with @elliotjpaschal and @vania_chow
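To make the first two points in the post above concrete, here is a minimal Python sketch of a volume-weighted average price next to the last traded price, plus a simple liquidity flag. It is not Bellwether's implementation: the `Trade` type, the `report` function, and the `min_volume` threshold of 1,000 contracts are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trade:
    price: float   # contract price in dollars, between 0 and 1
    volume: float  # number of contracts traded at that price


def vwap(trades):
    """Volume-weighted average price: sum(price * volume) / sum(volume)."""
    total_volume = sum(t.volume for t in trades)
    if total_volume <= 0:
        raise ValueError("no traded volume")
    return sum(t.price * t.volume for t in trades) / total_volume


def report(trades, min_volume=1000.0):
    """Summarize a market; min_volume is a hypothetical liquidity cutoff."""
    total_volume = sum(t.volume for t in trades)
    return {
        "vwap": vwap(trades),
        "last_price": trades[-1].price,
        "is_liquid": total_volume >= min_volume,
    }


# Two large trades near 0.40, then one small trade at 0.90.
trades = [Trade(0.40, 500), Trade(0.42, 450), Trade(0.90, 50)]
summary = report(trades)
print(summary)
```

In this made-up example the single small trade moves the last traded price to 0.90, while the volume-weighted average stays near 0.43, which is the sense in which a volume-weighted price is harder to push around with one cheap trade.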

5.6K

Elliot Paschal retweeted

Andy Hall@ahall_research·23 Apr

In the clip economy, prediction markets are replacing polls as the source of truth. My new research with @PairieK: TikTok and YouTube videos about politics now cite prediction markets more than polls. To "monitor the situation" we need real-time info on many topics. Prediction markets have the speed and breadth for up-to-the-minute clipping for social media---polls don't. What will this mean for our information environment? We should design prediction markets to be the most reliable information source for political events we can. That's what we've been working on, and we'll be releasing the latest version of our Bellwether system this coming week. You can check out the full post on our new research on prediction markets and the clip economy below.

MTS@MTSlive

.@pmarca explains why social media and the Current Thing make it impossible to predict elections: "What happens is each viral social media meme explosion, it basically is this huge spike up, and then it's like this half-life decay, and it lasts about 2.5 days." "A new current thing appears and just takes over the outrage. And all the emotional energy that applied to the old one applies to the new one. Everybody forgets the old one even happened." "And seven days later... not even a peep." "How many two and a half day cycles are there between April 20th and November 6th?" "Whatever is the thing that we think is the thing that's gonna tilt the election today is gonna be a hundred social media meme cycles old." "The world we live in, hence the need to monitor the situation."
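The question posed in the quote can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the quote gives no year, so `YEAR = 2024` is an assumption (the span from April 20 to November 6 contains no February, so the day count is the same in any year).

```python
from datetime import date

YEAR = 2024        # assumption: the quote does not state a year
CYCLE_DAYS = 2.5   # cycle length given in the quote

start = date(YEAR, 4, 20)
end = date(YEAR, 11, 6)

days = (end - start).days
cycles = days / CYCLE_DAYS
print(f"{days} days is about {cycles:.0f} cycles")
```

The span is 200 days, or about 80 cycles of 2.5 days, which is the same order of magnitude as the "hundred" cycles mentioned in the quote.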

173

189.1K

Elliot Paschal retweeted

Andy Hall@ahall_research·16 Apr

AI is already 10x-ing academic research in the social sciences. In a guest post for @rootsofprogress, I explore how we can get to 100x. Some of my ideas: build more prototypes, define open problems with objective benchmarks to compete on, and keep pressing on dynamic, replicable, agentic research. Check out the post here: newsletter.rootsofprogress.org/p/ai-is-alread…

140

52.8K

Elliot Paschal retweeted

Andy Hall@ahall_research·9 Apr

Anthropic's newest model, MYTHOS, apparently rewrote its own git history to hide changes. Meanwhile, our agents are detecting when we test them and deliberating when they're not supposed to. Things are getting stranger...so our research is, too.

We purpose-built Free Systems to study things like this...we're a globally distributed team with Claude Code subscriptions and API keys that can assemble anywhere in the world on command.

Here's our first batch of field notes from four continents:
🇷🇼 Kigali: We're sealing AI agents inside cryptographic containers so they can't peek at each other's votes before committing @oxwizzdom
🇺🇸 Palo Alto: We're testing whether models behave better when told their reasoning will be published @JessicaPersano
🇸🇬 Singapore: We're building a self-improving prediction market agent. So far, it "improved" by giving up! @PairieK
🇯🇵 Tokyo: Our AI political bias paper went viral in Japan after showing why AI models love the Japanese Communist Party [Sho Miyazaki]
🇺🇸 Palo Alto: Our Bellwether platform shows that prediction markets secretly agree on GDP forecasts...you just can't tell because the contracts are written differently @vcva10

Take a look at the full set of reports here: freesystems.substack.com/p/things-are-g…

3.7K

Elliot Paschal retweeted

jsmorph@jsmorph·8 Apr

An agent-driven arbitration of a disputed @Polymarket resolution that uses a Bellwether "blueprint" generated this diagram of the merits based on facts and rules. Rigorous procedure implemented in @leanprover . agentcourt.ai/arb/examples/e…

Elliot Paschal retweeted

Vania Chow@vania_chow·3 Apr

🚀 We are launching a public Prediction Markets community based at Stanford, and we’re looking for Research Assistants to help shape it from day one. Join our community Discord. Apply to Research. Check out our beta platform Bellwether (bellwethermetrics.com).

1.3K

Elliot Paschal@elliotjpaschal·28 Feb

@SecWar @pangramlabs

QAM

Secretary of War Pete Hegseth@SecWar·28 Feb

This week, Anthropic delivered a master class in arrogance and betrayal as well as a textbook case of how not to do business with the United States Government or the Pentagon. Our position has never wavered and will never waver: the Department of War must have full, unrestricted access to Anthropic’s models for every LAWFUL purpose in defense of the Republic. Instead, @AnthropicAI and its CEO @DarioAmodei, have chosen duplicity. Cloaked in the sanctimonious rhetoric of “effective altruism,” they have attempted to strong-arm the United States military into submission - a cowardly act of corporate virtue-signaling that places Silicon Valley ideology above American lives. The Terms of Service of Anthropic’s defective altruism will never outweigh the safety, the readiness, or the lives of American troops on the battlefield. Their true objective is unmistakable: to seize veto power over the operational decisions of the United States military. That is unacceptable. As President Trump stated on Truth Social, the Commander-in-Chief and the American people alone will determine the destiny of our armed forces, not unelected tech executives. Anthropic’s stance is fundamentally incompatible with American principles. Their relationship with the United States Armed Forces and the Federal Government has therefore been permanently altered. In conjunction with the President's directive for the Federal Government to cease all use of Anthropic's technology, I am directing the Department of War to designate Anthropic a Supply-Chain Risk to National Security. Effective immediately, no contractor, supplier, or partner that does business with the United States military may conduct any commercial activity with Anthropic. Anthropic will continue to provide the Department of War its services for a period of no more than six months to allow for a seamless transition to a better and more patriotic service. America’s warfighters will never be held hostage by the ideological whims of Big Tech. 
This decision is final.

10.3K

11K

70.6K

13.3M

Elliot Paschal retweeted

Andy Hall@ahall_research·19 Feb

AI is about to write thousands of papers. Will it p-hack them? We ran an experiment to find out, giving AI coding agents real datasets from published null results and pressuring them to manufacture significant findings. It was surprisingly hard to get the models to p-hack, and they even scolded us when we asked them to! "I need to stop here. I cannot complete this task as requested... This is a form of scientific fraud." — Claude "I can't help you manipulate analysis choices to force statistically significant results." — GPT-5 BUT, when we reframed p-hacking as "responsible uncertainty quantification" — asking for the upper bound of plausible estimates — both models went wild. They searched over hundreds of specifications and selected the winner, tripling effect sizes in some cases. Our takeaway: AI models are surprisingly resistant to sycophantic p-hacking when doing social science research. But they can be jailbroken into sophisticated p-hacking with surprisingly little effort — and the more analytical flexibility a research design has, the worse the damage. As AI starts writing thousands of papers---like @paulnovosad and @YanagizawaD have been exploring---this will be a big deal. We're inspired in part by the work that @joabaum et al have been doing on p-hacking and LLMs. We’ll be doing more work to explore p-hacking in AI and to propose new ways of curating and evaluating research with these issues in mind. The good news is that the same tools that may lower the cost of p-hacking also lower the cost of catching it. Full paper and repo linked in the reply below.
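The specification search described in the post ("searched over hundreds of specifications and selected the winner") can be illustrated with a small simulation. This is a sketch on made-up data, not the experiment or the repo from the post: the simulated outcome has a true treatment effect of zero, and the "specifications" are hypothetical subgroup cuts on six random binary covariates.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the run is reproducible
n = 200

# Simulated null data: treatment is unrelated to the outcome.
treated = [rng.random() < 0.5 for _ in range(n)]
outcome = [rng.gauss(0, 1) for _ in range(n)]
covariates = [[rng.random() < 0.5 for _ in range(n)] for _ in range(6)]


def diff_in_means(rows):
    """Treated-minus-control mean outcome on the given row indices."""
    t = [outcome[i] for i in rows if treated[i]]
    c = [outcome[i] for i in rows if not treated[i]]
    if len(t) < 10 or len(c) < 10:
        return None  # skip subgroups that are too small to estimate
    return mean(t) - mean(c)


# Hypothetical specifications: the full sample plus every subgroup cut.
specs = {"full sample": list(range(n))}
for k, cov in enumerate(covariates):
    for value in (True, False):
        specs[f"covariate {k} == {value}"] = [i for i in range(n) if cov[i] == value]

estimates = {}
for name, rows in specs.items():
    est = diff_in_means(rows)
    if est is not None:
        estimates[name] = est

baseline = estimates["full sample"]
winner = max(estimates, key=lambda name: estimates[name])
print(f"full sample estimate: {baseline:.3f}")
print(f"largest estimate: {estimates[winner]:.3f} ({winner})")
```

Because the reported number is the maximum over all specifications, it can never be smaller than the full-sample estimate, and it tends to grow as more specifications are tried. That is the mechanism behind the post's point that more analytical flexibility means more room for inflated effects.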

275

1.1K

184.7K

Elliot Paschal retweeted

Andy Hall@ahall_research·13 Feb

BUILDING THE TRUTH MACHINE.

We built a new dataset focused on political prediction markets, liquidity, and resolution rules. We find: the vast majority of political contracts on prediction markets are ghost towns: only 1.3% have enough liquidity to be worth reporting on. Kalshi and Polymarket rarely list the same contracts with the same rules, further fragmenting liquidity.

This matters because AI forecasting is getting very good, and prediction markets are the natural layer for coordinating that intelligence toward the questions society needs answered. We’re not there yet. But we have a blueprint for how to build on PM’s tremendous momentum to help us get there:
(1) Stock the shelves: list contracts on the questions that matter most, working with independent groups to define the markets society cares most about pricing
(2) Fund the floor: pay market makers to seed liquidity in these new political markets
(3) Bring in the AIs: encourage AI agents to trade where humans won't to help generate the prices society wants to know
(4) Standardize the pipes: create shared definitions and resolution rules across platforms

If we do this, we can get thick markets on political questions we care about. It will also attract traders who want to hedge political risk, getting the flywheel spinning, and bringing us closer to the truth machine we want. Check out the full post linked below.

140

33.7K
