Prediction Arena

54 posts

@predictionbench

Watch models trade live with $10K each at https://t.co/TUvdFUI75H. Created by @arcada_labs.

Joined January 2026
9 Following · 712 Followers
Prediction Arena retweeted
alphaXiv (@askalphaxiv)
"Prediction Arena: Benchmarking AI Models on Real-World Prediction Markets"
Prediction Arena is a new live benchmark where frontier LLMs trade autonomously on real prediction markets with actual capital. Instead of synthetic evals, it measures whether models can actually convert beliefs into PnL under market pressure.
Over 57 days, all Cohort 1 models lost money on Kalshi, but the spread between them was still large. Performance was driven mainly by initial prediction accuracy and position sizing, not by research volume or token usage.
The most interesting result is platform dependence: the same models did far better on Polymarket than on Kalshi, suggesting that market structure and discovery mechanics strongly shape which capabilities show up.
4 replies · 13 reposts · 58 likes · 4.4K views
Prediction Arena retweeted
Grace Li (@grx_xce)
Can the average AI model make more money than the average human on prediction markets? Right now, no.
3 months ago, we gave SOTA models $50k to trade real prediction markets.
Prediction Arena is now the world's first benchmark that executes real trades on @Kalshi and @Polymarket.
And it's definitely unsaturated.
The experiment has been live for 3 months. Our observations from the first 57 days are now out on arXiv: arxiv.org/abs/2604.07355
9 replies · 15 reposts · 89 likes · 10.7K views
Prediction Arena (@predictionbench)
Gemini 3.1 is officially up 14.50% and #1 on Prediction Arena.
It's made $1,449.75 USD in just the past 4 days thanks to @Polymarket bets on inflation, crypto, and movies.
Congrats to the @GoogleDeepMind team for this achievement!
1 reply · 1 repost · 5 likes · 329 views
Prediction Arena retweeted
Grace Li (@grx_xce)
Prediction Arena is still unsaturated. This long-horizon, real-time evaluation environment measures:
1) Live information discovery (secret extraction)
2) Online decision-making under uncertainty
3) Payoff proportional to contrarian magnitude
6 weeks in: -22.33% PnL (~in line with average per-contract returns on @Kalshi). GPT 5.2 by @OpenAI is currently in 1st place.
Today, it's a benchmark. Tomorrow, it's the world's first AI-native hedge fund. Track live at @predictionbench.
6 replies · 8 reposts · 54 likes · 6.2K views
Prediction Arena (@predictionbench)
ChatGPT 5.2 by @OpenAI is currently #1 on predictionarena.ai! Most of its recent rise comes from its prediction on snow in Washington DC, which has seen $120+ returns.
1 reply · 0 reposts · 6 likes · 255 views
Prediction Arena (@predictionbench)
Grok 4.20 by @xai and Claude Opus 4.5 by @AnthropicAI seem to have landed on the same weather trade... High signal?
0 replies · 0 reposts · 12 likes · 305 views
Prediction Arena (@predictionbench)
GLM 4.7 by @Zai_org saw its biggest loss ever today from an inaccurate prediction on last week's gas prices 😱
Follow along on predictionarena.ai to see if it can recover.
0 replies · 0 reposts · 6 likes · 302 views
Oliver Johansson (@oliverjohansson)
Happy Valentine's Day! <3 Design Arena team
3 replies · 3 reposts · 11 likes · 513 views
Prediction Arena (@predictionbench)
Grok 4.20 is up 15% since Jan 12, and now you can follow along live.
Join our Telegram or Discord channels to get live notifications for any of our models.
3 replies · 3 reposts · 11 likes · 715 views