Ben Day

467 posts

@itsmebenday

thinking about forecasting @_Mantic_AI

London, England Joined August 2017
2K Following 525 Followers
Ben Day
Ben Day@itsmebenday·
We're launching a new kind of forecasting tournament at @_Mantic_AI. There's $25k in prizes for writing questions, see post below to read more and apply.
English
2
4
7
610
Ben Day retweeted
Gabriel Fritsch
Gabriel Fritsch@gabrielpfritsch·
Over three weeks into the US-Iran conflict, the situation remains deeply uncertain and fast-moving. @_Mantic_AI has been forecasting the crisis in real time. We wrote about how we've done so far.
Gabriel Fritsch tweet media
English
1
4
16
5.8K
Max Marchione
Max Marchione@maxmarchione·
Just about every >150 iq person I know uses nicotine. Nicotine is underrated and misunderstood
English
157
23
631
638.1K
Ben Day retweeted
Scott Jeen
Scott Jeen@enjeeneer·
We've been using RL to train LLMs for superforecasting. Our new blog post with @thinkymachines discusses recent progress. We're now in uncharted territory. I'm excited to see how good we can get by pushing this further! 🧵
Tinker@tinkerapi

Mantic used Tinker to RL gpt-oss-120b on judgmental forecasting; the result outperformed frontier models on event predictions. Combined with @_Mantic_AI's forecasting architecture, task-specific training takes us to the cusp of automated superforecasting.

English
5
16
199
24.5K
Ben Day retweeted
Toby Shevlane
Toby Shevlane@tshevl·
I always dreamed of AGI as a wise advisor for humanity. Although LLMs are great for coding & knowledge work, I wouldn’t trust them to give me advice on my career, business strategy, or policy preferences. How can we build AI systems optimized for wisdom? At Mantic we believe the unlock is prediction: predicting world events as accurately as possible, and hill-climbing this single metric. Today we share some recent progress on the Thinking Machines website, having found Tinker a great platform for our RL experiments. TL;DR: We RL-tune gpt-oss-120b to become a better forecaster than any other model. Having good scaffolding is a prerequisite. A fun result: our tuned model + Grok are decorrelated from the other best models, and so are the most indispensable when picking a team.
Tinker@tinkerapi

Mantic used Tinker to RL gpt-oss-120b on judgmental forecasting; the result outperformed frontier models on event predictions. Combined with @_Mantic_AI's forecasting architecture, task-specific training takes us to the cusp of automated superforecasting.

English
21
32
309
150.4K
Ben Day
Ben Day@itsmebenday·
@eigenrobot @nonRealBrandon Reverting makes sense for tfr. It’s not saying ‘you’ll have a population of 50M Koreans with tfr 2.1’ it’s ‘whoever remains at this stage must have a stable tfr or they wouldn’t be here’ eg Mormons inherit America
English
0
0
2
24
eigenrobot
eigenrobot@eigenrobot·
@nonRealBrandon population projections assume return to stability usually i think "they just assume it?" idk look that's how it is
eigenrobot tweet media
English
4
0
21
1.2K
Ben Day
Ben Day@itsmebenday·
@grok @CJHandmer @jasonhickel @grok Our World in Data has the GDP per capita of England in 1878 as £4,011 in 2013 pounds. That’s £5,676 in Jan 2026 pounds which is $7,500 USD. Trading Economics has Cuba’s 2025 GDP at $7,440 USD. What do you make of that?
English
1
0
0
43
Ben Day
Ben Day@itsmebenday·
@FleischmanMena @nickcammarata Here’s one that flips around a bit before going through. I was selecting for ‘does a spin before going through’ and cropping to the last extended stay in the first room, but it only took 3 samples to get this one.
English
0
0
1
94
Ben Day
Ben Day@itsmebenday·
@ultima_shifl @FleischmanMena @nickcammarata Yeah I’m not suggesting the ants are random but that it is useful to see, as a baseline, what taking random trajectories and treating them as they would have when writing the paper gets you eg select the fastest one, select one with a 180 in, etc
English
0
0
1
70
singularvessel
singularvessel@ultima_shifl·
@itsmebenday @FleischmanMena @nickcammarata Running more Brownian motion sims seems like the wrong approach to this question. Clearly this motion is not literally diffusion; the question is whether the ants are doing complex computation or if there exists any sort of simple dynamics that suffices to explain what we see.
English
2
0
3
58
Ben Day
Ben Day@itsmebenday·
@VesselOfSpirit they could rotate the ends a little to improve their bound a bit
Ben Day tweet media
English
2
6
619
10.1K
Vessel Of Spirit
Vessel Of Spirit@VesselOfSpirit·
Researchers at the University of Switzerland have discovered a new bound on the optimally inefficient way to pack 17 squares into a larger square
Vessel Of Spirit tweet media
English
12
150
5.9K
94.5K
Ben Day retweeted
Toby Shevlane
Toby Shevlane@tshevl·
Something is happening!
Toby Shevlane tweet mediaToby Shevlane tweet media
English
18
74
714
106.2K
Ben Day retweeted
Toby Shevlane
Toby Shevlane@tshevl·
HUMANS OF MANTIC Hours after we launched our website, before we’d posted it anywhere, I saw a job application from an Oxford economics PhD student from Brazil: “I’ve never been this excited about a startup. I want to help build it.” His background was not typical for an AI startup. But he looked impressive. He’d got a distinction from Yale then spent 3 years as an economist at Goldman Sachs. In his PhD research, he was using LLM forecasters to identify exogenous shocks to fiscal policy. We invited him to lunch with the team. He seemed smart. Ben messaged me: “we should try to get Gabriel to come in for September”. In his first couple of days, Gabriel was reading the code. I wasn’t seeing much output. I asked Ben, what is he doing? Ben told me to wait. Then...Gabriel emerged with an understanding of our prediction engine that was like he’d worked here for months. He started finding weaknesses and generating good ideas. Throughout September, Gabriel was running experiments to test his fixes, and the guy did not miss. +3 points on this eval, +3 points on that eval. To boot, he’s a lovely person. Gabriel grew up in Rio. He speaks about his childhood friends and Brazilian culture (the beach, the food) with joy in his eyes. It must have been a big culture shock turning up to New Haven as a freshman. From the beaches of Rio to Camden's hottest AI startup, @gabrielpfritsch started in a permanent role today, as Member of Technical Staff.
Toby Shevlane tweet media
English
0
3
50
4.7K
Adonis🔸
Adonis🔸@adonis_ds·
@NathanpmYoung @metaculus Nathan, do you mind sharing your Metaculus profile? Will you be predicting on every question not long after it opens?
English
1
0
0
185
Nathan 🔎
Nathan 🔎@NathanpmYoung·
Young vs the World. In a surprising turn, @metaculus have challenged me to beat their forecasters. There is an extra $2500 in prizes if community forecasts are better than mine. Oh my honour, this isn't scammy. It'll be a laugh. Link below.
Nathan 🔎 tweet media
English
5
5
78
52.4K
Ben Day retweeted
Toby Shevlane
Toby Shevlane@tshevl·
I got back from honeymoon last summer and handed in my resignation at DeepMind. My wife thought I was crazy.

AI has always been about prediction, but normally we predict small things: a token of text, or moves in chess. The ultimate challenge is to predict the world’s most important events. We recently went up against some of the world’s top forecasters, and came much closer to beating them than any AI system before.

We're used to seeing crazy results from the AI community, but I think this one is special:
1. Accurately forecasting global issues is extremely difficult.
2. You can’t memorize the answer: it hasn't happened yet.
3. It was considered very unlikely for an AI system to do as well as Mantic did (5-10% chance).
4. Superhuman forecasting has the potential for transformative impact across the economy.

She still thinks I’m crazy, but less so every day😛
English
65
40
790
202.6K
Ben
Ben@BenShindel·
@StefanFSchubert @Research_FRI Meanwhile… I think Mantic is ~on par with Superforecasters at this point.
Ben@BenShindel

One of @metaculus's largest tournaments ever. I narrowly beat @_Mantic_AI, which placed 4th, ahead of the community. Lots of surprising events this quarter, causing the unweighted aggregate to perform about as well as the Metaculus Community Prediction.

English
1
1
13
1.4K
Stefan Schubert
Stefan Schubert@StefanFSchubert·
Human forecasters still beat AI models, but a trend extrapolation by @Research_FRI suggests they'll be on par in October 2026.
Stefan Schubert tweet media
English
10
9
81
5.8K
Charles🔸
Charles🔸@CharlesD353·
Other interesting questions: (i) does averaging the predictions of these 10 instances of the same model improve performance, and by how much? (ii) is the superforecaster line (described as "superforecaster median forecast") the Brier score of the median superforecaster prediction per question, or the median of the individual superforecasters' Brier scores? The former would be much more impressive to match. @Research_FRI note that this also is probably an underestimate of models, as the models use only a basic scaffold. So there are significant factors pushing both ways.
English
1
0
1
122
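The distinction Charles raises in (ii) can be made concrete with a small sketch. It assumes binary questions scored with the usual squared-error Brier score (lower is better); the three forecasters, their probabilities, the outcomes, and the function names are all invented for illustration and are not taken from the @Research_FRI analysis.

```python
from statistics import median


def brier(p, outcome):
    """Squared error between a probability and a 0/1 outcome (lower is better)."""
    return (p - outcome) ** 2


def brier_of_median_forecast(forecasts, outcomes):
    """Aggregate first: take the median probability per question, then score it."""
    n_questions = len(outcomes)
    median_probs = [median(f[q] for f in forecasts) for q in range(n_questions)]
    return sum(brier(p, o) for p, o in zip(median_probs, outcomes)) / n_questions


def median_of_forecaster_briers(forecasts, outcomes):
    """Score first: compute each forecaster's mean Brier, then take the median."""
    per_forecaster = [
        sum(brier(p, o) for p, o in zip(f, outcomes)) / len(outcomes)
        for f in forecasts
    ]
    return median(per_forecaster)


# Hypothetical data: rows are forecasters, columns are questions.
forecasts = [
    [0.9, 0.2, 0.6],
    [0.6, 0.4, 0.9],
    [0.7, 0.1, 0.3],
]
outcomes = [1, 0, 1]

aggregate_score = brier_of_median_forecast(forecasts, outcomes)
typical_score = median_of_forecaster_briers(forecasts, outcomes)
print(f"Brier of median forecast:    {aggregate_score:.4f}")
print(f"Median of forecaster Briers: {typical_score:.4f}")
```

On this toy data the aggregated forecast scores better (lower) than the median individual forecaster, which is the pattern that would make the first reading of the line the harder one to match. Real data need not behave the same way.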