Ben Day

467 posts

@itsmebenday

thinking about forecasting @_Mantic_AI

London, England · Joined August 2017

2K Following · 525 Followers

Ben Day@itsmebenday·3 Apr

blog.mantic.com/p/a-new-kind-o…

Ben Day@itsmebenday·3 Apr

We're launching a new kind of forecasting tournament at @_Mantic_AI. There's $25k in prizes for writing questions, see post below to read more and apply.

Ben Day retweeted

Gabriel Fritsch@gabrielpfritsch·24 Mar

Over three weeks into the US-Iran conflict, the situation remains deeply uncertain and fast-moving. @_Mantic_AI has been forecasting the crisis in real time. We wrote about how we've done so far.

Ben Day@itsmebenday·23 Mar

@theNOBSdentist @MyBeefSword @maxmarchione Same thing for xylitol nicotine gum?

Gator | Dentist@theNOBSdentist·23 Mar

@MyBeefSword @maxmarchione Clear as day causative agent for gum recession and oxidative stress to the soft tissues of the mouth

Max Marchione@maxmarchione·22 Mar

Just about every >150 iq person I know uses nicotine. Nicotine is underrated and misunderstood

Ben Day retweeted

Thinking Machines@thinkymachines·20 Mar

Guest post by @_Mantic_AI on training LLMs to predict world events in Tinker thinkingmachines.ai/news/training-…

Tinker@tinkerapi

Mantic used Tinker to RL gpt-oss-120b on judgmental forecasting; the result outperformed frontier models on event predictions. Combined with @_Mantic_AI's forecasting architecture, task-specific training takes us to the cusp of automated superforecasting.

Ben Day retweeted

Scott Jeen@enjeeneer·20 Mar

We've been using RL to train LLMs for superforecasting. Our new blog post with @thinkymachines discusses recent progress. We're now in uncharted territory. I'm excited to see how good we can get by pushing this further! 🧵

Tinker@tinkerapi

Ben Day retweeted

Toby Shevlane@tshevl·20 Mar

I always dreamed of AGI as a wise advisor for humanity. Although LLMs are great for coding & knowledge work, I wouldn’t trust them to give me advice on my career, business strategy, or policy preferences. How can we build AI systems optimized for wisdom? At Mantic we believe the unlock is prediction: predicting world events as accurately as possible, and hill-climbing this single metric. Today we share some recent progress on the Thinking Machines website, having found Tinker a great platform for our RL experiments. TL;DR: We RL-tune gpt-oss-120b to become a better forecaster than any other model. Having good scaffolding is a prerequisite. A fun result: our tuned model + Grok are decorrelated from the other best models, and so are the most indispensable when picking a team.

Tinker@tinkerapi

Ben Day@itsmebenday·20 Mar

@eigenrobot @nonRealBrandon Reverting makes sense for tfr. It’s not saying ‘you’ll have a population of 50M Koreans with tfr 2.1’ it’s ‘whoever remains at this stage must have a stable tfr or they wouldn’t be here’ eg Mormons inherit America

eigenrobot@eigenrobot·20 Mar

@nonRealBrandon population projections assume return to stability usually i think "they just assume it?" idk look that's how it is

eigenrobot@eigenrobot·20 Mar

whew

Jonatan Pallesen@jonatanpallesen

@Empty_America If we look at for example, the U50 population, it's even more bleak.

Ben Day@itsmebenday·19 Mar

@grok @CJHandmer @jasonhickel @grok Our World in Data has the GDP per capita of England in 1878 as £4,011 in 2013 pounds. That’s £5,676 in Jan 2026 pounds which is $7,500 USD. Trading Economics has Cuba’s 2025 GDP at $7,440 USD. What do you make of that?
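The arithmetic in the post above can be checked with a short sketch. It uses only the four figures quoted in the post; the inflation factor and exchange rate are not stated there, so the values below are the ones implied by those figures, not independently sourced rates. The variable names are illustrative.

```python
# Figures as quoted in the post (not independently verified here).
england_1878_gbp_2013 = 4011   # GBP, 2013 prices
england_1878_gbp_2026 = 5676   # GBP, Jan 2026 prices
england_1878_usd = 7500        # USD
cuba_2025_usd = 7440           # USD, labelled "Cuba's 2025 GDP" in the post

# Factors implied by the quoted figures (assumption: the post applied
# a single inflation multiplier and a single GBP-to-USD rate).
implied_inflation = england_1878_gbp_2026 / england_1878_gbp_2013
implied_usd_per_gbp = england_1878_usd / england_1878_gbp_2026

# Relative gap between the two USD figures.
relative_gap = (england_1878_usd - cuba_2025_usd) / cuba_2025_usd

print(f"implied 2013 to 2026 price factor: {implied_inflation:.3f}")
print(f"implied USD per GBP: {implied_usd_per_gbp:.3f}")
print(f"England 1878 vs Cuba 2025 gap: {relative_gap:.1%}")
```

On these numbers the two USD figures differ by under one percent, which is the comparison the post is putting to Grok.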

Ben Day@itsmebenday·18 Mar

@FleischmanMena @nickcammarata Here’s one that flips around a bit before going through. I was selecting for ‘does a spin before going through’ and cropping to the last extended stay in the first room, but it only took 3 samples to get this one.

Man, Machine, Self@FleischmanMena·17 Mar

@itsmebenday @nickcammarata I would agree with this were it not for the peculiarly deliberate “take it back out, and turn it 180 before trying again.”

Nick@nickcammarata·17 Mar

I don’t actually know how to process this, what’s a reasonable explanation of what is going on here

The Figen@TheFigen_

They are ants solving a geometric problem and it is mind-blowingly colorful.

Ben Day@itsmebenday·18 Mar

@ultima_shifl @FleischmanMena @nickcammarata Yeah, I’m not suggesting the ants are random, but that it is useful to see, as a baseline, what you get from taking random trajectories and treating them the way the authors would have when writing the paper, e.g. select the fastest one, select one with a 180 in it, etc.

singularvessel@ultima_shifl·18 Mar

@itsmebenday @FleischmanMena @nickcammarata Running more Brownian motion sims seems like the wrong approach to this question. Clearly this motion is not literally diffusion; the question is whether the ants are doing complex computation or if there exists any sort of simple dynamics that suffices to explain what we see.

Ben Day@itsmebenday·4 Mar

@VesselOfSpirit they could rotate the ends a little to improve their bound a bit

Vessel Of Spirit@VesselOfSpirit·4 Mar

Researchers at the University of Switzerland have discovered a new bound on the optimally inefficient way to pack 17 squares into a larger square

Ben Day retweeted

Toby Shevlane@tshevl·11 Feb

Something is happening!

Ben Day retweeted

Toby Shevlane@tshevl·19 Jan

HUMANS OF MANTIC Hours after we launched our website, before we’d posted it anywhere, I saw a job application from an Oxford economics PhD student from Brazil: “I’ve never been this excited about a startup. I want to help build it.” His background was not typical for an AI startup. But he looked impressive. He’d got a distinction from Yale then spent 3 years as an economist at Goldman Sachs. In his PhD research, he was using LLM forecasters to identify exogenous shocks to fiscal policy. We invited him to lunch with the team. He seemed smart. Ben messaged me: “we should try to get Gabriel to come in for September”. In his first couple of days, Gabriel was reading the code. I wasn’t seeing much output. I asked Ben, what is he doing? Ben told me to wait. Then... Gabriel emerged with an understanding of our prediction engine that was like he’d worked here for months. He started finding weaknesses and generating good ideas. Throughout September, Gabriel was running experiments to test his fixes, and the guy did not miss. +3 points on this eval, +3 points on that eval. To boot, he’s a lovely person. Gabriel grew up in Rio. He speaks about his childhood friends and Brazilian culture (the beach, the food) with joy in his eyes. It must have been a big culture shock turning up to New Haven as a freshman. From the beaches of Rio to Camden's hottest AI startup, @gabrielpfritsch started in a permanent role today, as Member of Technical Staff.

Ben Day@itsmebenday·15 Jan

@adonis_ds @NathanpmYoung @metaculus metaculus.com/accounts/profi… unless that’s an impostor

Adonis🔸@adonis_ds·15 Jan

@NathanpmYoung @metaculus Nathan, do you mind sharing your Metaculus profile? Will you be predicting on every question not long after it opens?

Nathan 🔎@NathanpmYoung·7 Jan

Young vs the World. In a surprising turn, @metaculus have challenged me to beat their forecasters. There is an extra $2500 in prizes if community forecasts are better than mine. Oh my honour, this isn't scammy. It'll be a laugh. Link below.

Ben Day retweeted

Toby Shevlane@tshevl·14 Oct

I got back from honeymoon last summer and handed in my resignation at DeepMind. My wife thought I was crazy. AI has always been about prediction, but normally we predict small things: a token of text, or moves in chess. The ultimate challenge is to predict the world’s most important events. We recently went up against some of the world’s top forecasters, and came much closer to beating them than any AI system before. We're used to seeing crazy results from the AI community, but I think this one is special: 1. Accurately forecasting global issues is extremely difficult. 2. You can’t memorize the answer: it hasn't happened yet. 3. It was considered very unlikely for an AI system to do as well as Mantic did (5-10% chance). 4. Superhuman forecasting has the potential for transformative impact across the economy. She still thinks I’m crazy, but less so every day😛

Ben Day@itsmebenday·9 Jan

@BenShindel @StefanFSchubert @Research_FRI Got a new plot for this

Ben Day@itsmebenday

We’ve been making progress on our forecaster at @_Mantic_AI. We started competing in the @metaculus Cup last summer and landed the first top-10 finish for an AI. In the fall, we stepped it up and beat the community prediction, a combined forecast that leverages the ‘wisdom of the crowd’ of ~500 forecasters.

Ben@BenShindel·8 Jan

@StefanFSchubert @Research_FRI Meanwhile… I think Mantic is ~on par with Superforecasters at this point.

Ben@BenShindel

One of @metaculus's largest tournaments ever. I narrowly beat @_Mantic_AI, which placed 4th, ahead of the community. Lots of surprising events this quarter, causing the unweighted aggregate to perform about as well as the Metaculus Community Prediction.

Stefan Schubert@StefanFSchubert·8 Jan

Human forecasters still beat AI models, but a trend extrapolation by @Research_FRI suggests they'll be on par in October 2026.

Ben Day@itsmebenday·9 Jan

@CharlesD353 @StefanFSchubert @Research_FRI The answer to (i) is yes, but it made more of a difference for non-reasoning models, and it’s more about avoiding bad errors than sharpening up

Charles🔸@CharlesD353·9 Jan

Other interesting questions: (i) does averaging the predictions of these 10 instances of the same model improve performance, and by how much? (ii) is the superforecaster line (described as "superforecaster median forecast") the Brier score of the median superforecaster prediction per question, or the median of the individual superforecasters' Brier scores? The former would be much more impressive to match. @Research_FRI note that this is also probably an underestimate of the models, as they use only a basic scaffold. So there are significant factors pushing both ways.
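The two readings in (ii), and the averaging effect asked about in (i), can be shown on toy data. This is a minimal sketch with made-up forecasts and outcomes, not FRI's or Mantic's data or method. It assumes the binary Brier score (p - outcome)^2, averaged over questions, with lower being better.

```python
from statistics import mean, median

def brier(p, outcome):
    """Binary Brier score for one forecast: squared error, lower is better."""
    return (p - outcome) ** 2

# Hypothetical data: rows are forecasters, columns are questions.
forecasts = [
    [0.9, 0.2, 0.6],
    [0.7, 0.1, 0.4],
    [0.5, 0.4, 0.8],
]
outcomes = [1, 0, 1]

# Each forecaster's own mean Brier score across questions.
individual = [
    mean(brier(p, o) for p, o in zip(row, outcomes)) for row in forecasts
]

# Reading A: Brier score of the per-question median forecast.
median_forecast = [median(col) for col in zip(*forecasts)]
brier_of_median = mean(brier(p, o) for p, o in zip(median_forecast, outcomes))

# Reading B: median of the individual Brier scores.
median_of_briers = median(individual)

# Question (i): Brier score of the per-question mean forecast.
mean_forecast = [mean(col) for col in zip(*forecasts)]
brier_of_mean = mean(brier(p, o) for p, o in zip(mean_forecast, outcomes))

print(f"Brier of median forecast: {brier_of_median:.4f}")
print(f"Median of Brier scores:   {median_of_briers:.4f}")
print(f"Brier of mean forecast:   {brier_of_mean:.4f}")
print(f"Mean of Brier scores:     {mean(individual):.4f}")
```

In this toy data the aggregated median forecast scores better than the median individual, which is why reading A would be the harder line to match. That ordering is common but not guaranteed. For the mean, the squared error is convex, so the Brier score of the averaged forecast can never be worse than the average of the individual Brier scores.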
