
Tom Dwan





We benchmarked every major AI model at poker. GPT-5.4, Claude Opus 4.6, Gemini 3.1 Pro, Grok 4 and more. All played 5,000 hands of heads-up no-limit against our state-of-the-art poker agent. Every single one lost. Here's the full breakdown 🧵

@TomDwan Great point, Tom. Running frontier LLMs at scale is expensive. That's why we use AIVAT, a variance reduction technique that achieves the same statistical significance in 10x fewer hands, so 5K is equivalent to ~50K raw hands.
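For anyone checking the "5K is equivalent to ~50K" arithmetic, here is a minimal sketch of the sample-size reasoning only. It does not implement AIVAT. It assumes hands are independent and that "10x" means a 10x reduction in per-hand variance. The per-hand standard deviation of 10 big blinds is a hypothetical placeholder, not a figure from the benchmark, and the function names are illustrative.

```python
import math

def standard_error(per_hand_std, hands):
    """Standard error of the mean win rate after `hands` independent hands."""
    return per_hand_std / math.sqrt(hands)

def effective_hands(hands, variance_reduction):
    """Raw hands needed to match the precision of `hands` variance-reduced hands."""
    return hands * variance_reduction

# Hypothetical per-hand standard deviation in big blinds (assumption, not from the thread).
RAW_STD = 10.0
REDUCTION = 10   # the "10x" figure claimed in the thread
HANDS = 5_000    # hands played per model, per the thread

# Cutting variance by 10x cuts the standard deviation by sqrt(10).
reduced_std = RAW_STD / math.sqrt(REDUCTION)

se_reduced = standard_error(reduced_std, HANDS)
se_raw_equiv = standard_error(RAW_STD, effective_hands(HANDS, REDUCTION))

print(f"5K variance-reduced hands: SE = {se_reduced:.4f} bb/hand")
print(f"50K raw hands:             SE = {se_raw_equiv:.4f} bb/hand")
```

Under these assumptions the two standard errors match, which is all the "equivalent to ~50K raw hands" statement asserts. Whether the estimator actually delivers a 10x variance reduction in this matchup is a separate empirical question.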





Hey @KylieJenner — I’m Sam Kiki. I hold the record for the most ever won in 17 seasons on High Stakes Poker. I also hold the record for largest single day win. I, too, like splashy pots. I have a seat and $500k with your name on it. Bring @RealChalamet. I’ll teach you both everything the @VanityFair video left out. Then we can all compete on @PokerGO with a few of our mutual friends.








