Tom Dwan

5.2K posts

@TomDwan

Las Vegas, NV · Joined June 2010

633 Following · 208.1K Followers

Tom Dwan@TomDwan·18h

@BillyM2k @mickeyxfriedman Makes sense

123

Shibetoshi Nakamoto@BillyM2k·18h

@mickeyxfriedman this checks out from my experiences

1.5K

mickey friedman@mickeyxfriedman·19h

narcissists also seem like obvious candidates for LLM psychosis: high susceptibility to affirmation loops, low tolerance for contradiction, weak reality testing

Steve Stewart-Williams@SteveStuWill

“The single strongest personality predictor [of conspiracy thinking] is narcissism. Narcissists are particularly prone to conspiracy theories because they have a strong need for uniqueness, are prone to paranoia, and can also be remarkably gullible.” stevestewartwilliams.com/p/12-things-ev…

Tom Dwan@TomDwan·18h

@RandPaul It was a lab leak

2.9K

Rand Paul@RandPaul·22h

CIA scientists concluded that COVID came from a lab leak. Then someone scribbled out their conclusion at 2 am and changed the report. Tomorrow, a whistleblower testifies before my committee. The COVID cover-up is unraveling.

Daily Wire@realDailyWire

"The carelessness of Anthony Fauci...in all likelihood led to this pandemic that 15 million people died, and still, there hasn't been responsibility." @RandPaul joins @cabot_phillips to discuss new whistleblower testimony surrounding the origins of COVID-19:

5.3K

18.9K

512.5K

Tom Dwan@TomDwan·20h

@polynoamial I feel like you would be one of the best situated people to help come up with better eval criteria… And I think it’s very important for many aspects of society that there is some more public information around this stuff

1.1K

Noam Brown@polynoamial·1d

I love seeing a new eval with such low scores. When we announced GPT-5.5, almost every benchmark had a score above 50%. It's time to retire evals like GPQA and bring in a new set.

Kilian Lieret@KLieret

The first ProgramBench task was just solved by GPT 5.5 high/xhigh. Interestingly, high/xhigh picked two different languages for the task (C vs Python). GPT 5.5 xhigh was significantly better than Opus 4.7 xhigh in all metrics. 🧵

843

92K

Tom Dwan retweeted

Judge Stephen Dillard@JudgeDillard·27 Apr

Every American should watch every second of this video. Thank you, @BenSasse.

249

1.9K

12.3K

3.1M

Tom Dwan@TomDwan·21 Apr

@senortilt @nikairball @EWassPoker Me 1st

1.7K

Señor Tilt@senortilt·20 Apr

@nikairball @TomDwan @EWassPoker

3.6K

Señor Tilt@senortilt·20 Apr

in honor of 4/20 once I’m done with work I’d love to light up a blunt and stream some heads up NLH against someone. I don’t even know how to stream poker but if someone can help me set it up I’m in. Who’s game?

29.8K

Tom Dwan retweeted

Run It Once Training@RunItOnce·10 Apr

Yesterday, we sat down with @TomDwan for our 3rd conversation, and it didn't disappoint. We covered a lot of ground, but some notable topics included:
• Why did Tom get access to private games that other pros didn't?
• How can people in poker use their skills to excel in other industries?
• The infamous J-4 hand
• The Robl fold
You can watch the recording of the Q&A here: once.run/TomTalksPoker

34.8K

Run It Once Training@RunItOnce·11 Apr

From @TomDwan's recent Q&A - the story of how Phil Ivey got himself banned from one of the best private games in the world.

793

557.6K

Tom Dwan@TomDwan·12 Apr

@RunItOnce Haha funny clip. I told the story pretty accurately

8.3K

Tom Dwan@TomDwan·12 Apr

@polynoamial You guys tell the truth a lot more than Anthropic or xAI seem to. I don’t have much knowledge about Google/DeepMind; I’m hopeful they’re closer on that scale to OAI. We should have both what you suggest and the public models playing, along with more options

1.4K

Noam Brown@polynoamial·10 Apr

What we really need is a benchmark where AI models make AI models that play poker.

GTOWizard@GTOWizard

We benchmarked every major AI model at poker. GPT-5.4, Claude Opus 4.6, Gemini 3.1 Pro, Grok 4 and more. All played 5,000 hands of heads-up no-limit against our state-of-the-art poker agent. Every single one lost. Here's the full breakdown 🧵

525

62.6K

Tom Dwan@TomDwan·11 Apr

How about chipping in $X and asking the labs to match that with credits, cheaper rates, etc. for this specific task? And maybe in exchange point out some obvious leaks (they could then try to find where, from a reasoning standpoint, those came from)

6.3K

Tom Dwan@TomDwan·11 Apr

Also, this should be good for the labs themselves (long term). Yes, a lot of them hate telling the truth when mistakes are made. But this kind of situation is good for them to train models, and systems around how to more accurately assess those models' confidence.

7.7K

Tom Dwan@TomDwan·11 Apr

Ahh ok. I'm happy to try to help you pressure the frontier labs to play ball. Some of them claim to be good enough for military targeting and stuff (even though awful mistakes seem to have happened), so why not do a proper real test?

GTOWizard@GTOWizard

@TomDwan Great point, Tom. Running frontier LLMs at scale is expensive. That's why we use AIVAT, a variance reduction technique that achieves the same statistical significance in 10x fewer hands, so 5K is equivalent to ~50K raw hands.

27.7K
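
The sample-size point in this exchange can be sketched numerically. The snippet below is a minimal illustration, not GTOWizard's AIVAT implementation: it assumes hands are independent and uses a hypothetical standard deviation of 150 bb/100 (a figure chosen for illustration, not taken from the thread). `standard_error_bb100` and `effective_hands` are names made up for this sketch.

```python
import math

# Hypothetical per-100-hands standard deviation for heads-up no-limit play.
# This figure is an illustrative assumption, not a number from the thread.
ASSUMED_SD_BB_PER_100 = 150.0


def standard_error_bb100(n_hands: int, sd_bb100: float = ASSUMED_SD_BB_PER_100) -> float:
    """Standard error of a win rate in bb/100 after n_hands independent hands."""
    # n_hands / 100 is the number of 100-hand blocks in the sample.
    return sd_bb100 / math.sqrt(n_hands / 100)


def effective_hands(n_hands: int, variance_reduction: float) -> float:
    """Raw hands needed to match the precision of n_hands played with the
    variance cut by the given factor."""
    return n_hands * variance_reduction


if __name__ == "__main__":
    for n in (5_000, 50_000, 1_000_000):
        half_width = 1.96 * standard_error_bb100(n)
        print(f"{n:>9,} hands: +/- {half_width:.1f} bb/100 (approx. 95% interval)")
    # A 10x variance reduction makes 5,000 hands as precise as this many raw hands.
    print(effective_hands(5_000, 10.0))
```

Under these assumptions the uncertainty shrinks with the square root of the hand count, so going from 5,000 to 50,000 hands narrows the interval by a factor of about 3.16 rather than 10. A 10x variance reduction and a 10x larger raw sample are equivalent on this measure only if the variance-reduced estimate stays unbiased.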

Tom Dwan@TomDwan·11 Apr

@GTOWizard Was this an automated response? What about doing more hands instead of the “luck-adjusted” bs hahaha. It’s still cool regardless obv tho, happy you guys did this

363

GTOWizard@GTOWizard·9 Apr

Results (luck-adjusted bb/100):
#1 GPT-5.3 XHigh Reasoning: -16.0
#2 GPT-5.4 XHigh Reasoning: -17.8
#4 Claude Opus 4.6: -20.4
#6 Gemini 3.1 Pro: -30.8
#12 Grok 4 High Reasoning: -60.0
#15 GPT-4: -136.2
A top pro winning at 4 bb/100 is considered elite. The best LLM loses at 4x that rate. Grok 4 at -60.0 performs nearly the same as a strategy that simply folds every single hand (-64.6).

8.3K

GTOWizard@GTOWizard·9 Apr

199

214.3K

Tom Dwan@TomDwan·10 Apr

And don’t do “luck adjusted”, just do enough hands that the variance smooths out and you can say who won/lost more. Should be trivial to do a mil hands if you really wanna limit variance, no?

8.1K

Tom Dwan@TomDwan·10 Apr

This is cool. 5k obviously not enough hands though, you guys should know that. Can you run a new one with 50-100k hands 🙏🏻🙏🏻

GTOWizard@GTOWizard

108

85K

Tom Dwan@TomDwan·10 Apr

Ordering CZ’s book and gonna read it.

8.6K

Tom Dwan@TomDwan·10 Apr

I can understand at least some of why both Star and CZ are upset. I hope they both take some deep breaths. They both follow me, and that feels cool. I hope neither of them tilts and unfollows 😂.

10.7K

Tom Dwan@TomDwan·10 Apr

As one of the few foreigners who knows (almost) all of the story of China’s 🇨🇳 crypto crackdown, it’s not even close to accurate to say Star reported Li Lin

32.7K

Tom Dwan@TomDwan·5 Apr

Wow

Tay 💖@tayvano_

I beg everyone in crypto to read this in full. I expected this to be another case of social engineering, likely some recruiter/job offer shit. I was very wrong. And the depth of the operation and personas makes me think they already have multiple other teams on lock. 😳

32.8K

Tom Dwan@TomDwan·30 Mar

I saw @senortilt teach our drunk friend blackjack once…

Señor Tilt@senortilt

Hey @KylieJenner — I’m Sam Kiki. I hold the record for the most ever won in 17 seasons on High Stakes Poker. I also hold the record for largest single day win. I, too, like splashy pots. I have a seat and $500k with your name on it. Bring @RealChalamet. I’ll teach you both everything the @VanityFair video left out. Then we can all compete on @PokerGO with a few of our mutual friends.

80.1K
