dill
@BluWolfWhtRbbt









OpenAI’s newest “smarter” models hallucinate 3x more than the ones they replaced. And OpenAI just published a paper explaining exactly why they can’t stop it.

The core argument: AI models hallucinate because every benchmark in the industry scores them like a multiple-choice test with no “I don’t know” option. Guess, and you might get lucky. Leave it blank, and you’re guaranteed a zero. So the models learned to guess. Confidently. Every time.

The numbers tell the story. On OpenAI’s own PersonQA benchmark, o1 hallucinated 16% of the time. The newer o3 jumped to 33%. o4-mini hit 48%. Three generations of models, each one hallucinating more often than the last. OpenAI’s explanation: the models “make more claims overall,” producing more right answers AND more wrong ones simultaneously.

This tells you everything about how the AI industry actually works. The reinforcement learning that makes models better at reasoning also makes them more confidently wrong. The system that produces intelligence and the system that produces hallucinations are the same system.

The paper’s proposed fix is where it gets really interesting. They don’t call for better training data or bigger models. They say the entire benchmark ecosystem needs to be rebuilt to reward uncertainty. Every leaderboard, every eval, every scoring rubric needs an “I don’t know” option that doesn’t tank your score.

But every AI company uses those same leaderboards to market their models. Admitting uncertainty drops your accuracy number. And dropped accuracy numbers don’t raise $40B funding rounds.

OpenAI just published mathematical proof that the incentive structure producing hallucinations is the same incentive structure producing their revenue.
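A minimal sketch of the scoring argument, with my own assumptions: a model answers a question correctly with probability p, and the “penalized” rule below (+1 for correct, 0 for “I don’t know”, −t/(1−t) for wrong, with a confidence threshold t) is one illustrative abstention-aware scheme, not necessarily the exact rubric the paper proposes.

```python
# Toy expected-score comparison: why binary-graded benchmarks reward guessing.
# Assumption: the model's answer is correct with probability p.

def expected_scores(p, t=0.75):
    """Return (answer, abstain) expected scores under two grading rules."""
    binary = (p, 0.0)                       # right = 1, wrong or IDK = 0
    penalized = (p - (1 - p) * t / (1 - t),  # wrong answers cost t/(1-t)
                 0.0)                        # IDK scores 0 in both schemes
    return binary, penalized

for p in (0.1, 0.3, 0.5, 0.9):
    (b_ans, b_abs), (p_ans, p_abs) = expected_scores(p)
    print(f"p={p:.1f}  binary: answer {b_ans:+.2f} vs IDK {b_abs:+.2f}   "
          f"penalized: answer {p_ans:+.2f} vs IDK {p_abs:+.2f}")

# Under binary grading, answering beats "I don't know" for every p > 0,
# so a score-maximizing model always guesses. Under the penalized rule,
# answering only pays when p exceeds the threshold t.
```

Run it and the incentive flips exactly where you'd expect: at p = 0.1 the binary grader still pays +0.10 for guessing, while the penalized grader charges −0.20, so admitting uncertainty finally stops being a losing move.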






















