Sir Digby Chicken Caesar

15.4K posts

@timpac

Principled Conservative Michigan outdoorsy-dad.

Joined May 2007
1.8K Following · 445 Followers
Sir Digby Chicken Caesar
@AlexBerenson This particular problem apparently has been addressed. Nevertheless, this *type* of failure is one I would expect from AI basically forever, because it doesn't really think.
Sir Digby Chicken Caesar tweet media
Alex Berenson
Alex Berenson@AlexBerenson·
I’m starting to feel like the limitations and the strengths of AI are two sides of the same coin; AI is a great mimic and pattern recognizer, so it has no problem validating the person using it (and coding, which is a highly structured task). But the unexpected breaks it easily.
Nav Toor@heynavtoor

🚨SHOCKING: Apple just proved that AI models cannot do math. Not advanced math. Grade school math. The kind a 10-year-old solves. And the way they proved it is devastating.

Apple researchers took the most popular math benchmark in AI — GSM8K, a set of grade-school math problems — and made one change. They swapped the numbers. Same problem. Same logic. Same steps. Different numbers. Every model's performance dropped. Every single one. 25 state-of-the-art models tested.

But that wasn't the real experiment. The real experiment broke everything. They added one sentence to a math problem. One sentence that is completely irrelevant to the answer. It has nothing to do with the math. A human would read it and ignore it instantly.

Here's the actual example from the paper: "Oliver picks 44 kiwis on Friday. Then he picks 58 kiwis on Saturday. On Sunday, he picks double the number of kiwis he did on Friday, but five of them were a bit smaller than average. How many kiwis does Oliver have?"

The correct answer is 190. The size of the kiwis has nothing to do with the count. A 10-year-old would ignore "five of them were a bit smaller" because it's obviously irrelevant. It doesn't change how many kiwis there are.

But o1-mini, OpenAI's reasoning model, subtracted 5. It got 185. Llama did the same thing. Subtracted 5. Got 185. They didn't reason through the problem. They saw the number 5, saw a sentence that sounded like it mattered, and blindly turned it into a subtraction. The models do not understand what subtraction means. They see a pattern that looks like subtraction and apply it. That is all.

Apple tested this across all models. They call the dataset "GSM-NoOp" — as in, the added clause is a no-operation. It does nothing. It changes nothing.

The results are catastrophic. Phi-3-mini dropped over 65%. More than half of its "math ability" vanished from one irrelevant sentence. GPT-4o dropped from 94.9% to 63.1%. o1-mini dropped from 94.5% to 66.0%. o1-preview, OpenAI's most advanced reasoning model at the time, dropped from 92.7% to 77.4%.

Even giving the models 8 examples of the exact same question beforehand, with the correct solution shown each time, barely helped. The models still fell for the irrelevant clause. This means it's not a prompting problem. It's not a context problem. It's structural.

The Apple researchers also found that models convert words into math operations without understanding what those words mean. They see the word "discount" and multiply. They see a number near the word "smaller" and subtract. Regardless of whether it makes any sense.

The paper's exact words: "current LLMs are not capable of genuine logical reasoning; instead, they attempt to replicate the reasoning steps observed in their training data." And: "LLMs likely perform a form of probabilistic pattern-matching and searching to find closest seen data during training without proper understanding of concepts."

They also tested what happens when you increase the number of steps in a problem. Performance didn't just decrease. The rate of decrease accelerated. Adding two extra clauses to a problem dropped Gemma2-9b from 84.4% to 41.8%. Phi-3.5-mini from 87.6% to 44.8%. The more thinking required, the more the models collapse. A real reasoner would slow down and work through it. These models don't slow down. They pattern-match. And when the pattern becomes complex enough, they crash.

This paper was published at ICLR 2025, one of the most prestigious AI conferences in the world.
You are using AI to help you make financial decisions. To check legal documents. To solve problems at work. To help your children with homework. And Apple just proved that the AI is not thinking about any of it. It is pattern matching. And the moment something unexpected shows up in your question, it breaks. It does not tell you it broke. It just quietly gives you the wrong answer with full confidence.
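For concreteness, a minimal Python sketch of the kiwi arithmetic described in the quoted thread; the variable names and comments are illustrative and not taken from the Apple paper. The irrelevant "five were smaller" clause changes nothing, so the correct total stays 190, while the pattern-matched failure mode subtracts 5 and lands on 185.

```python
# Illustrative sketch of the GSM-NoOp kiwi example from the thread above
# (not code from the Apple paper). The "no-op" clause about five smaller
# kiwis is irrelevant: it does not change how many kiwis Oliver has.

friday = 44            # kiwis picked on Friday
saturday = 58          # kiwis picked on Saturday
sunday = 2 * friday    # Sunday: double the Friday count -> 88

correct_total = friday + saturday + sunday   # 44 + 58 + 88 = 190
assert correct_total == 190

# The failure mode the thread describes: a number sitting next to the word
# "smaller" gets blindly turned into a subtraction, giving 185 instead of 190.
pattern_matched_answer = correct_total - 5   # 185 (wrong)

print(correct_total, pattern_matched_answer)  # 190 185
```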

Lei Gong
Lei Gong@gonglei89·
See, not actual intelligence.
Nav Toor@heynavtoor


Nav Toor
Nav Toor@heynavtoor·
Nav Toor tweet media
Sir Digby Chicken Caesar
This seems to have been fixed.
Sir Digby Chicken Caesar tweet media
Nav Toor@heynavtoor


Sir Digby Chicken Caesar
@Smith_WessonInc Compare the number of people who die because they don't have time to rack the slide to the number of people who have an accidental discharge. Carrying one in the chamber is dumb.
Washingtons ghost
Washingtons ghost@washghost1·
This isn’t a burger. This is the stuff you clean off of the grill before you make a burger
Sir Digby Chicken Caesar
@owroot Nice job getting a retweet from Douthat. As a Michigander I considered you a local account; I'm impressed. 😂
O.W. Root
O.W. Root@owroot·
People fail to grasp how powerful and great this machine of America is. It's like they think it is some tiny country in Eastern Europe with a population of 2 million and a GDP less than that of Dallas. People don't understand what it is that they are even living in. They don't know how powerful these gears are, how long the game is, how big the purpose is, how great the ship is.
Van Ike
Van Ike@vanikehuman·
@timpac @xwanyex Yep those same arguments are in the footnotes of the main Bible used in the Catholic Church in America.
GorgeousGlimpse
GorgeousGlimpse@gorgeous0017·
@Variety She's not wrong, but context matters. Some jokes were retired because they were genuinely harmful, not just uncomfortable. The challenge is distinguishing between discomfort that creates insight and discomfort that just hurts people.
GorgeousGlimpse tweet media
Variety
Variety@Variety·
"Friends" star Lisa Kudrow says new sitcoms are “too afraid” to make jokes that make audiences “uncomfortable”: "But I’m not drawn to new sitcoms that are multi-camera in front of an audience because I’m not buying it. I don’t know if that’s just because I’ve seen too many single-camera sitcoms—I think we need to get back to being able to tell jokes. I feel like we’ve been too afraid to make jokes that might make people uncomfortable.” variety.com/2026/tv/news/l…
Variety tweet media
Sir Digby Chicken Caesar retweeted
James Lindsay, anti-Communist
James Lindsay, anti-Communist@ConceptualJames·
The Babylon Bee is almost always really good, but about once every two weeks or so it achieves perfection.
James Lindsay, anti-Communist tweet media
Sir Digby Chicken Caesar retweeted
Han Shawnity 🇺🇸
Han Shawnity 🇺🇸@HanShawnity·
You don't understand. Tucker Carlson thinks demons run America, we took Maduro to make Venezuela gay, we attacked Iran to rape their women, Russia is prettier than America, Qatar is better in virtually every aspect, shariah law is preferable to American law, people who chant "death to America" are the good guys, we were the bad guys in WWII, and terrorists aren't really terrorists, but he's really America First. You don't get it? Well neither do I, but who are we to question the guy who so bravely and singlehandedly fought off a demon in his sleep.
Erick Erickson
Erick Erickson@EWErickson·
I’m gonna need to call a moratorium on using the word “decimated” until people understand how to use it and stop using it as a substitute for “destroyed.”
J. Daniel Sawyer
J. Daniel Sawyer@dsawyer·
Depending on your criteria for "American Tolkien" you have a few choices:
ER Burroughs, for foundational influence and epic storytelling.
Frank Herbert, for thoroughness of worldbuilding and doing a political fantasy with American sensibilities (suspicion of authority, depth of irony, etc.).
Mark Twain or James Fenimore Cooper, for the first quintessentially American quest tales.
Robert A. Heinlein or Ray Bradbury, for translating the American epic (the Western) into fantasy settings in a definitive fashion.
HP Lovecraft, for giving a uniquely American voice to fantasy metaphysics.
George Lucas, for combining all of the above into a tale that influences the American storytelling consciousness for generations in the same way Tolkien influenced the English storytelling consciousness.
Brain Leakage@BrainLeakage03

My contention is that the “American Tolkien” isn’t someone like GRRM or Robert Jordan. They’re essentially telling European stories, dealing with matters of rightful kings and chosen ones. The American Tolkien is Burroughs, and his LOTR is the Mars series. A Confederate veteran travels to a savage world that resembles a fantastical version of the Arizona desert. He must learn the ways of the locals, master the wilderness, and carve out a prosperous life for himself through grit, bravery, and sheer exceptionalism. It doesn’t get more American than that.

Sir Digby Chicken Caesar retweeted
Fandom Pulse
Fandom Pulse@fandompulse·
HBO's Carnivàle creator and Blacklist EP and writer Daniel Knauf on Star Trek: Starfleet Academy: "They really have no idea why people watch Star Trek. If they did, this show never would have progressed past the pitch phase." Why was this show greenlit?
Fandom Pulse tweet media
Sir Digby Chicken Caesar
I can't think of a single instance where cheddar would be my preference. I don't mind it, but I prefer Swiss on sandwiches, blue or aged Gouda for snacking. It's good mixed into mac and cheese with Gruyère; I guess that's the only occasion where it would be hard to replace.
Carl@HistoryBoomer

Best cheese
