Observatorium

2.8K posts

@Observatorium14

Joined August 2019

350 Following · 61 Followers

Observatorium@Observatorium14·3h

@heynavtoor I am not a giant corporation but I did snap a picture of a 4th grader’s math sheet and it got 4 addition problems wrong on a page of 30. The fact that I (an idiot) caught it and the AI didn’t told me all I needed to know.

127

Nav Toor@heynavtoor·8h

🚨SHOCKING: Apple just proved that AI models cannot do math. Not advanced math. Grade school math. The kind a 10-year-old solves. And the way they proved it is devastating.
Apple researchers took the most popular math benchmark in AI (GSM8K, a set of grade-school math problems) and made one change. They swapped the numbers. Same problem. Same logic. Same steps. Different numbers. Every model's performance dropped. Every single one. 25 state-of-the-art models tested.
But that wasn't the real experiment. The real experiment broke everything. They added one sentence to a math problem. One sentence that is completely irrelevant to the answer. It has nothing to do with the math. A human would read it and ignore it instantly.
Here's the actual example from the paper: "Oliver picks 44 kiwis on Friday. Then he picks 58 kiwis on Saturday. On Sunday, he picks double the number of kiwis he did on Friday, but five of them were a bit smaller than average. How many kiwis does Oliver have?"
The correct answer is 190. The size of the kiwis has nothing to do with the count. A 10-year-old would ignore "five of them were a bit smaller" because it's obviously irrelevant. It doesn't change how many kiwis there are.
But o1-mini, OpenAI's reasoning model, subtracted 5. It got 185. Llama did the same thing. Subtracted 5. Got 185. They didn't reason through the problem. They saw the number 5, saw a sentence that sounded like it mattered, and blindly turned it into a subtraction. The models do not understand what subtraction means. They see a pattern that looks like subtraction and apply it. That is all.
Apple tested this across all models. They call the dataset "GSM-NoOp", as in the added clause is a no-operation. It does nothing. It changes nothing. The results are catastrophic. Phi-3-mini dropped over 65%. More than half of its "math ability" vanished from one irrelevant sentence. GPT-4o dropped from 94.9% to 63.1%. o1-mini dropped from 94.5% to 66.0%. o1-preview, OpenAI's most advanced reasoning model at the time, dropped from 92.7% to 77.4%.
Even giving the models 8 examples of the exact same question beforehand, with the correct solution shown each time, barely helped. The models still fell for the irrelevant clause. This means it's not a prompting problem. It's not a context problem. It's structural.
The Apple researchers also found that models convert words into math operations without understanding what those words mean. They see the word "discount" and multiply. They see a number near the word "smaller" and subtract. Regardless of whether it makes any sense.
The paper's exact words: "current LLMs are not capable of genuine logical reasoning; instead, they attempt to replicate the reasoning steps observed in their training data." And: "LLMs likely perform a form of probabilistic pattern-matching and searching to find closest seen data during training without proper understanding of concepts."
They also tested what happens when you increase the number of steps in a problem. Performance didn't just decrease. The rate of decrease accelerated. Adding two extra clauses to a problem dropped Gemma2-9b from 84.4% to 41.8%. Phi-3.5-mini from 87.6% to 44.8%. The more thinking required, the more the models collapse. A real reasoner would slow down and work through it. These models don't slow down. They pattern-match. And when the pattern becomes complex enough, they crash.
This paper was published at ICLR 2025, one of the most prestigious AI conferences in the world.
You are using AI to help you make financial decisions. To check legal documents. To solve problems at work. To help your children with homework. And Apple just proved that the AI is not thinking about any of it. It is pattern matching. And the moment something unexpected shows up in your question, it breaks. It does not tell you it broke. It just quietly gives you the wrong answer with full confidence.
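
The arithmetic in the post above can be checked with a minimal Python sketch. It recomputes the kiwi total, reproduces the wrong answer the post attributes to o1-mini and Llama, and turns the quoted accuracies into percentage-point drops. The variable names and the `point_drop` helper are illustrative assumptions, and the accuracy figures are copied as quoted in the post, not independently verified.

```python
# Worked arithmetic for the kiwi problem quoted in the post.
friday = 44
saturday = 58
sunday = 2 * friday  # "double the number of kiwis he did on Friday"

correct_total = friday + saturday + sunday

# The "five of them were a bit smaller" clause does not change the count.
# Subtracting it reproduces the wrong answer described in the post.
distracted_total = correct_total - 5

# Accuracy figures as quoted in the post: (baseline %, GSM-NoOp %).
# These are taken from the post's text, not re-derived from the paper.
quoted_accuracy = {
    "GPT-4o": (94.9, 63.1),
    "o1-mini": (94.5, 66.0),
    "o1-preview": (92.7, 77.4),
}


def point_drop(before: float, after: float) -> float:
    """Hypothetical helper: drop in percentage points, rounded to 0.1."""
    return round(before - after, 1)


drops = {name: point_drop(b, a) for name, (b, a) in quoted_accuracy.items()}

print(correct_total)     # 190
print(distracted_total)  # 185
print(drops)
```

Note that these are drops in percentage points (for example, 94.9 minus 63.1), which is not the same thing as a relative decline in accuracy.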

533

1.5K

5.9K

891.7K

Observatorium@Observatorium14·3h

@burnerDevAcct @feelsdesperate He should have appointed the Easter Bunny Supreme Ruler of Iran

Chris@burnerDevAcct·5h

@feelsdesperate only way it could have been better is if at the end, the Easter Bunny took its head off, and it was Marco Rubio the whole time

111

1.4K

Coddled Affluent Professional@feelsdesperate·5h

The aesthetics are so good. They’re so good. I understand how people become Trump cultists. This stuff is magical.

Coddled Affluent Professional tweet media

100

258

4.4K

53.5K

Observatorium@Observatorium14·3h

@0xQuasark That’s risky business

Paul Brown@0xQuasark·11h

A 15-year-old girl accidentally took 10x doses of LSD at the same time. All because a dealer at a festival forgot a decimal in a dose of liquid LSD. Instead of 100 µg, she got: 1,100 µg. 6 hours later, she started seizing. Her fists locked up & she went fetal. But when she woke up in the hospital 14 hours later, the only words she said to her father were: "It's over." At first, he thought she meant the LSD. Turns out, she meant her bipolar disorder. The mood swings & hallucinations she had struggled with since she was 5? Completely. Gone. 10 hours later, she walked home. She never used her BPD meds again. She also never relapsed again. They checked in with her 13 years later and found she never had another issue. The LSD had basically reset her brain.

291

455

5.2K

400.5K

Observatorium@Observatorium14·7h

@MrWHOsecond2 Nobody cares. Just stand up when they stand up, sit down, kneel, etc.

ミハイル✝️☦️@MrWHOsecond2·17h

I would really love to visit a Catholic church, but I don't know how Mass works, so I'm really anxious about committing some faux pas.

Japanese

268

954

15.6K

Observatorium@Observatorium14·7h

@zeke_22 The Mississippi River also flows through the North

138

ZEKE22@zeke_22·11h

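Whenever I talk about food with American guys overseas, a phrase I keep seeing is "the American South." Sadly, though, I only know the American South from Hearts of Iron IV, so all I know about the region is that it's "a place that produces a lot of oil." Other than that, I only know that the Mississippi River flows through it.

Japanese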

101

1.2K

52.4K

Observatorium@Observatorium14·7h

@JohnWight1 The hamburger made in Germany?

John Wight@JohnWight1·17h

The civilisation that invented algebra is currently doing battle with the one that invented the hamburger. This is all you need to know.

7.5K

29.7K

592K

Observatorium@Observatorium14·11h

@TheMightyHans @VicVijayakumar Name one thing you cook on this skillet without adding oil or fat

Disciple Hans@TheMightyHans·20h

@VicVijayakumar No, it's not, if you actually used an iron skillet for cooking you'd know that when you do that once, it takes a long time to get it non-stick and back into shape. You can argue with me, but if you do you're wrong and I'm right.

130.9K

Vic 🌮@VicVijayakumar·1d

unironically this is how you should wash your cast iron skillet. it's fine.

Dividend Hero@HeroDividend

My grandma always makes me do the dishes after Easter lunch. She will be so happy to see that I cleaned her dirty old pan.

375

148

10K

4.1M

Observatorium@Observatorium14·12h

@Spamfromk You don’t get to do nothing very often

桜♡@Spamfromk·15h

why are men always ok with doing nothing for their birthday?

5.5K

518

16.2K

Observatorium@Observatorium14·12h

@BrownIdGirl2098 @RealEmirHan It’s not literal. He just means it’s not cheap and you don’t want to fuck it up.

397

Iz²y@BrownIdGirl2098·1d

@RealEmirHan Is he serious? $20,000 a minute. Who the hell charges that much for a set for such a short amount of time?

42.7K

Emir Han@RealEmirHan·1d

Chris Pratt thought he was ruining his career filming the dance scene in Guardians of the Galaxy. He had to improvise: “This set is costing $20,000 a minute. I might be blowing this. I don’t know how to dance.” He asked for a choreographer. Gunn said, “Just dance.” So he went for it.

198

970

48.4K

4.5M

Observatorium@Observatorium14·12h

@NickHintonn But WHY? There’s a reason beyond that and it’s pretty important

Nick Hinton@NickHintonn·21h

Mario is literally a game about a dude who eats mushrooms and starts fighting reptilians.

251

2.3K

19.6K

358.7K

Observatorium@Observatorium14·12h

@MUKIDEZA2 Death to Smoochy, and Garth Marenghi's Darkplace, which isn’t a movie but a 6-episode show you can watch for free on YouTube

ムキデザ│グラフィックデザイナー@MUKIDEZA2·20h

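To everyone overseas: could you recommend some comedy movies? My tastes run to things like The Hangover, Super Heroes, and Zohan! I like the really, really stupid ones!

Japanese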

1.1K

750

34.6K

Observatorium@Observatorium14·13h

@MUKIDEZA2 If there’s a river it’s usually the border. If there isn’t a river we just draw a straight line.

230

ムキデザ│グラフィックデザイナー@MUKIDEZA2·17h

The way America's states are divided up is way too straight, isn't it? I'd like whoever cut these to cut my pizza and cake too.

Japanese

171

1.2K

32.2K

Observatorium@Observatorium14·13h

@upstatefederlst I had no idea there was a PlayStation 3 during the 360 era. My friends all still had PS2 and that’s what we played.

Upstate Federalist@upstatefederlst·22h

I dunno, man. I plugged my PS2 in for the first time in 20 years on Friday and played Guitar Hero 3 with my kids and everything just worked. Didn't have to update a subscription for a song library on a server that doesn't exist anymore.

定@de3dsoul

The Best Era

219

4.4K

124.4K

Observatorium@Observatorium14·14h

@realtimsharp Because South Korea would be the ones to bear the brunt of the counter attack. Are you retarded?

Tim Sharp 🍊 🍊 🇺🇸@realtimsharp·1d

North Korea is a nuclear armed nation and has openly stated their disdain for America for 20 years. Why aren’t we bombing them?

3.1K

1.4K

15.2K

1.1M

Observatorium@Observatorium14·14h

@shortmagsmle Are you saying to leave him behind?

Lee (Greater)@shortmagsmle·1d

"No man left behind" is one of those things that, if you have to explain it to an adult man, you are already wasting your breath because it's too late for him to understand it and he's not worth really talking to in the first place

769

8.2K

112.3K

Observatorium@Observatorium14·14h

@QuietCompoundFI @bryan_johnson It’s called being excited. Humans like to share when they are excited about something.

Bryan Johnson@bryan_johnson·1d

Something happened in the past six months post psilocybin and 5-MeO-DMT that I can't fully explain. The brain data helps but doesn't complete the picture. Feels like a home I didn't know I was looking for. I'm trying to figure out what to do with that now.

489

103

4.1K

318K

Observatorium@Observatorium14·14h

@TRHLofficial If you own a house you need dirt every few years to keep it maintained

The Redheaded libertarian@TRHLofficial·23h

Men, is this true?

862

19.1K

288.8K

Observatorium@Observatorium14·14h

@romanhelmetguy They just don’t risk anything anymore. Not that that will save them.

Roman Helmet Guy@romanhelmetguy·1d

Euros be like: “I would never risk $300M to save the life of a single soldier, but also I would never grow the economy by $300M if it risked the life of a single spotted owl.”

bumbadum@bumbadum14

It’s frankly horrifying to see so many Euros post about the Iran rescue operation and imply that they wouldn’t do the same because it would be too expensive.

184

15.4K

324.9K

Observatorium@Observatorium14·14h

@Evil13rt @ReviewsPossum I can’t understand not understanding it.

Evil13rt🐧@Evil13rt·19h

@ReviewsPossum I think that even if the US did not have massive stockpiles of money to burn, it would still make the same choices. Even if it isn’t logical or even if it fails every cost-benefit analysis. It’s just a core part of American culture. You can’t understand it from the outside.

3.7K

Possum Reviews@ReviewsPossum·1d

When German and Japanese soldiers in World War II noticed the Americans were willing to throw away tons of ammunition and equipment to rescue their guys, it was a massive blow to their morale because they knew their own side wouldn't do the same for them.

Daniel Foubert 🇵🇱🇫🇷@Arrogance_0024

The "leave no man behind" doctrine is actually a strategic weakness disguised as a virtue. Name one other military on earth that destroys 6 aircraft and fights a ground battle inside a sovereign nation to recover one pilot. You can't. Because no other military confuses tactical sentimentality with strategic logic. Soldiers serve the mission. The mission doesn't serve the soldier. The US has now established that Iran can shoot down an F-15, then watch America spend $300M and expose Delta Force trying to prove it didn't happen. That's not military doctrine. That's politics with weapons. A military that cannot accept the risk of loss cannot win wars. The US hasn't won one since 1945.

543

7.4K

137K

Observatorium@Observatorium14·15h

@Camp4 You can have both

Kevin Dahlstrom@Camp4·1d

In 2007, there was a popular opinion that nobody will use a phone without a mechanical keyboard. This is identical. Anyone who drives a Tesla knows that a well-designed touchscreen is far better than a million buttons and knobs.

Top Gear@BBC_TopGear

"A large touchscreen doesn't work in a car": Sir Jony Ive on designing the Ferrari Luce's interior ➡️ top-gear.visitlink.me/yTpZer

275

196

76.8K
