Observatorium

2.8K posts

@Observatorium14

Joined August 2019
350 Following, 61 Followers
Observatorium
Observatorium@Observatorium14·
@heynavtoor I am not a giant corporation but I did snap a picture of a 4th grader’s math sheet and it got 4 addition problems wrong on a page of 30. The fact that I (an idiot) caught it and the AI didn’t told me all I needed to know.
English
0
0
0
127
Nav Toor
Nav Toor@heynavtoor·
🚨SHOCKING: Apple just proved that AI models cannot do math. Not advanced math. Grade school math. The kind a 10-year-old solves. And the way they proved it is devastating.

Apple researchers took the most popular math benchmark in AI (GSM8K, a set of grade-school math problems) and made one change. They swapped the numbers. Same problem. Same logic. Same steps. Different numbers. Every model's performance dropped. Every single one. 25 state-of-the-art models tested.

But that wasn't the real experiment. The real experiment broke everything. They added one sentence to a math problem. One sentence that is completely irrelevant to the answer. It has nothing to do with the math. A human would read it and ignore it instantly.

Here's the actual example from the paper: "Oliver picks 44 kiwis on Friday. Then he picks 58 kiwis on Saturday. On Sunday, he picks double the number of kiwis he did on Friday, but five of them were a bit smaller than average. How many kiwis does Oliver have?"

The correct answer is 190. The size of the kiwis has nothing to do with the count. A 10-year-old would ignore "five of them were a bit smaller" because it's obviously irrelevant. It doesn't change how many kiwis there are.

But o1-mini, OpenAI's reasoning model, subtracted 5. It got 185. Llama did the same thing. Subtracted 5. Got 185. They didn't reason through the problem. They saw the number 5, saw a sentence that sounded like it mattered, and blindly turned it into a subtraction. The models do not understand what subtraction means. They see a pattern that looks like subtraction and apply it. That is all.

Apple tested this across all models. They call the dataset "GSM-NoOp", as in, the added clause is a no-operation. It does nothing. It changes nothing. The results are catastrophic. Phi-3-mini dropped over 65%. More than half of its "math ability" vanished from one irrelevant sentence. GPT-4o dropped from 94.9% to 63.1%. o1-mini dropped from 94.5% to 66.0%. o1-preview, OpenAI's most advanced reasoning model at the time, dropped from 92.7% to 77.4%.

Even giving the models 8 examples of the exact same question beforehand, with the correct solution shown each time, barely helped. The models still fell for the irrelevant clause. This means it's not a prompting problem. It's not a context problem. It's structural.

The Apple researchers also found that models convert words into math operations without understanding what those words mean. They see the word "discount" and multiply. They see a number near the word "smaller" and subtract. Regardless of whether it makes any sense.

The paper's exact words: "current LLMs are not capable of genuine logical reasoning; instead, they attempt to replicate the reasoning steps observed in their training data." And: "LLMs likely perform a form of probabilistic pattern-matching and searching to find closest seen data during training without proper understanding of concepts."

They also tested what happens when you increase the number of steps in a problem. Performance didn't just decrease. The rate of decrease accelerated. Adding two extra clauses to a problem dropped Gemma2-9b from 84.4% to 41.8%. Phi-3.5-mini from 87.6% to 44.8%. The more thinking required, the more the models collapse. A real reasoner would slow down and work through it. These models don't slow down. They pattern-match. And when the pattern becomes complex enough, they crash.

This paper was published at ICLR 2025, one of the most prestigious AI conferences in the world.

You are using AI to help you make financial decisions. To check legal documents. To solve problems at work. To help your children with homework. And Apple just proved that the AI is not thinking about any of it. It is pattern matching. And the moment something unexpected shows up in your question, it breaks. It does not tell you it broke. It just quietly gives you the wrong answer with full confidence.
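For readers who want to check the arithmetic in the post above, here is a minimal Python sketch. It only recomputes numbers quoted in the post: the kiwi total, the off-by-5 answer, and the percentage-point drops. The variable and function names (`quoted_scores`, `point_drop`) are made up for illustration, and the accuracy figures are copied from the post as-is, not checked against the paper.

```python
# Kiwi counts as stated in the quoted problem.
friday = 44
saturday = 58
sunday = 2 * friday  # "double the number of kiwis he did on Friday"

# The size remark is a no-op: it does not change the count.
correct_total = friday + saturday + sunday

# The mistake described in the post: subtracting the five "smaller" kiwis.
distracted_total = correct_total - 5

# Accuracy figures (percent) exactly as quoted in the post: (before, after).
# The first three are the GSM-NoOp figures; the last two are the
# "two extra clauses" figures. None are verified against the paper here.
quoted_scores = {
    "GPT-4o": (94.9, 63.1),
    "o1-mini": (94.5, 66.0),
    "o1-preview": (92.7, 77.4),
    "Gemma2-9b": (84.4, 41.8),
    "Phi-3.5-mini": (87.6, 44.8),
}


def point_drop(before: float, after: float) -> float:
    """Drop in percentage points, rounded to one decimal place."""
    return round(before - after, 1)


drops = {model: point_drop(b, a) for model, (b, a) in quoted_scores.items()}

if __name__ == "__main__":
    print("correct total:", correct_total)
    print("total after wrongly subtracting 5:", distracted_total)
    for model, drop in drops.items():
        print(f"{model}: down {drop} points")
```

Note that these are drops in percentage points (before minus after), which is a different measure from a relative drop such as the "over 65%" quoted for Phi-3-mini.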
English
533
1.5K
5.9K
891.7K
Chris
Chris@burnerDevAcct·
@feelsdesperate only way it could have been better is if at the end, the Easter Bunny took its head off, and it was Marco Rubio the whole time
English
5
4
111
1.4K
Coddled Affluent Professional
The aesthetics are so good. They’re so good. I understand how people become Trump cultists. This stuff is magical.
English
100
258
4.4K
53.5K
Paul Brown
Paul Brown@0xQuasark·
A 15-year-old girl accidentally took 10x doses of LSD at the same time. All because a dealer at a festival forgot a decimal in a dose of liquid LSD. Instead of 100 µg, she got: 1,100 µg. 6 hours later, she started seizing. Her fists locked up & she went fetal. But when she woke up in the hospital 14 hours later, the only words she said to her father were: "It's over." At first, he thought she meant the LSD. Turns out, she meant her bipolar disorder. The mood swings & hallucinations she had struggled with since she was 5? Completely. Gone. 10 hours later, she walked home. She never used her BPD meds again. She also never relapsed again. They checked in with her 13 years later and found she never had another issue. The LSD had basically reset her brain.
English
291
455
5.2K
400.5K
Observatorium
Observatorium@Observatorium14·
@MrWHOsecond2 Nobody cares. Just stand up when they stand up, sit down, kneel, etc.
English
0
0
0
2
ミハイル✝️☦️
ミハイル✝️☦️@MrWHOsecond2·
I really, really want to visit a Catholic church. But I don't know how Mass works, so I'm terribly worried I'll commit some faux pas.
Japanese
268
16
954
15.6K
Observatorium
Observatorium@Observatorium14·
@zeke_22 The Mississippi River also flows through the North
English
0
0
0
138
ZEKE22
ZEKE22@zeke_22·
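Whenever I talk about food with my overseas bros from America, a phrase I keep seeing is "the American South." But sadly, I only know the American South from Hearts of Iron IV, so all I know about the region is that it's "a place that produces a lot of oil." Beyond that, I only know that the Mississippi River flows through it.
Japanese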
101
41
1.2K
52.4K
John Wight
John Wight@JohnWight1·
The civilisation that invented algebra is currently doing battle with the one that invented the hamburger. This is all you need to know.
English
2K
7.5K
29.7K
592K
Disciple Hans
Disciple Hans@TheMightyHans·
@VicVijayakumar No, it's not. If you actually used an iron skillet for cooking, you'd know that when you do that once, it takes a long time to get it non-stick and back into shape. You can argue with me, but if you do, you're wrong and I'm right.
English
98
0
99
130.9K
桜♡
桜♡@Spamfromk·
why are men always ok with doing nothing for their birthday?
English
5.5K
518
16.2K
3M
Iz²y
Iz²y@BrownIdGirl2098·
@RealEmirHan Is he serious? $20,000 a minute. Who the hell charges such a large price for a set for such a short amount of time?
English
12
1
19
42.7K
Emir Han
Emir Han@RealEmirHan·
Chris Pratt thought he was ruining his career filming the dance scene in Guardians of the Galaxy. He had to improvise: “This set is costing $20,000 a minute. I might be blowing this. I don’t know how to dance.” He asked for a choreographer. Gunn said, “Just dance.” So he went for it.
English
198
970
48.4K
4.5M
Observatorium
Observatorium@Observatorium14·
@NickHintonn But WHY? There’s a reason beyond that and it’s pretty important
English
0
0
0
83
Nick Hinton
Nick Hinton@NickHintonn·
Mario is literally a game about a dude who eats mushrooms and starts fighting reptilians.
English
251
2.3K
19.6K
358.7K
Observatorium
Observatorium@Observatorium14·
@MUKIDEZA2 Death to Smoochy, and Garth Marenghi’s Darkplace, which isn’t a movie but a 6-episode show you can watch for free on YouTube
English
0
0
0
15
ムキデザ│グラフィックデザイナー
To everyone overseas: could you recommend some comedy movies? I like things like The Hangover, Super Heroes, and Zohan! I love the reeeally stupid ones!
Japanese
1.1K
24
750
34.6K
Observatorium
Observatorium@Observatorium14·
@MUKIDEZA2 If there’s a river it’s usually the border. If there isn’t a river we just draw a straight line.
English
0
0
8
230
Observatorium
Observatorium@Observatorium14·
@upstatefederlst I had no idea there was a PlayStation 3 during the 360 era. My friends all still had PS2 and that’s what we played.
English
0
0
0
67
Upstate Federalist
Upstate Federalist@upstatefederlst·
I dunno, man. I plugged my PS2 in for the first time in 20 years on Friday and played Guitar Hero 3 with my kids and everything just worked. Didn't have to update a subscription for a song library on a server that doesn't exist anymore.
@de3dsoul

The Best Era

English
85
219
4.4K
124.4K
Observatorium
Observatorium@Observatorium14·
@realtimsharp Because South Korea would be the ones to bear the brunt of the counter attack. Are you retarded?
English
0
0
0
3
Tim Sharp 🍊 🍊 🇺🇸
North Korea is a nuclear armed nation and has openly stated their disdain for America for 20 years. Why aren’t we bombing them?
English
3.1K
1.4K
15.2K
1.1M
Lee (Greater)
Lee (Greater)@shortmagsmle·
"No man left behind" is one of those things that, if you have to explain it to an adult man, you are already wasting your breath because it's too late for him to understand it and he's not worth really talking to in the first place
English
91
769
8.2K
112.3K
Bryan Johnson
Bryan Johnson@bryan_johnson·
Something happened in the past six months post psilocybin and 5-MeO-DMT that I can't fully explain. The brain data helps but doesn't complete the picture. Feels like a home I didn't know I was looking for. I'm trying to figure out what to do with that now.
English
489
103
4.1K
318K
Observatorium
Observatorium@Observatorium14·
@TRHLofficial If you own a house you need dirt every few years to keep it maintained
English
0
0
0
0
Evil13rt🐧
Evil13rt🐧@Evil13rt·
@ReviewsPossum I think that even if the US did not have massive stockpiles of money to burn, it would still make the same choices. Even if it isn’t logical or even if it fails every cost-benefit analysis. It’s just a core part of American culture. You can’t understand it from the outside.
English
2
3
64
3.7K