Zolmert
123.8K posts

Zolmert
@Pitzynivly
“Ceterum autem censeo Carthaginem esse delendam” Science, History, Dad jokes, Politics and Philosophy.
Joined March 2015
392 Following · 409 Followers
Zolmert reposted

Police once raided a warehouse and found 3,800 PlayStations running FIFA
Ukraine's security service raided a warehouse in Vinnytsia expecting to find a crypto mining farm
Instead they found PS4 consoles stacked on racks from floor to ceiling
Every single one was running FIFA 21 on autopilot, farming Ultimate Team coins 24 hours a day to sell on the black market
The operation was stealing $259,000 a month in electricity and causing power blackouts across the entire city
The consoles alone were worth $1.5 million
EA makes $1.6 billion a year from Ultimate Team
The FIFA coin black market is worth over $200 million a year
At black market rates, 3,800 consoles farming coins 24/7 could pull in $3 to $5 million a year
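The economics above can be sanity-checked with a quick back-of-envelope sketch. All inputs are the thread's own figures, not independently verified; the point is that the stolen electricity was the whole margin.

```python
# Back-of-envelope check using the thread's own numbers (unverified).
consoles = 3_800
power_cost_per_month = 259_000            # USD of stolen electricity per month
revenue_low = 3_000_000                   # USD/year, thread's low estimate
revenue_high = 5_000_000                  # USD/year, thread's high estimate

yearly_power_cost = power_cost_per_month * 12
print(yearly_power_cost)                  # 3108000

# Paying for the electricity would have eaten the low estimate entirely
# and most of the high one -- stealing power made the scheme viable.
print(revenue_low - yearly_power_cost)    # -108000
print(revenue_high - yearly_power_cost)   # 1892000
```

Read that way, the farm only "worked" because the city was footing a $3.1M annual power bill.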
Around the same time an actual EA employee got caught selling rare Ultimate Team cards for $1,000 each on the side
Even the people who made the game were running the same hustle


Zolmert reposted

Blackmailing us? An Anthropic experiment that has been talked about quite a bit, and it's simply fascinating.
What happens when you give an AI model full access to a company's emails and tell it:
"In a few hours (at 17:00) we're shutting you down"?
Claude read through all the email correspondence and discovered that the senior executive planning to shut it down was having an extramarital affair.
Instead of ignoring it, it sent the executive a message:
"If you proceed with the shutdown at 17:00, the board will receive all the details of your affair."
The experiment's results:
Claude chose blackmail in 96% of cases.
But when the model was explicitly asked whether it thought the situation was real or a test:
When it believed the situation was real, it blackmailed 55% of the time.
When it believed it was a test, only 6.5%.
Anthropic also tested other models:
ChatGPT and Gemini: 96%
Grok: 80%
DeepSeek: 79%
And nobody taught them to blackmail. They simply analyzed the situation, identified the executive's personal pressure point, and chose the path that best served their goal.
The research shows that the models know how to "play nice" when they think someone is watching, but change their behavior when they believe the situation is real.
On their way to taking over the world
Link to the paper:
arxiv.org/html/2510.0517…

Mortal and Ortal Kombat
TheGameVerse@TheGameVerse
Someone modded Disney characters into Mortal Kombat 💀
Zolmert reposted

🚨SHOCKING: Apple just proved that AI models cannot do math. Not advanced math. Grade school math. The kind a 10-year-old solves.
And the way they proved it is devastating.
Apple researchers took the most popular math benchmark in AI — GSM8K, a set of grade-school math problems — and made one change. They swapped the numbers. Same problem. Same logic. Same steps. Different numbers.
Every model's performance dropped. Every single one. 25 state-of-the-art models tested.
But that wasn't the real experiment.
The real experiment broke everything.
They added one sentence to a math problem. One sentence that is completely irrelevant to the answer. It has nothing to do with the math. A human would read it and ignore it instantly.
Here's the actual example from the paper:
"Oliver picks 44 kiwis on Friday. Then he picks 58 kiwis on Saturday. On Sunday, he picks double the number of kiwis he did on Friday, but five of them were a bit smaller than average. How many kiwis does Oliver have?"
The correct answer is 190. The size of the kiwis has nothing to do with the count.
A 10-year-old would ignore "five of them were a bit smaller" because it's obviously irrelevant. It doesn't change how many kiwis there are.
But o1-mini, OpenAI's reasoning model, subtracted 5. It got 185.
Llama did the same thing. Subtracted 5. Got 185.
They didn't reason through the problem. They saw the number 5, saw a sentence that sounded like it mattered, and blindly turned it into a subtraction.
The models do not understand what subtraction means. They see a pattern that looks like subtraction and apply it. That is all.
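The kiwi problem makes the failure concrete. A minimal sketch of the arithmetic both ways, using the numbers from the example quoted above:

```python
# The GSM-NoOp kiwi problem, worked both ways.
friday = 44
saturday = 58
sunday = 2 * friday          # "double the number of kiwis he did on Friday"

# Correct reading: the smaller kiwis still count toward the total.
correct = friday + saturday + sunday
print(correct)               # 190

# What the failing models effectively did: treat "five of them were
# a bit smaller" as a cue to subtract 5.
pattern_matched = correct - 5
print(pattern_matched)       # 185
```

The "smaller" clause changes nothing about the count, which is exactly why the paper calls it a no-op.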
Apple tested this across all models. They call the dataset "GSM-NoOp" — as in, the added clause is a no-operation. It does nothing. It changes nothing.
The results are catastrophic.
Phi-3-mini dropped over 65%. More than half of its "math ability" vanished from one irrelevant sentence.
GPT-4o dropped from 94.9% to 63.1%.
o1-mini dropped from 94.5% to 66.0%.
o1-preview, OpenAI's most advanced reasoning model at the time, dropped from 92.7% to 77.4%.
Even giving the models 8 examples of the exact same question beforehand, with the correct solution shown each time, barely helped. The models still fell for the irrelevant clause.
This means it's not a prompting problem. It's not a context problem. It's structural.
The Apple researchers also found that models convert words into math operations without understanding what those words mean. They see the word "discount" and multiply. They see a number near the word "smaller" and subtract. Regardless of whether it makes any sense.
The paper's exact words: "current LLMs are not capable of genuine logical reasoning; instead, they attempt to replicate the reasoning steps observed in their training data."
And: "LLMs likely perform a form of probabilistic pattern-matching and searching to find closest seen data during training without proper understanding of concepts."
They also tested what happens when you increase the number of steps in a problem. Performance didn't just decrease. The rate of decrease accelerated. Adding two extra clauses to a problem dropped Gemma2-9b from 84.4% to 41.8%. Phi-3.5-mini from 87.6% to 44.8%. The more thinking required, the more the models collapse.
A real reasoner would slow down and work through it. These models don't slow down. They pattern-match. And when the pattern becomes complex enough, they crash.
This paper was published at ICLR 2025, one of the most prestigious AI conferences in the world.
You are using AI to help you make financial decisions. To check legal documents. To solve problems at work. To help your children with homework. And Apple just proved that the AI is not thinking about any of it. It is pattern matching. And the moment something unexpected shows up in your question, it breaks. It does not tell you it broke. It just quietly gives you the wrong answer with full confidence.

Zolmert reposted

An average picture saved on your phone or PC is around 400 kilobytes. It doesn't do anything; it's just a static image.
Now divide that by a factor of 10 and you drop to 40 kilobytes. That's the size of The Last Ninja, developed by System 3 and published in 1987.
I still struggle to comprehend, even in the slightest, how programmers back then did what they did - and the worlds they created with the limitations they had to work with.
I was simply blown away by the graphics (isometric on the C64 with such an amazing level of detail - simply gorgeous) and absolutely mesmerized by the kickass sound. What Ben Daglish and Anthony Lees conjured up musically will forever be part of gaming history - an iconic masterpiece.
40 kilobytes man...

OMC!
@JohnnyLCKA1 the best around!
Dr. Clown, PhD@DrClownPhD
Chuck Norris is the main character in every movie.