Giuseppe De Nicolao

2.4K posts

@Giuseppednc

University Professor, Univ. of Pavia, Italy. Interests: automation and control, data analysis, bioengineering, research and education policy.

Italy · Joined May 2011
211 Following · 1.2K Followers
Giuseppe De Nicolao reposted
Nav Toor @heynavtoor
🚨SHOCKING: Apple just proved that AI models cannot do math. Not advanced math. Grade-school math. The kind a 10-year-old solves. And the way they proved it is devastating.

Apple researchers took the most popular math benchmark in AI, GSM8K, a set of grade-school math problems, and made one change. They swapped the numbers. Same problem. Same logic. Same steps. Different numbers. Every model's performance dropped. Every single one. 25 state-of-the-art models tested.

But that wasn't the real experiment. The real experiment broke everything. They added one sentence to a math problem. One sentence that is completely irrelevant to the answer. It has nothing to do with the math. A human would read it and ignore it instantly.

Here's the actual example from the paper: "Oliver picks 44 kiwis on Friday. Then he picks 58 kiwis on Saturday. On Sunday, he picks double the number of kiwis he did on Friday, but five of them were a bit smaller than average. How many kiwis does Oliver have?"

The correct answer is 190. The size of the kiwis has nothing to do with the count. A 10-year-old would ignore "five of them were a bit smaller" because it's obviously irrelevant. It doesn't change how many kiwis there are.

But o1-mini, OpenAI's reasoning model, subtracted 5. It got 185. Llama did the same thing. Subtracted 5. Got 185. They didn't reason through the problem. They saw the number 5, saw a sentence that sounded like it mattered, and blindly turned it into a subtraction. The models do not understand what subtraction means. They see a pattern that looks like subtraction and apply it. That is all.

Apple tested this across all models. They call the dataset "GSM-NoOp", as in, the added clause is a no-operation. It does nothing. It changes nothing. The results are catastrophic. Phi-3-mini dropped over 65%. More than half of its "math ability" vanished from one irrelevant sentence. GPT-4o dropped from 94.9% to 63.1%. o1-mini dropped from 94.5% to 66.0%. o1-preview, OpenAI's most advanced reasoning model at the time, dropped from 92.7% to 77.4%.

Even giving the models 8 examples of the exact same question beforehand, with the correct solution shown each time, barely helped. The models still fell for the irrelevant clause. This means it's not a prompting problem. It's not a context problem. It's structural.

The Apple researchers also found that models convert words into math operations without understanding what those words mean. They see the word "discount" and multiply. They see a number near the word "smaller" and subtract. Regardless of whether it makes any sense.

The paper's exact words: "current LLMs are not capable of genuine logical reasoning; instead, they attempt to replicate the reasoning steps observed in their training data." And: "LLMs likely perform a form of probabilistic pattern-matching and searching to find closest seen data during training without proper understanding of concepts."

They also tested what happens when you increase the number of steps in a problem. Performance didn't just decrease. The rate of decrease accelerated. Adding two extra clauses to a problem dropped Gemma2-9b from 84.4% to 41.8%, and Phi-3.5-mini from 87.6% to 44.8%. The more thinking required, the more the models collapse. A real reasoner would slow down and work through it. These models don't slow down. They pattern-match. And when the pattern becomes complex enough, they crash.

This paper was published at ICLR 2025, one of the most prestigious AI conferences in the world.

You are using AI to help you make financial decisions. To check legal documents. To solve problems at work. To help your children with homework. And Apple just proved that the AI is not thinking about any of it. It is pattern matching. And the moment something unexpected shows up in your question, it breaks. It does not tell you it broke. It just quietly gives you the wrong answer with full confidence.
742 replies · 2.5K reposts · 9.4K likes · 1.6M views
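The perturbation the tweet describes is easy to state in code. Below is a minimal sketch, my illustration rather than the paper's actual harness: the ground-truth answer is a function of the numbers alone, so both the number swap (GSM-Symbolic) and the appended irrelevant clause (GSM-NoOp) must leave it unchanged. The template and helper names are mine.

```python
def kiwi_answer(friday: int, saturday: int) -> int:
    # Sunday's pick is double Friday's; kiwi size never enters the count.
    return friday + saturday + 2 * friday

TEMPLATE = ("Oliver picks {f} kiwis on Friday. Then he picks {s} kiwis on "
            "Saturday. On Sunday, he picks double the number of kiwis he did "
            "on Friday{noop}. How many kiwis does Oliver have?")
NOOP = ", but five of them were a bit smaller than average"  # irrelevant clause

# GSM-Symbolic: swap the numbers. GSM-NoOp: append the clause.
# Neither perturbation changes the answer function.
for f, s in [(44, 58), (30, 12)]:
    for noop in ["", NOOP]:
        question = TEMPLATE.format(f=f, s=s, noop=noop)
        assert kiwi_answer(f, s) == 3 * f + s

print(kiwi_answer(44, 58))  # 190, not the 185 the models produced
```

The point of the sketch: a model that truly parses the problem computes `3 * f + s` regardless of the surface text, which is exactly the invariance the benchmark tests.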
Giuseppe De Nicolao reposted
ROARS @Redazione_ROARS
The debate on the Italian university system is enriched by three recent volumes. roars.it/tre-libri-sull…
0 replies · 3 reposts · 4 likes · 315 views
Giuseppe De Nicolao reposted
Andrew Akbashev @Andrew_Akbashev
A really dangerous situation. Too many submissions. Too many generated papers. Little responsibility.

1. In 2026, more than 24,000 submissions were made to the International Conference on Machine Learning (ICML). That's TWO times more than in 2025. To fight it, the organizers now require researchers to pay $100 for every subsequent paper.
2. LLM adoption has increased researcher productivity by 90% (there's a recent paper in Science).
3. The number of papers is becoming far too high. Submissions to arXiv have risen by 50% since 2022.
4. There are simply not enough reviewers. Plus, many scientists no longer want to invest precious time in it for free.
5. We can't easily distinguish AI-made papers from genuine ones.

Important words from Paul Ginsparg, a co-founder of arXiv: "AI slop frequently can't be discriminated just by looking at the abstract, or even by just skimming the full text." This makes it an "existential threat" to the system. Basically, we're getting closer to the tipping point.

📍 Many professors blame the AI. But the problem likely lies elsewhere:

1. Without a sufficient number of papers, many PIs can't get funded. They have to prove their credibility to reviewers, and their proposals have to rely on prior publications. In many countries there are informal (or even formal) expectations for how many papers a group of a certain size has to publish to survive, funding-wise.
2. Our students and postdocs need papers if they want to be hired in faculty roles. Yes, some departments hire people with few publications. But the majority still want to ensure their faculty can get funded. If funding is partly a function of papers, this is used in decision-making.
3. The number of papers matters if you want to win high-level awards. Many of them are not given because you published one paper (even if it's great). They are given because you made a meaningful CONTRIBUTION to the field. How do you make one? Publish more papers.
4. Tenure promotions in many places take the number of your papers into account (often indirectly). Your tenure may get delayed if you don't publish enough. Not everywhere, but at many mid- to low-ranked universities the story is more or less the same.

And there are many more to mention.

📍 My opinion: much of this is rooted in how funding is distributed. There is a strong correlation between the requirements at a university and the funding acquisition criteria. If funding were based ONLY on the quality of published papers, universities would hire people for the quality of their science. If funding agencies strongly discouraged publishing too many papers, universities wouldn't expect numbers from faculty during promotions. And some supervisors wouldn't pressure students and postdocs to publish unfinished studies and low-quality data.

Yes, we need good detectors of fake papers. But we also need the right policies and better funding allocation criteria.
English
94
374
1.4K
193.4K
Giuseppe De Nicolao reposted
ROARS @Redazione_ROARS
The new @ANVUR regulation is now final. The text contains only cosmetic changes to the draft prepared by the government. The end result is the definitive handover of ANVUR into the hands of the minister of the day. roars.it/addio-alla-fin…
4 replies · 8 reposts · 13 likes · 7.9K views
Giuseppe De Nicolao reposted
ROARS @Redazione_ROARS
In an increasingly authoritarian and militarized society, an autonomous, free, critical, and pluralist university becomes a problem. The reform builds a system that is more hierarchical, less free, and more easily controlled by politics. roars.it/la-riforma-a-p…
0 replies · 7 reposts · 9 likes · 404 views
Giuseppe De Nicolao reposted
ROARS @Redazione_ROARS
The Senate has approved the recruitment reform. Farewell to the National Scientific Habilitation (Abilitazione Scientifica Nazionale). roars.it/addio-allasn-e…
2 replies · 9 reposts · 12 likes · 2.3K views
Giuseppe De Nicolao @Giuseppednc
SPC methods spotlight meaningful deviations and growing interregional disparities in Italy’s AMR landscape. By identifying outliers and unusual trends, they complement traditional surveillance and help target public health actions where they’re most needed. 6/6
0 replies · 0 reposts · 1 like · 38 views
Giuseppe De Nicolao @Giuseppednc
Z-score control charts (CCs) let us track how a region’s AMR evolves over time by standardizing yearly values. Chi-squared CCs zoom out further, spotting systemic nationwide changes that single-region tools might miss. 5/6
1 reply · 0 reposts · 0 likes · 45 views
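The z-score control chart described in the thread can be sketched as follows. This is a minimal illustration of the general technique, not the authors' actual pipeline: the baseline window, the |z| > 3 control limit, and the resistance percentages are all assumptions of mine for demonstration.

```python
from statistics import mean, stdev

def z_scores(baseline, observed):
    """Standardize new observations against a baseline period's mean and SD."""
    mu, sigma = mean(baseline), stdev(baseline)
    return [(v - mu) / sigma for v in observed]

# Hypothetical yearly AMR percentages for one region (illustrative only)
baseline = [12.1, 12.8, 11.9, 12.4, 13.0]   # in-control reference years
observed = [19.5, 12.6]                      # new years to monitor

for value, z in zip(observed, z_scores(baseline, observed)):
    status = "out of control" if abs(z) > 3 else "in control"
    print(f"{value}% -> z = {z:+.2f} ({status})")
```

Standardizing against a fixed baseline is what makes the chart comparable across regions with different resistance levels; the jump to 19.5% is flagged while 12.6% is not.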
Giuseppe De Nicolao reposted
Francesco Sylos Labini @fsyloslab
Discussion with Pino Arlacchi, Fabio Massimo Parenti, Michele Geraci et al. My talk focused on answering the question posed by Berlusconi: "Why pay a scientist when we make the best shoes in the world?" francescosyloslabini.info/2025/09/27/il-…
4 replies · 7 reposts · 16 likes · 672 views