Dr. of AI/ML | Lead AI Software Engineer | Researcher in NLP & GenAI @MasakhaneNLP | Public Speaker | PhD @UniHannover | Apostle of Jesus Christ | Views my own.
Language and Translations Symposium: "Language And Social Media: Can The Internet Save Dying Languages?" will be live on the 17th from noon to 1 p.m. East African Time. Register (all tickets are free) 👉 bitly.com/thejaladasympo…
I tell GPT 5.5, you are a manager, not a coder. Find the issues to solve and delegate to other agents. Do not write any code yourself.
It does so for a while. I think "good GPT" and log off, I let it do its long running tasks with its team of subordinates.
I log on an hour later and check in.
GPT 5.5 is coding alone, its sub-agents diligently waiting for orders.
No, STOP, I say, you are a manager. You MUST NOT code.
My bad, says GPT 5.5, got it, I must manage, not code.
One hour later, GPT 5.5 is coding.
But it's OK GPT, I get you. For I am also guilty. No matter how many times a coder is told they are a manager, in their heart of hearts, they are still a coder.
So I tell Claude Opus 4.7...
@NM_AIST Building AI/ML systems is great, but understanding the underlying math is essential. Before rushing to master advanced pipelines like MCP or RAG, take the time to master the fundamentals: Transformers, Tokenizers, and Embeddings.
#GenerativeAI #DataScience #AIinAfrica
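To make that concrete, here is a minimal sketch of the pipeline those fundamentals form. It is plain Python with NumPy, a toy vocabulary, and made-up dimensions, not any real model's code: a tokenizer maps text to integer IDs, an embedding matrix maps IDs to vectors, and self-attention (the core Transformer operation) mixes those vectors.

```python
import numpy as np

# Toy whitespace tokenizer; real models use subword tokenization (e.g. BPE).
vocab = {"<unk>": 0, "ai": 1, "in": 2, "africa": 3, "is": 4, "growing": 5}

def tokenize(text: str) -> list[int]:
    return [vocab.get(w, vocab["<unk>"]) for w in text.lower().split()]

ids = tokenize("AI in Africa is growing")   # [1, 2, 3, 4, 5]

# Embedding lookup: one learned d-dimensional vector per vocabulary ID.
rng = np.random.default_rng(0)
d = 8
embeddings = rng.normal(size=(len(vocab), d))
x = embeddings[ids]                         # shape (seq_len, d)

# Single-head self-attention: every position attends to every other,
# producing contextualized vectors. Real Transformers add learned
# Q/K/V projections, multiple heads, and many stacked layers.
scores = x @ x.T / np.sqrt(d)
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
out = weights @ x
print(out.shape)                            # (5, 8)
```

Everything else (positional encodings, training, RAG on top) builds on these same three pieces.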
In 2023, I paused my PhD to join @OpenAI to build the world’s first reasoning machine — OpenAI o1.
Earlier this year, I defended my PhD thesis “Building a Reasoning Machine” advised by @Yoshua_Bengio at @Mila_Quebec 🎓 🎉
Much has changed since Yoshua and I first discussed reasoning in 2022, but the main themes aged well:
- Adding structures to computation unlocks strong reasoning capabilities;
- Data & sample efficiency will become the bottleneck to useful intelligence;
- Retaining Bayesian uncertainty is key to reliable and safe AI systems.
You can read the introduction of my thesis here: edwardjhu.com/thesis/
My next professional chapter (TBA) will be on bridging frontier intelligence with real economic impact, a theme dear to my heart after working closely with @drwconvexity and @suna_said in the last year 🚀
At 28, one year after my PhD, I was selected as a Professor at Azusa Pacific University. I am launching the SIMS Lab, where we develop human-on-chip systems to understand how immune cells interact with organs and shape disease.
#congo
I don’t really understand the maths it takes to send humans around the Moon and bring them back safely. And the more I sit with that, the more it genuinely messes with my head, even though I love physics and know a fair amount of it.
Somebody had to work out a path where the Moon’s gravity is pulling you in, the Earth is pulling you back, and you’re moving just fast enough not to get trapped by either. They had to figure out the exact angle to come back into Earth’s atmosphere too. Too steep, you burn up. Too shallow, you bounce off and drift into space. And they had to get all of that right at the same time, for real people sitting in a small metal capsule about 400,000 kilometres from home.
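To get a feel for just the first of those balances, here is a back-of-the-envelope sketch in Python. It is not how Apollo navigated, and it ignores all the motion described below; the constants are standard textbook values. It finds the point on the Earth-Moon line where the two pulls cancel:

```python
import math

M_EARTH = 5.972e24   # kg
M_MOON = 7.342e22    # kg
D = 384_400e3        # mean Earth-Moon distance, m

# Gravity balances where M_earth / r^2 == M_moon / (D - r)^2
# (G and the spacecraft's mass cancel out), which solves to
# r = D / (1 + sqrt(M_moon / M_earth)).
r = D / (1 + math.sqrt(M_MOON / M_EARTH))
print(f"Neutral point: {r / 1e3:,.0f} km from Earth "
      f"({100 * r / D:.0f}% of the way to the Moon)")
# ~346,000 km: Earth's pull dominates for roughly 90% of the trip
# before the Moon's pull finally takes over.
```

And that static balance is the easy part.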
Nothing in that system is standing still.
The Moon is moving.
The Earth is moving.
Even the Sun is pulling on everything. And still, some people looked at all of that motion, all of that chaos, and turned it into numbers you can follow. Go here.
Adjust here.
Come back here.
And unlike NEPA light, it in fact works.
There’s also that moment in the journey where the crew passes behind the Moon. No contact with Earth. No signal. Just silence, with a massive rock blocking everything they’ve ever known. The only reason they can stay calm in that moment is because someone, somewhere, did the maths and proved they’ll come out the other side.
I don’t know what it feels like to trust something that much. To put your life in an equation when you’re that far away from everything.
But I do know this for sure: whatever that level of thinking is, whatever it takes to reach it, it might be one of the most extraordinary things human beings have ever done...
🚨SHOCKING: Apple just proved that AI models cannot do math. Not advanced math. Grade school math. The kind a 10-year-old solves.
And the way they proved it is devastating.
Apple researchers took the most popular math benchmark in AI — GSM8K, a set of grade-school math problems — and made one change. They swapped the numbers. Same problem. Same logic. Same steps. Different numbers.
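That one change is easy to picture in code. Here is a minimal sketch of the idea (my own illustration in Python, not the paper's tooling; the template and number ranges are made up): freeze the logic of a problem as a template and sample fresh numbers for each variant.

```python
import random

# One GSM8K-style problem frozen as a template: the reasoning is
# fixed, only the numbers vary (the idea behind GSM-Symbolic).
TEMPLATE = ("{name} picks {a} kiwis on Friday and {b} on Saturday. "
            "On Sunday he picks double Friday's count. "
            "How many kiwis in total?")

def make_variant(rng: random.Random) -> tuple[str, int]:
    a, b = rng.randint(10, 99), rng.randint(10, 99)
    question = TEMPLATE.format(name="Oliver", a=a, b=b)
    answer = a + b + 2 * a   # the solution steps never change
    return question, answer

rng = random.Random(0)
for _ in range(3):
    question, answer = make_variant(rng)
    print(question, "->", answer)
```

A solver that genuinely follows the steps scores the same on every variant. The models didn't.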
Every model's performance dropped. Every single one. 25 state-of-the-art models tested.
But that wasn't the real experiment.
The real experiment broke everything.
They added one sentence to a math problem. One sentence that is completely irrelevant to the answer. It has nothing to do with the math. A human would read it and ignore it instantly.
Here's the actual example from the paper:
"Oliver picks 44 kiwis on Friday. Then he picks 58 kiwis on Saturday. On Sunday, he picks double the number of kiwis he did on Friday, but five of them were a bit smaller than average. How many kiwis does Oliver have?"
The correct answer is 190. The size of the kiwis has nothing to do with the count.
A 10-year-old would ignore "five of them were a bit smaller" because it's obviously irrelevant. It doesn't change how many kiwis there are.
But o1-mini, OpenAI's reasoning model, subtracted 5. It got 185.
Llama did the same thing. Subtracted 5. Got 185.
They didn't reason through the problem. They saw the number 5, saw a sentence that sounded like it mattered, and blindly turned it into a subtraction.
The models do not understand what subtraction means. They see a pattern that looks like subtraction and apply it. That is all.
Apple tested this across all models. They call the dataset "GSM-NoOp" — as in, the added clause is a no-operation. It does nothing. It changes nothing.
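In code terms, the no-op is literally a string spliced into the question that must leave the answer alone. A sketch (again my own illustration, not the paper's tooling), using the kiwi example above:

```python
BASE = ("Oliver picks 44 kiwis on Friday. Then he picks 58 kiwis on "
        "Saturday. On Sunday, he picks double the number of kiwis he "
        "did on Friday.")
NOOP = " But five of them were a bit smaller than average."
QUESTION = " How many kiwis does Oliver have?"

answer = 44 + 58 + 2 * 44   # 190, with or without the no-op clause
for prompt in (BASE + QUESTION, BASE + NOOP + QUESTION):
    # A genuine reasoner returns 190 for both prompts; the paper
    # found models like o1-mini return 185 on the second one.
    print(prompt)
    print("expected answer:", answer)
```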
The results are catastrophic.
Phi-3-mini dropped over 65%. More than half of its "math ability" vanished from one irrelevant sentence.
GPT-4o dropped from 94.9% to 63.1%.
o1-mini dropped from 94.5% to 66.0%.
o1-preview, OpenAI's most advanced reasoning model at the time, dropped from 92.7% to 77.4%.
Even giving the models 8 examples of the exact same question beforehand, with the correct solution shown each time, barely helped. The models still fell for the irrelevant clause.
This means it's not a prompting problem. It's not a context problem. It's structural.
The Apple researchers also found that models convert words into math operations without understanding what those words mean. They see the word "discount" and multiply. They see a number near the word "smaller" and subtract. Regardless of whether it makes any sense.
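To see how shallow that heuristic is, here is a deliberately naive pattern-matcher. It is a caricature of the failure mode, not how LLMs actually work internally, and the kiwi problem is paraphrased with digits so the toy regex can find the numbers. It reproduces the 185 mistake exactly:

```python
import re

def keyword_solver(problem: str) -> int:
    """Caricature: add every number, unless a 'loss-like' word
    appears in the same clause, in which case subtract it."""
    total = 0
    for clause in re.split(r"[.,]", problem):
        for n in map(int, re.findall(r"\d+", clause)):
            if any(w in clause for w in ("smaller", "fewer", "lost")):
                total -= n   # "smaller" wrongly triggers subtraction
            else:
                total += n
    return total

problem = ("Oliver picks 44 kiwis on Friday. He picks 58 kiwis on "
           "Saturday. On Sunday he picks 88 kiwis, "
           "but 5 of them were a bit smaller than average. "
           "How many kiwis does Oliver have?")
print(keyword_solver(problem))   # 185, not 190: same error as the models
```

Obviously no LLM contains this function, but on GSM-NoOp the frontier models behaved as if they did.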
The paper's exact words: "current LLMs are not capable of genuine logical reasoning; instead, they attempt to replicate the reasoning steps observed in their training data."
And: "LLMs likely perform a form of probabilistic pattern-matching and searching to find closest seen data during training without proper understanding of concepts."
They also tested what happens when you increase the number of steps in a problem. Performance didn't just decrease. The rate of decrease accelerated. Adding two extra clauses to a problem dropped Gemma2-9b from 84.4% to 41.8%. Phi-3.5-mini from 87.6% to 44.8%. The more thinking required, the more the models collapse.
A real reasoner would slow down and work through it. These models don't slow down. They pattern-match. And when the pattern becomes complex enough, they crash.
This paper was published at ICLR 2025, one of the most prestigious AI conferences in the world.
You are using AI to help you make financial decisions. To check legal documents. To solve problems at work. To help your children with homework. And Apple just proved that the AI is not thinking about any of it. It is pattern matching. And the moment something unexpected shows up in your question, it breaks. It does not tell you it broke. It just quietly gives you the wrong answer with full confidence.
2004 was a good year, but your Gmail address doesn't need to be stuck in it.
To say goodbye to v0t3f0rp3dr02004@gmail.com or mrbrightside416@gmail.com (or whatever you were into at the time), go to your Google Account settings and choose any name available. You'll keep your old username and you can sign in with both.
My dear front-end developers (and anyone who’s interested in the future of interfaces):
I have crawled through the depths of hell to bring you what will be, for the foreseeable future, one of the more important foundational pieces of UI engineering (if not in implementation, then certainly in concept):
A fast, accurate, and comprehensive userland text-measurement algorithm in pure TypeScript, usable for laying out entire web pages without CSS, bypassing DOM measurement and reflow.
"Do not learn to code" is the worst career advice of the decade.
People are telling college students to skip Computer Science because AI will just automate it all. Andrew Ng just killed this myth at Stanford with a brilliant analogy.
When he tried to generate images with Midjourney, he typed: "make pretty pictures of robots" and got garbage.
His collaborator, however, understood Art History. He knew the exact vocabulary of lighting, genre, and palette. He spoke the "language of art," and generated masterpieces.
Andrew Ng is seeing the exact same thing happen in software engineering right now.
AI didn't replace the need to understand Computer Science. It made Computer Science the required vocabulary to control the AI.
If you don't understand how computers actually work, you are just typing "make a pretty app" into Cursor and shipping fragile, unscalable logic.
Here is Andrew Ng's exact hiring hierarchy today:
Level 1: 10 years of experience, but codes by hand (He won't hire them).
Level 2: Fresh college grad, but highly fluent in AI-assisted coding (He hires them over the 10-year veteran).
Level 3 (God Tier): Deeply understands CS fundamentals AND uses AI-assisted coding.
When humanity went from punch cards to keyboards, coding got easier, and more people coded. We are at that exact inflection point again.
AI doesn't replace fundamentals. It multiplies them.