Research from Prof Julian Togelius found that despite AI's well-documented victories in chess, Go, and Atari games, humans still learn unfamiliar video games far faster than any AI model.
#NYUTandonMade buff.ly/fbVkDrh
GPT-5.5 & Opus 4.7 on ARC-AGI-3
- GPT-5.5: 0.43%
- Opus 4.7: 0.18%
We found 3 failure modes:
- True local effect, false world model
- Wrong level of abstraction from training data
- Solved the level, didn’t reinforce the reward
See our full analysis 🧵
How do people seek guidance from Claude?
We looked at 1M conversations to understand what questions people ask, how Claude responds, and where it slips into sycophancy. We used what we found to improve how we trained Opus 4.7 and Mythos Preview.
anthropic.com/research/claud…
ARC-AGI-3 testing is done for gpt-5.5 and opus 4.7
Now we’re in analysis mode going through the logs
It’s pretty clear where the failure modes are for each model
"post-AGI, no one is going to work and the economy is going to collapse"
"i am switching to polyphasic sleep because GPT-5.5 in codex is so good that i can't afford to be sleeping for such long stretches and miss out on working"
Uncontrolled AI poses a severe danger to all of humanity.
On Wednesday, I'll be hosting a discussion with leading AI scientists from the US and China about the need for international cooperation against this existential threat. This is an enormously important issue. Join us.
1. We believe in iterative deployment; although GPT-5.5 is already a smart model, we expect rapid improvements. Iterative deployment is a big part of our safety strategy; we believe the world will be best equipped to win at the team sport of AI resilience this way.
2. We believe in democratization. We want people to be able to use lots of AI; we aim to have the most efficient models, the most efficient inference stack, and the most compute. We want our users to have access to the best technology and for everyone to have equal opportunity. We have been tracking cybersecurity as a preparedness category for a long time, and have built mitigations we believe in that enable us to make capable models broadly available.
3. We love you and we want you to win. We want to be a platform for every company, scientist, entrepreneur, and person. (My whole career has largely been about the magic of startups, and I think we are about to see that magic at hyperscale.)
@tacowasa2nd it's fine. all countries have good and bad people. It's better that we see an honest representation of the good and bad parts of foreign cultures, rather than idealized views
not a single person i have ever spoken to uses gemini for coding.
this is still very very weird.
why is gemini so bad at coding when google has scoured the web full of code for decades?
@bitcloud strawman. i think llms might be conscious and i also think the term "neural network" is a bad metaphor because they're nothing like actual neurons