Lisan al Gaib

18.9K posts

@scaling01

lead them to paradise https://t.co/IiP4VZlGU3

Joined August 2024
988 Following · 38.2K Followers
Pinned Tweet
Lisan al Gaib
Lisan al Gaib@scaling01·
My predictions for 2026:

Coding and Mathematics AGI
- METR 50% time horizons above 24 hours - my mean estimate is 30.8 hours, 2 day time horizons possible within frontier labs when accounting for 60 day lag
- if 2025 was the year of agents, then 2026 will be the year of multi-agent systems - agents delegating work to subagents -> the start of the agent economy and the great unhobbling!

Most of our current math and coding benchmarks will get saturated!
- Epoch Capabilities Index ( > 175 )
- FrontierMath Levels 1-3 ( > 95% )
- ARC-AGI 1 and 2 ( > 95% )
- SimpleQA verified ( > 95% )
- Simple-Bench ( > 90% )
- SWE-Bench-verified ( > 90% )
- Terminal-Bench 2 ( > 90% )
- WeirdML v2 ( > 85% )
- Humanity's Last Exam ( > 80% )
- FrontierMath Level 4 ( > 75% )
- Cybench ( > 70% )
- GDPval ( > 70% win rate, no ties )
- GSO ( > 65% )
- ARC-AGI-3 ( > 60% and > 80% if they go for o3-preview comparable compute budgets or continual learning breakthrough happens )
- more evals like gdpval that capture economic value of models and systems
- big focus on white collar work and large acceleration of science: specifically i see acceleration in medicine, biology, chemistry, finance, legal, administrative work
- automation of white collar work will be enabled by having reliable and fast computer use agents
- reliable computer use agents will also have implications for how you use the internet. this is OpenAI's big goal: become the hub to the internet and delegate shopping and whatever to agents!

Big model launches to get hyped for in 2026:
- Claude 5
- Claude 5.5
- Gemini 3.5
- Gemini-4
- GPT-5.3
- GPT-6 (everything in between possible, but Gemini 4 ~ 80%, Claude 5.5 ~ 70%, GPT-6 ~ 60% likely before 2027)
- DeepSeek-V4
- Grok-5
- Qwen-4
- Kimi-K3, GLM-5, MiniMax M3
- more korean models and a bunch of american open-source models :)

The gap between closed and open labs will narrow in H1 2026 due to DeepSeek-V4, then widen in the later half of the year, especially on economically valuable tasks. Closed models will be much more reliable. But we will still have Opus 4.5+ level open models by the end of 2026. Most frontier models will be around 5-10T params. If we see GPT-6 and Gemini-4 at the end of 2026 10T+ param models are possible. These models + harnesses will be the first not research agents. We should also see much better live models with voice and video mode.

Model architecture:
- we will see both, more efficient architectures and more expressive architectures!
- hybrid architectures for even longer context windows, diffusion models for speed on edge devices, but also models that double down on full attention or even more expressive attention mechanisms
- looped language models, other recurrent architectures and continual learning will enable much smaller reasoning models! (TRM on ARC-AGI has paved the way for the reasoning core)
- big improvements in reasoning efficiency

in my 2025 prediction I included a prediction for 2026 that I stand by:
- "someone (Anthropic) figures out efficient test-time-training [...], this will be the next paradigm for 2026 and lead to superintelligence"

General outlook and some random thoughts:
- it will be clear to everybody that Anthropic has the mandate and is ahead of everyone else
- OpenAI, Anthropic and Google will remain frontier labs
- decent chance that Anthropic overtakes OpenAI's valuation and both are valued > 1T
- DeepSeek will join them with V4 as THE chinese frontier lab
- xAI will likely repeat Grok-4, Grok-5 will be great on benchmarks but Elon persists on slop-maxxing the model
- AI generated video content will take off with Veo-4 and Sora-3, consistent minute long videos will be possible
- embodied intelligence will start to take off by RL through world models
- full self-driving solved, waymo and tesla everywhere
- the stock market will have a 20%+ drawdown
- 15% chance of OpenAI going bankrupt and getting acquired by Microsoft due to collapse of oracle or a market crash, caused by rapidly deteriorating economic situation (unemployment, inflation)
- push against AI will become a common theme in most advanced western economies as unemployment rises
- populist right wing parties continue to gain traction in europe
- trump/republicans will lose midterm elections
English
58
82
723
248K
Lisan al Gaib
Lisan al Gaib@scaling01·
that looks pretty fucking good
Lisan al Gaib tweet media
English
16
7
310
43.9K
Lisan al Gaib
Lisan al Gaib@scaling01·
@cormac_mars for me it's like:
avatar 1: 4/5
avatar 2: 3/5
avatar 3: 3/5
dune 1: 4/5
dune 2: 5/5
dune 3: 10/5
Hindi
1
0
7
1.7K
Cormac
Cormac@cormac_mars·
@scaling01 avatar 3 was 5 stars and 1 was 4.5 for me dune 2 was obv a 5 and dune 1 was 4
English
1
0
2
1.9K
Lisan al Gaib
Lisan al Gaib@scaling01·
it took villeneuve 5 years to create the greatest (sci-fi) trilogy of all time
meanwhile, james cameron got a billion dollars and 16 fricking years to create 3 mid movies
slop is not only AI related. it exists everywhere in the real world
English
90
80
1.5K
122K
Lisan al Gaib
Lisan al Gaib@scaling01·
@Presidentlin I have become enlightened and humble. I see you like or comment every day. You deserve a follow my friend.
English
1
0
3
152
Lincoln 🇿🇦
Lincoln 🇿🇦@Presidentlin·
I collected a new tazos.
Lincoln 🇿🇦 tweet media
English
1
0
3
1.1K
OpenRouter
OpenRouter@OpenRouter·
Stealth Model Reveal: Hunter and Healer Alpha are @XiaomiMiMo MiMo-V2-Pro and MiMo-V2-Omni. Both models are live now on OpenRouter, and free to use in @OpenClaw via the OpenRouter provider for the next week!
OpenRouter tweet media
English
67
126
1.4K
99.1K
David Sepulvado
David Sepulvado@david_sepulvado·
@scaling01 2 and 3 were reductive yes, but I doubt you really think that for the first 😁
English
1
0
12
2.9K
Lisan al Gaib
Lisan al Gaib@scaling01·
@xpasky gonna run lisanbench for GPT-5.4, Mistral Small 4 and M2.7 this weekend
English
1
0
12
670
Lisan al Gaib retweeted
Cem Karsan 🥐
Cem Karsan 🥐@jam_croissant·
The width 📏 b/w PINK👛 & YELLOW🟡= LEVERAGE This is the true barometer 🌡️ of success for this administration…
Cem Karsan 🥐 tweet media
Cem Karsan 🥐@jam_croissant

1) Who do you think The US’s 🇺🇸 war in Iran 🇮🇷 is actually against??? 2) Given that… What do you see as the US’s 🇺🇸 #1 & #2 greatest sources of leverage in this Grand War ⚔️??? 3) Are those 2 sources of leverage somehow connected 🪢 to 1 another??? 4) Given that, why might the Strait of Hormuz 🚢 be the most critical front of this Grand War⚔️??? 5) Finally… Why might that mean that this conflict likely won’t (actually) be "very complete, pretty much," for many years??? 🤷‍♂️ #🥐RUMBS. . . . . . . . . . .

English
24
24
243
81.6K
Lisan al Gaib
Lisan al Gaib@scaling01·
lisan never forgets comments
English
0
0
18
1.9K
Zixuan Li
Zixuan Li@ZixuanLi_·
Me introducing M2.7💯
Zixuan Li tweet media
English
34
18
743
31.2K
aislop
aislop@lazysloth·
@scaling01 oof ain't gonna happen, all the other pictures had luscious hair, one of them is diff
English
2
0
0
118
Lisan al Gaib
Lisan al Gaib@scaling01·
OpenAI just released "Parameter Golf", a new challenge to train the best language model that fits in a 16MB artifact and trains in under 10 minutes on 8xH100s. There's also a leaderboard. If you perform well they might hire you. The challenge is open from March 18th to April 30th.
Lisan al Gaib tweet media
English
9
13
262
20.4K