Ismail Elsherbini

223 posts

@elsherbin_

Co-Founder @ VERSO

New York, USA · Joined March 2024
213 Following · 46 Followers
Ismail Elsherbini @elsherbin_
@teortaxesTex Why would Elon lie about Opus and Sonnet sizes? Also, research looking at Google Vertex estimated Opus at around 2T to 3T parameters and Sonnet at 1T. Anyway, GLM is definitely closer to Sonnet's parameter count than to Opus's, so it's not a fair comparison. I agree China is behind, but not by 7 months.
1
0
1
387
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
I think GLM 5.2 points to a 7-month gap currently. It's around Opus 4.7-4.8 level, all told (modulo vision, which in Opus's case is garbage anyway). Mythos reached Preview status (≥ Opus 4.8, functionally) by early Feb 2026. This means full PRC Mythos ("Fable") by Nov-Dec'26.
Lunexa @Lunexalith

@teortaxesTex What's your current timeline for China to reach Fable class? GLM-5.2 certainly shortens the gap.
57
79
1.2K
442.5K
Ismail Elsherbini @elsherbin_
@teortaxesTex Sonnet is 1 trillion parameters. You think Opus is 500 billion parameters larger than Sonnet? Elon tweeted that Opus-class models are around 5 trillion parameters, and I doubt he is far off from the real parameter count.
1
0
1
502
Ismail Elsherbini @elsherbin_
@teortaxesTex GLM 5.2 is a massive jump in open source compared to a similarly sized model like Sonnet 4.6. It's an insane release regardless.
0
0
0
94
Ismail Elsherbini @elsherbin_
@teortaxesTex "7 months behind": GLM 5.2 achieves Opus-level intelligence with a model 3 to 5 times smaller. Mythos models are rumored to be massive, even 10T parameters. If China achieved Mythos-level intelligence with a model 10 times smaller, Anthropic and OpenAI would be out of business today.
2
0
11
2.7K
LM Studio @lmstudio
For WWDC, we worked with Apple to run Kimi K2.6, a 1T-parameter model, across a cluster of four Mac Studios using a preview version of LM Studio. We showcased secure remote access from a MacBook Neo and iPhone using LM Link. A glimpse of your own private, frontier-scale AI.
120
301
4.3K
350.6K
Juraj Bednar @jurbed
@bindureddy The problem is it often answers in Chinese. And I can't read Chinese
2
0
3
1.2K
Bindu Reddy @bindureddy
GLM 5.2 Is Mind Blowingly Good On Benchmarks
Yes, it even beats Opus 4.8 and GPT 5.5 on some of them
However it is also bench-maxxed! Internal evals have it behind them 😼
STILL - A HUGE WIN FOR OPEN SOURCE AI
48
17
370
20.9K
Ismail Elsherbini @elsherbin_
@oleksoleksoleks Did you try using it inside their coding agent app? They give 1.5x quota. I tried it and it's quite good.
1
0
3
238
Olek @oleksoleksoleks
Z.ai GLM-5.2 via Lite sub
45min long-horizon task @ 40 tok/s before hitting 5h limit
16mil cached, 200k input, 100k output
Off peak hours (02:00 - 06:00 EST)
5
0
46
5.2K
Ismail Elsherbini reposted
Vercel @vercel
Introducing eve, an agent framework. 𝚊𝚐𝚎𝚗𝚝/ 𝚊𝚐𝚎𝚗𝚝.𝚝𝚜 𝚒𝚗𝚜𝚝𝚛𝚞𝚌𝚝𝚒𝚘𝚗𝚜.𝚖𝚍 𝚝𝚘𝚘𝚕𝚜/ 𝚜𝚔𝚒𝚕𝚕𝚜/ 𝚜𝚊𝚗𝚍𝚋𝚘𝚡/ 𝚜𝚌𝚑𝚎𝚍𝚞𝚕𝚎𝚜/ Like Next.js, for agents. vercel.com/blog/introduci…
318
711
7.1K
2.1M
Ismail Elsherbini @elsherbin_
@theo What about GPT 5.6? If they took down Fable, will they take down OpenAI models too?
1
0
1
439
Theo - t3.gg @theo
It's kind of wild that Fable still isn't back. Honestly thought this would be resolved quicker 🙃
225
40
3.6K
173K
Wes Bos @wesbos
xiaomi - the Chinese company that makes phones, rice cookers and electric vehicles - has forked OpenCode
128
98
3.3K
224.6K
Ismail Elsherbini @elsherbin_
@ggg78g89 @crystalsssup In my experience it's better than Opus 4.6 and near Opus 4.7 in reliability. It's expensive and not token efficient, though. Opus 4.8 medium is better and cheaper.
0
0
0
28
Ismail Elsherbini @elsherbin_
@bridgemindai I reached my limit twice in half a prompt on the 20 USD plan. It's not token-usage based, it's request based: when these AI agents loop, even a small request that consumes 2 tokens counts as 1 request. It's awful.
0
0
1
632
BridgeMind @bridgemindai
The usage limits on the Kimi K2.7 Code plan are TERRIBLE. I got rate limited after only 30 minutes of testing. Isn't this model supposed to be cheap?
69
7
431
32.6K
Ismail Elsherbini @elsherbin_
@tyleryust Hey man, I hope to see Kimi K2.7 Code and GLM 5.2. I am curious to see the progress of these models from Chinese labs.
0
0
0
103
Tyler Yust @tyleryust
DeepSWE is now the main SWE benchmark on Artificial Analysis, replacing SWE-Bench Pro. Really proud of the team.
Artificial Analysis @ArtificialAnlys

We've updated the Artificial Analysis Coding Agent Index, replacing SWE-Bench Pro with Datacurve's DeepSWE benchmark - the swap lifts Codex with GPT-5.5 (xhigh) above Claude Code with Opus 4.8 (max), while the newly released Claude Fable 5 (max) in Claude Code debuts at the top.

DeepSWE, built by @datacurve, writes its tasks from scratch rather than adapting them from public GitHub issues or pull requests, so no model has seen the solutions during training. That matters because SWE-Bench Pro, the benchmark it replaces in our Coding Agent Index, had grown gameable, with some models recovering the fix from the repository's commit history instead of solving the task.

The swap reorders the index: Codex with GPT-5.5 (xhigh) rises from 65 to 76, overtaking Claude Code with Opus 4.8 (max) at 73. Claude Code with Fable 5 (max), which enters directly on the refreshed index, leads at 77. SWE-Bench Pro had been flattering some combinations and penalizing others. More below.
2
0
21
1.1K
Hamza @thegenioo
@jumperz what’s that projection
1
0
2
232
JUMPERZ @jumperz
kimi k2.7-code just dropped, on DeepSWE k2.6 sits at 24% the top open-weight model on the board minimax , Qwen and GLM when we have deepseek V4-Pro at 8% and collapses on real long-horizon work and that's Kimi's actual bet not price tho if K2.7's +21.8% coding claim holds, that's ~29% on DeepSWE.. enough to flip gemini-3.5-flash (28%).. lets see
Kimi.ai @Kimi_Moonshot

🌘 Kimi-K2.7-Code, our latest coding model, is now released and open-sourced!
🔷 Improved coding & agent performance over K2.6: +21.8% on Kimi Code Bench v2, +11.0% on Program Bench, and +31.5% on MLS Bench Lite.
🔷 Reasoning efficiency: Less overthinking, with 30% lower reasoning-token usage compared to K2.6.
🔷 Long-horizon coding: Improved instruction following, higher end-to-end coding task success rates.
⚡️ 6x High-Speed Mode coming soon!
🔌 Available today via Kimi API and Kimi Code.
🔗 Kimi Code: kimi.com/code
🔗 API: platform.moonshot.ai
28
12
330
38.3K
Ismail Elsherbini @elsherbin_
@teortaxesTex Composer on DeepSWE is worse than Kimi K2.6, but it's more token efficient, so I will wait for Kimi K2.7's DeepSWE results. I like Moonshot's work, and I hope it's a good model release.
0
0
15
1.5K
Ismail Elsherbini reposted
Claude @claudeai
Introducing Claude Fable 5: a Mythos-class model that we’ve made safe for general use. Its capabilities exceed those of any model we’ve ever made generally available.
5K
14.5K
105K
56.3M