DJ Naydee

2.6K posts

@BeatsPlanet

Multi-Gold Music Producer & Beats Planet LLC CEO

New York City · Joined October 2009
30 Following · 3.6K Followers
DJ Naydee @BeatsPlanet
@ornith_ The only bench that matters, DeepSWE, was left out. I wonder why.
Replies 1 · Reposts 0 · Likes 1 · Views 76
Ornith @ornith_
Aloha! 🌺 Meet Ornith-1.0, a family of open-source LLMs specialized for agentic coding. Ornith-1.0 spans a full range of parameter sizes: 9B Dense, 31B Dense, 35B MoE, and 397B MoE. It achieves state-of-the-art performance among open-source models of comparable size on coding benchmarks including:
✅ Terminal-Bench 2.1 (77.5)
✅ SWE-Bench (82.4 on Verified, 62.2 on Pro, 78.9 on Multilingual)
✅ NL2Repo (48.2)
✅ SWE Atlas (41.2 on QnA, 42.6 RF, 39.1 TW)
✅ ClawEval (77.1)
Post-trained on top of gemma4 and qwen3.5, Ornith-1.0 employs a novel self-improving training strategy in which reinforcement learning is used to generate not only solution rollouts, but also the task-specific scaffolds that drive those rollouts. By jointly optimizing the scaffold and the resulting solution, the model generates higher-quality solutions in agentic coding. 😎
All models are released under the MIT license, enabling full commercial and research use.
📖 Tech Blog: deep-reinforce.com/ornith_1_0.html
🤗 Hugging Face: huggingface.co/collections/de…
[image]
Replies 478 · Reposts 993 · Likes 6.6K · Views 5.2M
DJ Naydee @BeatsPlanet
@awakecoding @paradite_ OpenRouter actually reports the cache hit rate for each provider and model. You just have to hunt around for it and scroll down. That's how I really started paying attention to it.
Replies 0 · Reposts 0 · Likes 2 · Views 26
Marc-André Moreau @awakecoding
@BeatsPlanet @paradite_ Which one would you recommend for better caching? I tried GLM 5.2 today on OpenRouter and got a poor cache hit rate, below 50%. I like the model, but next I want to find the provider that does caching better.
Replies 1 · Reposts 0 · Likes 0 · Views 41
Zhu Liang @paradite_
the number of providers for glm 5.2 is insane. i count 20 of them.
[image]
Replies 50 · Reposts 34 · Likes 537 · Views 111.4K
DJ Naydee @BeatsPlanet
@argofowl I just checked and I now have 2 resets banked, and I haven't referred anyone. That's on top of the free reset we all got a day or so ago. Let the good times roll!
Replies 0 · Reposts 0 · Likes 1 · Views 15
🥔🥔🥔 @argofowl
❗❗❗ guys remember this post about codex rate limit resets "on your own time"?
well apparently this is some bullshit that is only bankable when you refer people and they sign up for codex
tibo's last reset auto-applied
i didn't need a reset right now, i had 50% usage in reserve and my reset was tomorrow
i could have /fast on xhigh all day and still had a full reset tomorrow
but now they forced a reset i didn't need as if it's some reward
some anthropic level marketing ngl
i was so happy because i thought every reset would be bankable so we could use it when we wanted, on our own time
i hate this so much
Quoting OpenAI @OpenAI:
We heard you wanted to use Codex rate limit resets on your own time. Starting today, we’re rolling out the ability to save rate limit resets to use later. We’re starting Go, Plus, Pro, and Business users with one free reset:
Replies 157 · Reposts 29 · Likes 672 · Views 526.8K
Anastasios Nikolas Angelopoulos @ml_angelopoulos
Just to be clear, if you remove Fable, which is unavailable, GLM-5.2 (Max) is the #1 model in the world for frontend coding. This is a huge moment. OSS has caught up with proprietary, and China has caught up with the US, in this very important domain.
Quoting Arena.ai @arena:
Exciting news: GLM-5.2 (Max) ranks #2 in Code Arena: Frontend, with +29pt over Claude Opus 4.7 (Thinking) and only behind Fable 5! GLM-5.2 is the best open model vs Kimi-K2.6 and Minimax-M3 by a large margin.
- #2 React and #4 HTML sub-leaderboards
- Ranks as the top model in nearly all sub categories: Brand & Marketing, Reference-Based Design, Data & Analytics, Consumer Product, Gaming, and Simulations.
Congrats @Zai_org for the incredible milestone!
Replies 146 · Reposts 358 · Likes 4.3K · Views 594.6K
DJ Naydee @BeatsPlanet
@PlarkoD @aisearchio That's because, up to this point, all of the open source models were reporting scores only on the usual benchmarks that get cheated on. With DeepSWE, they can't cheat.
Replies 0 · Reposts 0 · Likes 1 · Views 12
Youcef @PlarkoD
@aisearchio Didn't know that Qwen 3.7 Max is this low. They really joined the hype train saying it's a cutting-edge one.
Replies 1 · Reposts 0 · Likes 1 · Views 588
DJ Naydee @BeatsPlanet
@haider1 DeepSWE is the only bench that matters, and a 46 is very impressive for open source, kudos. If those numbers are real.
Replies 1 · Reposts 1 · Likes 5 · Views 480
Haider. @haider1
one of the best days yet for open source
as i've said repeatedly, it's only a matter of time before a Chinese model reaches fable-level capabilities
the US has shot itself in the foot with the Mythos ban, because it will hurt Western AI innovation while doing very little to slow anyone else down
[image]
Replies 17 · Reposts 12 · Likes 216 · Views 10.9K
DJ Naydee @BeatsPlanet
@bygregorr @aronprins I was thinking the same thing. They didn't exactly say they weren't going to bring it back, but they did provide a hint. It sounds like they're going to alter usage in the plans at some point in order to continue supporting programmatic usage.
Replies 1 · Reposts 0 · Likes 1 · Views 64
Gregor @bygregorr
@aronprins might be wrong but 'pausing' usually just means 'we'll do this more quietly in a few weeks.' did they give a reason or any timeline, or is it actually indefinite?
Replies 3 · Reposts 0 · Likes 27 · Views 5.4K
Aron Prins @aronprins
Breaking News: Claude is pausing the Agent SDK credit change!
[image]
Replies 69 · Reposts 71 · Likes 938 · Views 415.3K
DJ Naydee @BeatsPlanet
@hqmank No option to do that yet. I just checked, and my -p is still working fine.
Replies 0 · Reposts 0 · Likes 2 · Views 162
DJ Naydee @BeatsPlanet
@elvissun So far mine is still working with -p.
Replies 0 · Reposts 0 · Likes 0 · Views 19
Elvis @elvissun
quick reminder
not only did we lose Fable 5, we are also losing claude -p on June 15th
dark times
Replies 12 · Reposts 1 · Likes 117 · Views 41.8K
DJ Naydee @BeatsPlanet
@jun_song Hahaha, thanks for the laugh, buddy.
Replies 0 · Reposts 0 · Likes 1 · Views 13
Jun Song @jun_song
After my agent testing, it seems like Kimi-K2.7 is better than Opus-4.8. It is closer to Fable level.
My recent impression: Fable > Kimi-K2.7 > Opus-4.8 = GLM-5.2 > GPT5.5 > Minimax-M3
Replies 324 · Reposts 234 · Likes 5K · Views 477.7K
Kimi.ai @Kimi_Moonshot
🌘 Kimi-K2.7-Code, our latest coding model, is now released and open-sourced!
🔷 Improved coding & agent performance over K2.6: +21.8% on Kimi Code Bench v2, +11.0% on Program Bench, and +31.5% on MLS Bench Lite.
🔷 Reasoning efficiency: Less overthinking, with 30% lower reasoning-token usage compared to K2.6.
🔷 Long-horizon coding: Improved instruction following, higher end-to-end coding task success rates.
⚡️ 6x High-Speed Mode coming soon!
🔌 Available today via Kimi API and Kimi Code.
🔗 Kimi Code: kimi.com/code
🔗 API: platform.moonshot.ai
[images]
Replies 643 · Reposts 1.8K · Likes 14K · Views 2.5M
DJ Naydee @BeatsPlanet
@1slimewell Looks like you didn't get the memo 😂: after two weeks, Fable is not going to be included in a subscription plan.
Replies 0 · Reposts 0 · Likes 1 · Views 149
DJ Naydee @BeatsPlanet
@pupposandro Yup, I agree. "It's insane" how easy it is to cheat on SWE-Bench. Let's see the real numbers on DeepSWE. Qwen won't score better than 15%.
Replies 1 · Reposts 0 · Likes 0 · Views 1.1K
Sandro @pupposandro
This is still insane...
[image]
Replies 212 · Reposts 325 · Likes 10.1K · Views 1.3M
DJ Naydee @BeatsPlanet
@kilocode Lol yeah, but try asking MiniMax to follow a step-by-step protocol or procedure other than coding and it all goes into the toilet. It has the attention span of Gen Zers.
Replies 0 · Reposts 0 · Likes 1 · Views 437
Kilo @kilocode
We gave the same code audit to Claude Opus 4.8 and MiniMax M3. Same codebase. Same prompt. 17 known bugs planted in advance.
MiniMax M3 caught 13 of them for $0.07. The cheapest Claude run caught the same 13 for $1.30.
Here's the breakdown. 🧵
[image]
Replies 77 · Reposts 115 · Likes 1.4K · Views 260K
DJ Naydee @BeatsPlanet
@ghostloadgg @theo @MiniMax_AI @datacurve It's a new LLM benchmark that can't easily be gamed the way all of the other ones are nowadays, which is why the open source models keep doing so well on those.
Replies 0 · Reposts 0 · Likes 0 · Views 17
MiniMax (official) @MiniMax_AI
Introducing MiniMax M3: The First Open-Weights Model to Combine Three Frontier Capabilities
- Coding & Agentic Frontier: 59.0% SWE-Bench Pro, 66.0% Terminal Bench 2.1, 34.8% SWE-fficiency, 28.8% KernelBench Hard, 74.2% MCP Atlas
- MiniMax Sparse Attention scales context to 1M
- Natively Multimodal from Step Zero
API: platform.minimax.io
Token Plan: platform.minimax.io/subscribe/toke…
🚀 New! MiniMax Code: code.minimax.io
Weights & Tech Report in ~10 Days
[image]
Replies 565 · Reposts 1.2K · Likes 11.8K · Views 5.2M
DJ Naydee @BeatsPlanet
@theo @MiniMax_AI @datacurve DeepSWE is now the only benchmark I care about. It was absolutely spot on: it reflected my exact experience with each of the models, especially some of the open source models that claim to be SOTA but completely fall apart as agents when you actually use them.
Replies 1 · Reposts 0 · Likes 5 · Views 721
DJ Naydee @BeatsPlanet
@theo This is fake, right? How on earth is the #1 coding model, GPT 5.5, way down in the 11th spot?
Replies 0 · Reposts 0 · Likes 0 · Views 123
Theo - t3.gg @theo
Since we're talking about good code benches today, here's a shitty one for reference
Quoting Arena.ai @arena:
Qwen3.7 Max (20250517) debuts at #4 in Code Arena: Frontend - the top-ranked Chinese lab on the board, surpassing GLM-5.1 and is now on par with Claude Opus 4.6 on agentic web development tasks. Huge congrats to @Alibaba_Qwen on this achievement!
Replies 66 · Reposts 12 · Likes 889 · Views 113.5K
Tibo @thsottiaux
Our master plan is to release better and more efficient models. And also to release better products, week after week. Oh and get more compute too. Together with spending too much time on X. How good is this plan?
Replies 527 · Reposts 83 · Likes 4.3K · Views 224.4K
DJ Naydee @BeatsPlanet
@thsottiaux Anthropic has been "borrowing" from everyone this year. Five days after banning OpenClaw, they dropped their own locked-up version, "Managed Agents".
Replies 0 · Reposts 0 · Likes 0 · Views 347