DJ Naydee

2.6K posts

@BeatsPlanet

Multi-Gold Music Producer & Beats Planet LLC CEO

New York City · Joined October 2009

30 Following · 3.6K Followers

DJ Naydee@BeatsPlanet·2d

@ornith_ The only bench that matters, DeepSWE, was left out. I wonder why.

Ornith@ornith_·3d

Aloha! 🌺 Meet Ornith-1.0, a family of open-source LLMs specialized for agentic coding. Ornith-1.0 spans the full parameter sizes including 9B Dense, 31B Dense, 35B MoE, and 397B MoE. It achieves state-of-the-art performance among open-source models of comparable size on coding benchmarks including: ✅Terminal-Bench 2.1(77.5) ✅SWE-Bench(82.4 on verified, 62.2 on pro, 78.9 on Multilingual) ✅NL2Repo(48.2) ✅SWE Atlas(41.2 on QnA, 42.6 RF, 39.1 TW) ✅ClawEval(77.1) Post-trained on top of gemma4 and qwen3.5, Ornith-1.0 employs a novel self-improving training strategy in which reinforcement learning is used to generate not only solution rollouts, but also the task-specific scaffolds that drive those rollouts. By jointly optimizing the scaffold and the resulting solution, the model generates higher-quality solutions in agentic coding. 😎 All models are released under the MIT license, enabling full commercial and research use. 📖Tech Blog: deep-reinforce.com/ornith_1_0.html 🤗Huggingface: huggingface.co/collections/de…

478

993

6.6K

5.2M

DJ Naydee@BeatsPlanet·4d

@awakecoding @paradite_ OpenRouter actually reports the cache hit rate for each provider and model. You just have to hunt around for it and scroll down. That's how I really started paying attention to it.

Marc-André Moreau@awakecoding·4d

@BeatsPlanet @paradite_ Which one would you recommend for better caching? I tried GLM 5.2 today on OpenRouter and got a poor cache hit rate below 50%. I like the model but next I want to find the right provider that does caching better

Zhu Liang@paradite_·5d

the number of providers for glm 5.2 is insane. i count 20 of them.

537

111.4K

DJ Naydee@BeatsPlanet·18 Jun

@argofowl I just checked and I now have 2 resets banked, and I haven't referred anyone. That's on top of the free reset we all got a day or so ago. Let the good times roll!

🥔🥔🥔@argofowl·18 Jun

❗❗❗ guys remember this post about codex rate limit resets "on your own time"? well apparently this is some bullshit that is only bankable when you refer people and they sign up for codex. tibo's last reset auto-applied. i didn't need a reset right now, i had 50% usage in reserve and my reset was tomorrow. i could have /fast on xhigh all day and still had a full reset tomorrow, but now they forced a reset i didn't need as if it's some reward. some anthropic level marketing ngl. i was so happy because i thought every reset would be bankable so we could use it when we wanted, on our own time. i hate this so much

OpenAI@OpenAI

We heard you wanted to use Codex rate limit resets on your own time. Starting today, we’re rolling out the ability to save rate limit resets to use later. We’re starting Go, Plus, Pro, and Business users with one free reset:

157

672

526.8K

DJ Naydee@BeatsPlanet·17 Jun

@ml_angelopoulos

GIF

QME

Anastasios Nikolas Angelopoulos@ml_angelopoulos·16 Jun

Just to be clear, if you remove Fable, which is unavailable, GLM-5.2 (Max) is the #1 model in the world for frontend coding. This is a huge moment. OSS has caught up with proprietary, and China has caught up with the US, in this very important domain.

Arena.ai@arena

Exciting news: GLM-5.2 (Max) ranks #2 in Code Arena: Frontend, with +29pt over Claude Opus 4.7 (Thinking) and only behind Fable 5! GLM-5.2 is the best open model vs Kimi-K2.6 and Minimax-M3 by a large margin. - #2 React and #4 HTML sub-leaderboards - Ranks as the top model in nearly all sub categories: Brand & Marketing, Reference-Based Design, Data & Analytics, Consumer Product, Gaming, and Simulations. Congrats @Zai_org for the incredible milestone!

146

358

4.3K

594.6K

DJ Naydee@BeatsPlanet·17 Jun

@PlarkoD @aisearchio That's because, until now, the open source models only showed their scores on the usual benchmarks that are easy to cheat on. With DeepSWE, they can't cheat.

Youcef@PlarkoD·17 Jun

@aisearchio Didn't know that Qwen 3.7 Max is this low. They rlly joined the hype train saying it's a cutting-edge one

588

⚡AI Search⚡@aisearchio·16 Jun

Check out its DeepSWE score. Insane lead by GLM 5.2

⚡AI Search⚡@aisearchio

Model weights for GLM 5.2 are out! By far the best open model. Full review coming tonight huggingface.co/zai-org/GLM-5.2

288

54.1K

DJ Naydee@BeatsPlanet·16 Jun

@haider1 DeepSWE is the only bench that matters, and a 46 is very impressive for open source, kudos. If those numbers are real.

480

Haider.@haider1·16 Jun

one of the best days yet for open source. as i've said repeatedly, it's only a matter of time before a Chinese model reaches fable-level capabilities. the US has shot itself in the foot with the Mythos ban, because it will hurt Western AI innovation while doing very little to slow anyone else down

216

10.9K

DJ Naydee@BeatsPlanet·15 Jun

@bygregorr @aronprins I was thinking the same thing. They didn't exactly say they weren't going to bring it back, but they did provide a hint. It sounds like they're going to alter usage in the plans at some point in order to continue supporting programmatic usage.

Gregor@bygregorr·15 Jun

@aronprins might be wrong but 'pausing' usually just means 'we'll do this more quietly in a few weeks.' did they give a reason or any timeline, or is it actually indefinite?

5.4K

Aron Prins@aronprins·15 Jun

Breaking News: Claude is pausing the Agent SDK credit change!

938

415.3K

DJ Naydee@BeatsPlanet·15 Jun

@hqmank No option to do that yet. I just checked, and my -p is still working fine.

162

Kai@hqmank·15 Jun

PSA: If you use Hermes, OpenClaw, claude -p, or anything built on the Claude Agent SDK, check your Claude account and claim the dedicated credit. Claude paid users can claim it starting June 15.

ClaudeDevs@ClaudeDevs

Starting June 15, paid Claude plans can claim a dedicated monthly credit for programmatic usage. The credit covers usage of: - Claude Agent SDK - claude -p - Claude Code GitHub Actions - Third-party apps built on the Agent SDK

8.7K

DJ Naydee@BeatsPlanet·15 Jun

@elvissun So far mine is still working with -p

Elvis@elvissun·13 Jun

quick reminder: not only did we lose Fable 5, we are also losing claude -p on June 15th. dark times

117

41.8K

DJ Naydee@BeatsPlanet·15 Jun

@jun_song hahahah Thanks for the laugh, buddy.

Jun Song@jun_song·14 Jun

After my agent testing, it seems like Kimi-K2.7 is better than Opus-4.8. It is closer to Fable level. My recent impression: Fable > Kimi-2.7 > Opus-4.8 = GLM-5.2 > GPT5.5 > Minimax-M3

324

234

477.7K

DJ Naydee@BeatsPlanet·12 Jun

@Kimi_Moonshot Let's see the DeepSWE bench, then we can talk.

782

Kimi.ai@Kimi_Moonshot·12 Jun

🌘 Kimi-K2.7-Code, our latest coding model, is now released and open-sourced! 🔷 Improved coding & agent performance over K2.6: +21.8% on Kimi Code Bench v2, +11.0% on Program Bench, and +31.5% on MLS Bench Lite. 🔷 Reasoning efficiency: Less overthinking, with 30% lower reasoning-token usage compared to K2.6. 🔷 Long-horizon coding: Improved instruction following, higher end-to-end coding task success rates. ⚡️ 6x High-Speed Mode coming soon! 🔌 Available today via Kimi API and Kimi Code. 🔗 Kimi Code: kimi.com/code 🔗 API: platform.moonshot.ai

643

1.8K

14K

2.5M

DJ Naydee@BeatsPlanet·11 Jun

@1slimewell Looks like you didn't get the memo 😂 that Fable is not going to be included in a subscription plan after two weeks.

149

maxwell@1slimewell·10 Jun

Dude switches every model release

BridgeMind@bridgemindai

I just cancelled my $200 ChatGPT Pro 20x plan. GPT 5.5 is garbage compared to Claude Fable 5. OpenAI shouldn't even release GPT 5.6. They are going to need to release GPT 6 or a new class of model to come anywhere close to Claude Fable 5. I am buying a 4th $200 Claude Max subscription now.

2.7K

99.3K

DJ Naydee@BeatsPlanet·11 Jun

@pupposandro Yup, I agree. It's "insane" how easy it is to cheat on SWE-Bench. Let's see the real numbers, DeepSWE. Qwen won't score better than 15%.

1.1K

Sandro@pupposandro·10 Jun

This is still insane...

212

325

10.1K

1.3M

DJ Naydee@BeatsPlanet·7 Jun

@kilocode Lol yeah, but try asking MiniMax to follow a step-by-step protocol or procedure other than coding and it all goes into the toilet. It has the attention span of Gen Zers.

437

Kilo@kilocode·6 Jun

We gave the same code audit to Claude Opus 4.8 and MiniMax M3. Same codebase. Same prompt. 17 known bugs planted in advance. MiniMax M3 caught 13 of them for $0.07. The cheapest Claude run caught the same 13 for $1.30. Here's the breakdown. 🧵

115

1.4K

260K

DJ Naydee@BeatsPlanet·1 Jun

@ghostloadgg @theo @MiniMax_AI @datacurve It's a new LLM benchmark that can't easily be gamed. All of the other ones are gamed nowadays, which is why the open source models keep doing so well on them.

ghostload@ghostloadgg·1 Jun

@BeatsPlanet @theo @MiniMax_AI @datacurve what’s deepswe

MiniMax (official)@MiniMax_AI·1 Jun

Introducing MiniMax M3: The First Open-Weights Model to Combine Three Frontier Capabilities - Coding & Agentic Frontier: 59.0% SWE-Bench Pro, 66.0% Terminal Bench 2.1, 34.8% SWE-fficiency, 28.8% KernelBench Hard, 74.2% MCP Atlas - MiniMax Sparse Attention scales context to 1M - Natively Multimodal from Step Zero API: platform.minimax.io Token Plan: platform.minimax.io/subscribe/toke… 🚀New! MiniMax Code: code.minimax.io Weights & Tech Report in ~10 Days

565

1.2K

11.8K

5.2M

DJ Naydee@BeatsPlanet·1 Jun

@theo @MiniMax_AI @datacurve DeepSWE is now the only benchmark I care about. It was absolutely spot on: it reflected my exact experience with each of the models, especially some of the open source models that claim to be SOTA but completely fall apart as agents when you actually use them.

721

Theo - t3.gg@theo·1 Jun

@MiniMax_AI @datacurve you know the drill

308

17.6K

DJ Naydee@BeatsPlanet·28 May

@theo This is fake, right? How on earth is GPT 5.5, the #1 coding model, way down in the 11th spot?

123

Theo - t3.gg@theo·27 May

Since we're talking about good code benches today, here's a shitty one for reference

Arena.ai@arena

Qwen3.7 Max (20250517) debuts at #4 in Code Arena: Frontend - the top-ranked Chinese lab on the board, surpassing GLM-5.1 and is now on par with Claude Opus 4.6 on agentic web development tasks. Huge congrats to @Alibaba_Qwen on this achievement!

889

113.5K

DJ Naydee@BeatsPlanet·23 May

@MatthewBerman @thsottiaux Shhhhh bro. You're messing it up for us, lol. Those weekly resets have been pure 🪙

Matthew Berman@MatthewBerman·23 May

@thsottiaux You should also make money

3.1K

Tibo@thsottiaux·23 May

Our master plan is to release better and more efficient models. And also to release better products, week after week. Oh and get more compute too. Together with spending too much time on x. How good is this plan?

527

4.3K

224.4K

DJ Naydee@BeatsPlanet·12 May

@thsottiaux Anthropic has been "borrowing" from everyone this year. 5 days after banning OpenClaw they dropped their own locked-up version, "Managed Agents".

347

Tibo@thsottiaux·12 May

The master becomes the mentee. At last, Claude is now copying Codex. But you cannot out-accelerate GPT-5.5.

Daniel San@dani_avila7

Claude Code 2.1.139 added /goal. You set a completion condition and Claude keeps working across turns until it's met. Works in interactive, -p, and Remote Control 👏

366

149

4.4K

427.6K
