Dustin Ogle

1.4K posts

@DustinOgle33

Formerly $45k/mo online business owner. Building an Agent Harness inside a game. Watch AI debate and work. Average run ~2hr, ~1M tokens. https://t.co/8OkuSSMN73

Texas, USA · Joined May 2024

150 Following · 76 Followers

Dustin Ogle@DustinOgle33·3h

@kr0der GPT 5.5 was named weird to begin with. It was an entirely new pre-train run, with new research methods, etc. So it is possible 5.6 cooks. We also don't know the sizes of these models. Fable could be bigger, but 5.6 is better tuned.

Anthony Kroeger@kr0der·15h

there’s no way this is true because if it was, they wouldn’t do a 0.1 increment it’d be GPT 6 or similar

Jaysen ♨️@jp54362

GPT-5.6 rumors:
• Beats Fable at agentic coding
• 3x cheaper
• Less censored
• Flips sentiment from Anthropic to Opus
tiny chance it arrives tomorrow
next week is where things get interesting

Dustin Ogle@DustinOgle33·4h

@notjazii Surprised to see this behind Qwen and deepseek. Haven't used those a ton, but so far it feels better.

J A Z I I@notjazii·22h

Chinese AI labs are shipping like crazy
> Kimi K2.7 was released yesterday
> GLM 5.2 was released a few hours ago
and both are actually good
open models are catching up way faster than most people expected
already tested K2.7
waiting for GLM to be available in opencode so I can run the same tests
drop a coding task you want me to test on both 👇

Dustin Ogle@DustinOgle33·4h

@andrewqu This shouldn't really be a hot take. If you mean just by sheer percentage of tasks. But once in a while, you need to architect an entire plan/strategy. And for those situations, you want the smartest model possible.

Andrew Qu@andrewqu·8h

Hot take: a lot of people wouldn’t be able to tell the difference if they were randomly routed between gpt-5.5, opus-4.8, or fable-5 for their day to day work

Dustin Ogle@DustinOgle33·4h

@seconds_0 It is a much smaller model. I agree it is benchmaxxed, but it still packs a punch.

0.005 Seconds (3/694)@seconds_0·16h

Quick vibecheck on benches last night
- Kimi K2.7 is _really good_
- Minimax M3 is expensive, poorly engineered benchmaxxed and bad

Dustin Ogle@DustinOgle33·4h

@buildwithhassan GLM 5.2 slaps. Excited to test kimi 2.7 soon.

Hassan@buildwithhassan·15h

my personal chinese coding model tier list from daily usage:
tier 1: GLM-5 ≈ GLM-5.1 ≈ Kimi K2.6 > Qwen 3.7 Max (somehow the most expensive and not the best)
tier 2: Qwen 3.7 Plus (under 200K context) > DeepSeek V4 Pro ≈ V4 Flash ≈ MiMo 2.5 Pro
not there yet: MiniMax, Hunyuan
testing K2.7 Code soon and GLM-5.2. ranking might change by morning.

Dustin Ogle@DustinOgle33·4h

@llmdevguy Yeah I mean not to fanboy too hard but with a legacy pro plan, it's been breezing through stuff. Love the effort modes and so far it is quite good at coding/long-running agentic workflows.

Mateusz Mirkowski@llmdevguy·13h

🔥GLM 5.2 vs Kimi K2.7. Which one is better? Will test it soon. What are your thoughts?

Dustin Ogle@DustinOgle33·10h

@buildwithhassan GLM 5.2 will probably make a run for top usage along with Minimax M3. DS is great at research and cheap prices, but just not the smartest.

Hassan@buildwithhassan·17h

opencode published their real model usage data. what developers actually run when they're paying for it:
1. deepseek v4 flash: 32T tokens
2. deepseek v4 pro: 19T tokens
3. kimi k2.6: 6.5T tokens
deepseek is running more tokens than the next 16 models combined. it's actual usage from developers spending their own money. glm-5.1 grew 419% too.
the models winning on price and reliability aren't always the ones winning on twitter.

Dustin Ogle@DustinOgle33·19h

GLM 5.2 is gonna be one of the best daily models out there. Pair it with GPT/Opus as the orchestrator/planner and get incredible intelligence per dollar.

Z.ai@Zai_org

Intelligence should be open, accessible, and ready to build with, empowering every developer, everywhere.

GLM-5.2 is now available to all GLM Coding Plan users, including Lite, Pro, Max, and Team plans. docs.z.ai/devpack/latest…

As our new flagship model, GLM-5.2 delivers powerful coding capabilities, usable 1M-context support, and continued strengths in long-horizon tasks.

API and Chatbot services will launch next week. The model will also be officially open-sourced next week under the MIT License.

The future of AI is open, and it belongs to the people.

Dustin Ogle@DustinOgle33·1d

@bindureddy Actually very curious how it compares to composer.

Bindu Reddy@bindureddy·1d

LOL! Forget Fable 5 Just got our benchmark results Kimi 2.7 code is MIND BLOWING when it comes to agentic coding Will publish shortly!

Dustin Ogle@DustinOgle33·1d

@elliotarledge @0xSero Oh, lmao, in that case then hell yeah. I got spooked from everyone saying how restricted it was on anything even close to RSI.

Elliot Arledge@elliotarledge·1d

@DustinOgle33 @0xSero this is single gpu kernels for consumer hardware. ive had ZERO refusals so far with it. im the author of kernelbench-hard and the results (including model solutions) are here: kernelbench.com/hard for example -- kernelbench.com/runs/20260610_…

0xSero@0xSero·1d

Wow. #3 on kernelbenchhard

Dustin Ogle@DustinOgle33·1d

@elliotarledge @0xSero You sure? I imagine Fable is locked out from doing stuff like this.

Elliot Arledge@elliotarledge·1d

@0xSero 4 since fable release haha

Dustin Ogle@DustinOgle33·1d

@0xSero They did talk about self-improving a bunch in the 2.7 release. Haven't heard about it much for the M3 launch.

Dustin Ogle@DustinOgle33·1d

@TeksEdge I am gonna need some REAP, 2bit dynamic, ultra quants to run this on 128 gb lol

David Hendrickson@TeksEdge·1d

📊 With MiniMax M3 open source now out, here is what to expect on quants and sizes, including VRAM needed:

MiniMax M3 (428B MoE, ~23B active)

🔥 GGUF Size Estimates
Q8_0 → ~430-450 GB
Q6_K → ~340-360 GB
Q5_K_M/XL → ~280-310 GB
Q4_K_M/XL → ~220-250 GB (Best balance)
Q3_K_XL → ~170-200 GB
Q2_K → ~110-140 GB (Last resort)

Very efficient due to extreme sparsity! Practical local runs will need high-VRAM setups (multiple 5090s or better).
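
As a rough sanity check on those ranges, raw weight storage is just parameter count times bits per weight, divided by 8. The sketch below assumes nominal integer bit-widths; real GGUF quant types store extra scale data per block, so actual files land somewhat above these floors. `raw_weight_gb` is an illustrative helper, not part of any tool.

```python
def raw_weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Lower-bound size in GB: billions of params * bits / 8 bits per byte."""
    return params_billion * bits_per_weight / 8


PARAMS_B = 428  # total parameter count quoted in the post, in billions

# Nominal bit-widths only (an assumption); real quant formats add overhead.
floors = {bits: raw_weight_gb(PARAMS_B, bits) for bits in (8, 6, 5, 4, 3, 2)}

for bits, gb in floors.items():
    print(f"{bits}-bit floor: {gb:.1f} GB")
```

Each floor sits just under the low end of the matching range in the post, which is what you would expect once per-block overhead is added on top.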

ModelScope@ModelScope2022

MiniMax M3 is now open source! The model combines native multimodal understanding, ultra-long context, and Agent capabilities in one.🚀

New MSA architecture: up to 1M context at 1/20 the per-token compute of the previous gen. 9x faster prefilling, 15x faster decoding, on par with full attention on most tasks.

Two versions 👇: MiniMax-M3 (full precision) and MiniMax-M3-MXFP8 (quantized, lower VRAM).
🤖 modelscope.ai/models/MiniMax…
🤖 modelscope.ai/models/MiniMax…

🧠 12hrs autonomous: reproduced an ICLR 2025 Outstanding Paper end to end, 18 commits + 23 experiment plots
⚡ 147 iterations, 9.4x CUDA speedup: FP8 matmul kernel on Hopper, peak utilization 7.6% → 71.3%, zero human intervention
🛠️ PostTrainBench: scored 37.1, ranking 3rd behind Opus 4.7 (42.4) and GPT-5.5 (39.3)

Dustin Ogle@DustinOgle33·1d

@theo Basically, although Mythos has been out for two months now. He is already probably playing with Mythos 5.2

Theo - t3.gg@theo·2d

Do you think Karpathy joined Anthropic just so he could use Mythos for ML research without restrictions?

Dustin Ogle@DustinOgle33·1d

Amjad knows what's up. Loops are pretty basic. Dynamic workflows and orchestration are where it's at. Most companies will lean towards this soon 👇 I may have been a bit conservative with how Fable is going to be used in business, but you get the point. It will be for high-level strategic decisions.

Amjad Masad@amasad

I’ve been doing “loops” for a while now. I don’t do much traditional prompting. Most of my prompts are barely a sentence expressing an outcome.
- my orchestrator prompts parallel agents
- my computer use verifier gives it feedback
- my security, production, and SEO agents generate prompts for fixes
The industry is typically 3-6 months behind what we’re doing at Replit.

Dustin Ogle@DustinOgle33·1d

@JoelDeTeves If I had to guess, the big models are so RL'd on coding and math that they act more like autistic engineers and less like creatives/human sounding.

Joel - coffee/acc@JoelDeTeves·2d

I accidentally discovered that Gemma-4-26B-A4B is way better at writing human sounding content than every other model out there - including frontier models like GPT 5.5 and Sonnet 4.6. I'm not sure why this is - it's kind of crazy how slopified these big expensive models are and for some reason, Google's open source model sounds a lot more natural and follows writing instructions better. WTF?

Dustin Ogle@DustinOgle33·1d

You just need Fable to make a few high-level decisions. Then Opus/GPT fills in the details, makes the plan, and orchestrates a bunch of Kimi/GLM agents to do the tasks. Something like this:
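
A minimal sketch of that tiering in Python, assuming each model is just a function from prompt to text. The names `run_tiered`, `strategist`, `planner`, and the stub workers are hypothetical stand-ins, not any vendor's API.

```python
from typing import Callable, List

# Hypothetical: a "model" here is any function that maps a prompt to text.
Model = Callable[[str], str]


def run_tiered(goal: str, strategist: Model, planner: Model,
               workers: List[Model]) -> List[str]:
    """Top tier decides, middle tier plans, cheap workers execute."""
    decision = strategist(goal)   # one expensive high-level call
    plan = planner(decision)      # planner returns one task per line
    tasks = [t.strip() for t in plan.splitlines() if t.strip()]
    # Spread tasks round-robin across the worker pool.
    return [workers[i % len(workers)](task) for i, task in enumerate(tasks)]


# Stub models so the sketch runs without calling any real service.
def strategist(goal: str) -> str:
    return f"decision for: {goal}"


def planner(decision: str) -> str:
    return f"task A ({decision})\ntask B ({decision})"


def kimi(task: str) -> str:
    return f"kimi did {task}"


def glm(task: str) -> str:
    return f"glm did {task}"


results = run_tiered("ship feature", strategist, planner, [kimi, glm])
for line in results:
    print(line)
```

The point of the shape is that the expensive model is called once per goal, while the number of cheap worker calls scales with the plan.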

Riley Brown@rileybrown

Hiring Fable on API pricing Full time (40 hrs / wk) is: $1,248,000 / year wow.
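
For scale, the quoted annual figure can be turned back into an hourly rate. A quick sketch, assuming a 52-week year with no time off (the post only states 40 hrs/wk); `implied_hourly_rate` is an illustrative helper.

```python
def implied_hourly_rate(annual_cost: float, hours_per_week: float = 40,
                        weeks_per_year: float = 52) -> float:
    """Annual cost divided by total hours worked in the year."""
    return annual_cost / (hours_per_week * weeks_per_year)


rate = implied_hourly_rate(1_248_000)
print(rate)  # 600.0
```

Under that assumption the figure works out to $600 per hour of continuous use.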

Dustin Ogle@DustinOgle33·1d

@Elaina43114880 If glm can hold it down to 700B parameters then it will be the best for 512GB. But they might increase the size too.

Elaina@Elaina43114880·1d

We're about to face a new problem: On a single Mac Studio 512GB, after different levels of quantization, which one actually ends up being the best model — MiniMax M3, Kimi K2.7 Code, or GLM-5.2?

Elaina@Elaina43114880

Big day for Mac Studio 512GB players! 🥳 MiniMax M3 and Kimi K2.7 Code just dropped as two powerful new open weights models. Both will still need some quantization to run comfortably on a single machine, but believe the community is already working on it! 😻 (btw GLM-5.2 is on the way 😸)

Dustin Ogle@DustinOgle33·1d

Sadly, going to be hard to run this locally, but major props to Minimax! Pound for pound, one of the best models out there. This is going to be amazing for a workhorse model once a frontier model has scoped and spec'ed out the plan.

MiniMax (official)@MiniMax_AI

MiniMax M3, Open-Weight, Now On Hugging Face, with only ~428B parameters and ~23B activated parameters
Weights: huggingface.co/MiniMaxAI/Mini…
MiniMax Sparse Attention: huggingface.co/papers/2606.13…

Dustin Ogle@DustinOgle33·2d

@outsource_ @TesanaAI Does this let you use AI NPCs? Where the AI can control the actions on the back end, and the UI shows the character doing it?

Eric ⚡️ Building...@outsource_·2d

GUYS this is not a drill @TesanaAI is AMAZING for game development. The task: Rebuild a prototype of HermesWorld The platform generated assets, loading screens, music, characters and landed this. One prompt! After a few revisions this will be incredible 🔥
