Dustin Ogle

1.4K posts


@DustinOgle33

Formerly $45k/mo online business owner. Building an Agent Harness inside a game. Watch AI debate and work. Average run ~2hr, ~1M tokens. https://t.co/8OkuSSMN73

Texas, USA · Joined May 2024
150 Following · 76 Followers
Dustin Ogle@DustinOgle33·
@kr0der GPT 5.5 was named weird to begin with. It was an entirely new pre-train run, with new research methods, etc. So it is possible 5.6 cooks. We also don't know the sizes of these models. Fable could be bigger, but 5.6 is better tuned.
3
0
6
410
Dustin Ogle@DustinOgle33·
@notjazii Surprised to see this behind Qwen and deepseek. Haven't used those a ton, but so far it feels better.
0
0
0
16
J A Z I I@notjazii·
Chinese AI labs are shipping like crazy
> Kimi K2.7 was released yesterday
> GLM 5.2 was released a few hours ago
and both are actually good
open models are catching up way faster than most people expected
already tested K2.7
waiting for GLM to be available in opencode so I can run the same tests
drop a coding task you want me to test on both 👇
47
6
219
61.3K
Dustin Ogle@DustinOgle33·
@andrewqu This shouldn't really be a hot take, if you mean just by sheer percentage of tasks. But once in a while, you need to architect an entire plan/strategy. And for those situations, you want the smartest model possible.
0
0
0
117
Andrew Qu@andrewqu·
Hot take: a lot of people wouldn’t be able to tell the difference if they were randomly routed between gpt-5.5, opus-4.8, or fable-5 for their day to day work
236
25
1.1K
64.3K
Dustin Ogle@DustinOgle33·
@seconds_0 It is a much smaller model. I agree it is benchmaxxed, but it still packs a punch.
2
0
1
728
0.005 Seconds (3/694)@seconds_0·
Quick vibecheck on benches last night
- Kimi K2.7 is _really good_
- Minimax M3 is expensive, poorly engineered, benchmaxxed and bad
22
7
448
30.1K
Hassan@buildwithhassan·
my personal chinese coding model tier list from daily usage:
tier 1: GLM-5 ≈ GLM-5.1 ≈ Kimi K2.6 > Qwen 3.7 Max (somehow the most expensive and not the best)
tier 2: Qwen 3.7 Plus (under 200K context) > DeepSeek V4 Pro ≈ V4 Flash ≈ MiMo 2.5 Pro
not there yet: MiniMax, Hunyuan
testing K2.7 Code soon and GLM-5.2. ranking might change by morning.
37
4
364
28.5K
Dustin Ogle@DustinOgle33·
@llmdevguy Yeah I mean not to fanboy too hard but with a legacy pro plan, it's been breezing through stuff. Love the effort modes and so far it is quite good at coding/long-running agentic workflows.
0
0
1
717
Mateusz Mirkowski@llmdevguy·
🔥GLM 5.2 vs Kimi K2.7. Which one is better? Will test it soon. What's your thoughts?
44
0
279
34.7K
Dustin Ogle@DustinOgle33·
@buildwithhassan GLM 5.2 will probably make a run for top usage along with Minimax M3. DS is great at research and cheap, but just not the smartest.
0
0
0
541
Hassan@buildwithhassan·
opencode published their real model usage data. what developers actually run when they're paying for it:
1. deepseek v4 flash: 32T tokens
2. deepseek v4 pro: 19T tokens
3. kimi k2.6: 6.5T tokens
deepseek is running more tokens than the next 16 models combined. it's actual usage from developers spending their own money. glm-5.1 grew 419% too. the models winning on price and reliability aren't always the ones winning on twitter.
22
20
395
25.7K
Dustin Ogle@DustinOgle33·
GLM 5.2 is gonna be one of the best daily models out there. Pair it with GPT/Opus as the orchestrator/planner and get incredible intelligence per dollar.
Z.ai@Zai_org

Intelligence should be open, accessible, and ready to build with, empowering every developer, everywhere. GLM-5.2 is now available to all GLM Coding Plan users, including Lite, Pro, Max, and Team plans. docs.z.ai/devpack/latest… As our new flagship model, GLM-5.2 delivers powerful coding capabilities, usable 1M-context support, and continued strengths in long-horizon tasks. API and Chatbot services will launch next week. The model will also be officially open-sourced next week under the MIT License. The future of AI is open, and it belongs to the people.

0
0
0
144
Bindu Reddy@bindureddy·
LOL! Forget Fable 5
Just got our benchmark results
Kimi 2.7 code is MIND BLOWING when it comes to agentic coding
Will publish shortly!
84
48
1K
80.5K
Dustin Ogle@DustinOgle33·
@elliotarledge @0xSero Oh, lmao, in that case then hell yeah. I got spooked from everyone saying how restricted it was on anything even close to RSI.
0
0
0
7
0xSero@0xSero·
Wow. #3 on kernelbenchhard
11
7
180
25K
Dustin Ogle@DustinOgle33·
@0xSero They did talk about self-improving a bunch in the 2.7 release. Haven't heard about it much for the M3 launch.
0
0
1
164
Dustin Ogle@DustinOgle33·
@TeksEdge I am gonna need some REAP, 2bit dynamic, ultra quants to run this on 128 gb lol
0
0
1
727
David Hendrickson@TeksEdge·
📊 With MiniMax M3 open source now out, here is what to expect on quants and sizes, including VRAM needed:
MiniMax M3 (428B MoE, ~23B active) 🔥
GGUF Size Estimates
Q8_0 → ~430-450 GB
Q6_K → ~340-360 GB
Q5_K_M/XL → ~280-310 GB
Q4_K_M/XL → ~220-250 GB (Best balance)
Q3_K_XL → ~170-200 GB
Q2_K → ~110-140 GB (Last resort)
Very efficient due to extreme sparsity! Practical local runs will need high-VRAM setups (multiple 5090s or better).
ModelScope@ModelScope2022

MiniMax M3 is now open source! The model combines native multimodal understanding, ultra-long context, and Agent capabilities in one.🚀
New MSA architecture: up to 1M context at 1/20 the per-token compute of the previous gen. 9x faster prefilling, 15x faster decoding, on par with full attention on most tasks.
Two versions 👇: MiniMax-M3 (full precision) and MiniMax-M3-MXFP8 (quantized, lower VRAM).
🤖 modelscope.ai/models/MiniMax…
🤖 modelscope.ai/models/MiniMax…
🧠 12hrs autonomous: reproduced an ICLR 2025 Outstanding Paper end to end, 18 commits + 23 experiment plots
⚡ 147 iterations, 9.4x CUDA speedup: FP8 matmul kernel on Hopper, peak utilization 7.6% → 71.3%, zero human intervention
🛠️ PostTrainBench: scored 37.1, ranking 3rd behind Opus 4.7 (42.4) and GPT-5.5 (39.3)

4
4
135
30.3K
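The size estimates in the quant thread above come down to simple arithmetic: parameter count times bits per weight, divided by eight. Here is a minimal sketch of that calculation. The bits-per-weight table uses nominal block-level figures for GGUF formats and is an assumption of this sketch: real quantized files mix tensor types, so actual sizes will differ. The function names are made up for illustration, and KV cache and runtime overhead are ignored.

```python
# Nominal bits per weight for some GGUF block formats.
# Assumption of this sketch: real files mix tensor types, so actual
# file sizes will not match these block-level figures exactly.
NOMINAL_BPW = {
    "Q8_0": 8.5,
    "Q6_K": 6.5625,
    "Q5_K": 5.5,
    "Q4_K": 4.5,
    "Q3_K": 3.4375,
    "Q2_K": 2.625,
}


def estimate_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in decimal GB: params * bits / 8."""
    return params_billion * bits_per_weight / 8


def quants_that_fit(params_billion: float, memory_gb: float) -> list:
    """Names of the formats whose estimated weights fit in memory_gb.

    Hypothetical helper: ignores KV cache and runtime overhead.
    """
    return [
        name
        for name, bpw in NOMINAL_BPW.items()
        if estimate_size_gb(params_billion, bpw) <= memory_gb
    ]


if __name__ == "__main__":
    # 428B total parameters is the figure quoted in the thread.
    for name, bpw in NOMINAL_BPW.items():
        print(f"{name}: ~{estimate_size_gb(428, bpw):.0f} GB")
    print("fits in 128 GB:", quants_that_fit(428, 128))
```

With these nominal figures the estimates land in roughly the same ballpark as the ranges quoted in the thread, and nothing in the table fits a 428B model into 128 GB, which is why more aggressive pruning or lower-bit quants come up in the reply.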
Dustin Ogle@DustinOgle33·
@theo Basically, although Mythos has been out for two months now, he is probably already playing with Mythos 5.2.
0
0
3
883
Theo - t3.gg@theo·
Do you think Karpathy joined Anthropic just so he could use Mythos for ML research without restrictions?
304
102
5.8K
444.7K
Dustin Ogle@DustinOgle33·
Amjad knows what's up. Loops are pretty basic. Dynamic workflows and orchestration are where it's at. Most companies will lean towards this soon 👇 I may have been a bit conservative with how Fable is going to be used in business, but you get the point. It will be for high-level strategic decisions.
Amjad Masad@amasad

I’ve been doing “loops” for a while now. I don’t do much traditional prompting. Most of my prompts are barely a sentence expressing an outcome.
- my orchestrator prompts parallel agents
- my computer use verifier gives it feedback
- my security, production, and SEO agents generate prompts for fixes
The industry is typically 3-6 months behind what we’re doing at Replit.

0
0
0
49
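The orchestrator pattern described in the quoted tweet (a planner fans work out to parallel agents, and a verifier feeds back on the results) can be sketched in a few lines. This is only an illustration under stated assumptions, not how Replit or any harness mentioned here works: `run_loop`, the `Model` type, and the "one subtask per line" planner convention are all hypothetical, and the models are plain callables standing in for real API calls.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# Hypothetical stand-in for a model call: prompt in, text out.
Model = Callable[[str], str]


def run_loop(
    goal: str,
    planner: Model,
    worker: Model,
    verifier: Callable[[str], bool],
    max_rounds: int = 3,
) -> List[str]:
    """Plan once, fan subtasks out in parallel, retry rejected ones."""
    # Assumption of this sketch: the planner returns one subtask per line.
    pending = [s.strip() for s in planner(goal).splitlines() if s.strip()]
    accepted: List[str] = []
    for _ in range(max_rounds):
        if not pending:
            break
        # Workers run concurrently; map() keeps results in task order.
        with ThreadPoolExecutor() as pool:
            results = list(pool.map(worker, pending))
        retry = []
        for task, result in zip(pending, results):
            if verifier(result):
                accepted.append(result)
            else:
                retry.append(task)  # rejected work goes around again
        pending = retry
    return accepted


if __name__ == "__main__":
    # Toy stand-ins: a "frontier" planner and a cheap "workhorse" worker.
    plan = lambda goal: "write tests\nfix bug"
    work = lambda task: task.upper()
    print(run_loop("ship feature", plan, work, lambda r: True))
```

The split mirrors the pairing argued for elsewhere in this feed: an expensive model is called once as `planner`, while the cheaper model handles the many `worker` calls.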
Dustin Ogle@DustinOgle33·
@JoelDeTeves If I had to guess, the big models are so RL'd on coding and math that they act more like autistic engineers and less like creatives/human sounding.
0
0
5
340
Joel - coffee/acc@JoelDeTeves·
I accidentally discovered that Gemma-4-26B-A4B is way better at writing human sounding content than every other model out there - including frontier models like GPT 5.5 and Sonnet 4.6. I'm not sure why this is - it's kind of crazy how slopified these big expensive models are and for some reason, Google's open source model sounds a lot more natural and follows writing instructions better. WTF?
39
34
797
54.2K
Dustin Ogle@DustinOgle33·
@Elaina43114880 If GLM can hold it down to 700B parameters, then it will be the best for 512GB. But they might increase the size too.
1
0
2
1.3K
Elaina@Elaina43114880·
We're about to face a new problem: On a single Mac Studio 512GB, after different levels of quantization, which one actually ends up being the best model — MiniMax M3, Kimi K2.7 Code, or GLM-5.2?
Elaina@Elaina43114880

Big day for Mac Studio 512GB players! 🥳 MiniMax M3 and Kimi K2.7 Code just dropped as two powerful new open weights models. Both will still need some quantization to run comfortably on a single machine, but believe the community is already working on it! 😻 (btw GLM-5.2 is on the way 😸)

14
4
190
29.7K
Dustin Ogle@DustinOgle33·
Sadly, going to be hard to run this locally, but major props to Minimax! Pound for pound, one of the best models out there. This is going to be amazing for a workhorse model once a frontier model has scoped and spec'ed out the plan.
MiniMax (official)@MiniMax_AI

MiniMax M3, Open-Weight, Now On Hugging Face, with only ~428B parameters and ~23B activated parameters
Weights: huggingface.co/MiniMaxAI/Mini…
MiniMax Sparse Attention: huggingface.co/papers/2606.13…

0
0
0
7
Dustin Ogle@DustinOgle33·
@outsource_ @TesanaAI Does this let you use AI NPCs, where the AI can control the actions on the back end, and the UI shows the character doing it?
1
0
1
89
Eric ⚡️ Building...
GUYS this is not a drill @TesanaAI is AMAZING for game development. The task: Rebuild a prototype of HermesWorld The platform generated assets, loading screens, music, characters and landed this. One prompt! After a few revisions this will be incredible 🔥
10
2
47
3.4K