Mateusz Mirkowski

3.8K posts

@llmdevguy

Autonomous agents, agentic engineering. Building & testing agentic systems. Exploring local LLMs.

Remote work evangelist

Joined March 2013

149 Following · 1.8K Followers

Pinned post

Mateusz Mirkowski @llmdevguy · 27 Apr

x.com/i/article/2048…

Mateusz Mirkowski @llmdevguy · 1h

@redmonkeyAI I am testing it. It's a tier below them, unfortunately.

redmonkey @redmonkeyAI · 2h

@llmdevguy Add MiniMax M3 to the test as well, please.

Mateusz Mirkowski @llmdevguy · 18h

🔥 GLM 5.2 vs Kimi K2.7. Which one is better? I will test it soon. What are your thoughts?

Mateusz Mirkowski @llmdevguy · 6h

@eSaadster It seems to be true.

Saad @eSaadster · 12h

@llmdevguy glm-5.2 > kimi-k2.7 > glm-5.1

Mateusz Mirkowski @llmdevguy · 6h

@arunoda True. I mostly use 5.5. Kimi and GLM are good enough in most cases, but not for very complex tasks.

Arunoda Susiripala @arunoda · 12h

@llmdevguy I tried some really hard tasks with both yesterday. Still, they are not at Opus or GPT 5.5 level. On complex tasks involving backend, client & UI work, they are not there yet. But for surgical tasks, they're good. Both are better than their previous versions.

Mateusz Mirkowski @llmdevguy · 6h

@DanielPetroAI Yes, for code review Kimi is amazing.

Daniel Petro @DanielPetroAI · 17h

@llmdevguy I've only tried Kimi for code review so far, and it worked pretty well!

Mateusz Mirkowski @llmdevguy · 1d

Bye bye Fable.

Mateusz Mirkowski @llmdevguy · 1d

@xyster So check this out. AI-generated. youtube.com/watch?v=3E9NNR…

Steve💙🇨🇦 @xyster · 1d

@llmdevguy Never even heard of that genre. I'll need to check it out.

Steve💙🇨🇦 @xyster · 2d

I hate that I'm starting to like AI-generated music better than other modern music. There are some telltale signs that music is AI-generated, but they're subtle. My sister is also big into making AI music, so I've been forced to listen to quite a lot of it this year. She makes it on a 3090.

Mateusz Mirkowski @llmdevguy · 1d

@stevibe This is magic.

stevibe @stevibe · 3d

My first reaction: How is that possible? Running DiffusionGemma 26B A4B NVFP4 on my DGX Spark at 161.9 tok/s!

Mateusz Mirkowski @llmdevguy · 1d

K2.6 was the best Chinese model until today. Now K2.7 is even better. Amazing work.

Kimi.ai @Kimi_Moonshot

🌘 Kimi-K2.7-Code, our latest coding model, is now released and open-sourced!
🔷 Improved coding & agent performance over K2.6: +21.8% on Kimi Code Bench v2, +11.0% on Program Bench, and +31.5% on MLS Bench Lite.
🔷 Reasoning efficiency: Less overthinking, with 30% lower reasoning-token usage compared to K2.6.
🔷 Long-horizon coding: Improved instruction following, higher end-to-end coding task success rates.
⚡️ 6x High-Speed Mode coming soon!
🔌 Available today via Kimi API and Kimi Code.
🔗 Kimi Code: kimi.com/code
🔗 API: platform.moonshot.ai

Mateusz Mirkowski @llmdevguy · 3d

@UnslothAI Looks really good.

Unsloth AI @UnslothAI · 3d

Google releases DiffusionGemma. ✨
The new 26B-A4B diffusion text model runs locally on 18GB RAM. It supports high-speed text generation, thinking, image, video and 256K context. Run and train via Unsloth Studio.
GGUF: huggingface.co/unsloth/diffus…
Guide: unsloth.ai/docs/models/di…

Google Gemma @googlegemma

Meet DiffusionGemma! An experimental open model that explores a fast approach to text generation, released under an Apache 2.0 license. Moving beyond sequential, token-by-token processes to generate entire blocks of text simultaneously. Here’s what’s new with DiffusionGemma: 👇

Mateusz Mirkowski @llmdevguy · 4d

@GeorgeWall1878 Fable or 5.5?

Gwall1878 @GeorgeWall1878 · 4d

@llmdevguy I'm finding Medium is really the sweet spot; anything higher and it takes about a week to do anything. I'm using High for planning and Medium for implementation.

Mateusz Mirkowski @llmdevguy · 4d

So Fable 5 is more expensive than 5.5 xhigh, but performs at the same level.

Terp @OnlyTerp

TL;DR: Fable 5 low solves tasks in the fewest steps and has a great cost per task.

Mateusz Mirkowski @llmdevguy · 5d

😞 Yesterday I realized that 5.3 Codex has been removed. This is very sad. 5.3 was great for fixing bugs or building small features, and it was very cheap. I hope we get something to replace it.

Mateusz Mirkowski @llmdevguy · 4 Jun

@arun100s Pretty bad. :( I will post about it later.

Arun Sampath @arun100s · 3 Jun

@llmdevguy A few weeks ago you compared Chinese models and K2.6 was the winner. How does M3 compare now?

Mateusz Mirkowski @llmdevguy · 1 Jun

Current M3 pricing.

Mateusz Mirkowski @llmdevguy · 1 Jun

The second iteration is much better, but there are still issues with the animations.

Mateusz Mirkowski @llmdevguy

🚀 First try with MiniMax 3. Not bad. The animations are a bit clunky, but overall it proposed an interesting design, although I prefer what Kimi K2.6 proposed.

Mateusz Mirkowski @llmdevguy · 1 Jun

@iam_multiman Mostly 3.7 27B and 35B.

Matthew Anorkplim Loh @iam_multiman · 1 Jun

@llmdevguy Did you test Qwen?

Mateusz Mirkowski @llmdevguy · 7 May

🇨🇳 After testing Chinese models over the last few weeks, my coding ranking currently looks like this:
1. Kimi K2.6
2. GLM-5.1
3. MiMo V2.5 Pro
4. MiniMax 2.7
5. DeepSeek V4 Pro
👉 But each of them has its own superpowers:
Frontend/Design: K2.6
Backend: K2.6 / GLM-5.1
Code review: MiMo
All-rounder: M2.7
Reasoning: DeepSeek
Now I'm waiting for MiniMax 3.0, which I hope will take the number 1 spot!
