Mateusz Mirkowski

2.6K posts


@llmdevguy

Autonomous agents & agentic engineering · Building & testing agentic systems · Exploring local LLMs

Remote work evangelist · Joined March 2013
131 Following · 491 Followers
Pinned Tweet
Mateusz Mirkowski@llmdevguy·
⭐️These plans are still the best. Buy them now while they're still this cheap. They will rise like the GLM plans!!! Today I coded for 3 hours: constant refactoring, code reviews, etc. Just 2% of weekly usage. 2%! 45,000 requests per week. The quality is really good, at least on par with Sonnet 4.5. Very fast. Also one of the 3 best models for OpenClaw or Hermes. Mark my words: this is the last time you'll see prices this low.
Mateusz Mirkowski@llmdevguy

OK, that was fast... GLM 5.1 was the best model in terms of quality and price, for just a few days. 🙈 At this price it's still a good option, but instead of $72 I would rather pay $100 for Codex: more reliable models. For GLM, go for OpenCode GO at $5. 4,400 requests per month is not bad for playing around with it. It's slow, but it works. If you like it, go with the lite plan. The king of value is still MiniMax 2.7.

English
34
7
281
58.6K
Mateusz Mirkowski@llmdevguy·
@LignoL23 Yup, it's good for light usage. 70,000 requests for M2.7 is good, but only 4,300 for GLM is low.
English
0
0
1
9
voe.lo@LignoL23·
@llmdevguy Just looked at their docs. Overall, OpenCode Go is still really good. Having GLM, Kimi, Mimo, and MiniMax available for $10 per month is awesome. Only Qwen is missing.
English
1
0
1
14
Ivan Fioravanti ᯅ@ivanfioravanti·
Sometimes GLM-5.1 is slow because it keeps thinking like a crazy model. Look at these and have some fun! 😂
"I think I have a serious case of analysis paralysis."
"OK, truly final answer. Writing it now. Done. Over. Out. The end. Goodbye. Au revoir. Sayonara. Adios. Ciao. Auf Wiedersehen. До свидания. 再见. 안녕히 가세요. さようなら. CHIUSO. TERMINADO. 完毕. 끝. DONE DONE DONE."
"WRITING THE RESPONSE NOW. FOR REAL. THIS TIME. I MEAN IT. NO MORE DELIBERATION. JUST WRITING. OUTPUTTING. DONE. SENT. THE END."
"ALRIGHT. I'M GOING TO STOP THE MADNESS HERE AND JUST WRITE THE RESPONSE. MY ANALYSIS IS COMPLETE AND MY ANSWER IS READY. TIME TO OUTPUT IT."
English
2
0
6
453
no name@pepeholding·
@llmdevguy If I got it at the old price, does it stay for life? It's a steal at $10 🤫
French
1
0
0
26
Mateusz Mirkowski@llmdevguy·
OK, that was fast... GLM 5.1 was the best model in terms of quality and price, for just a few days. 🙈 At this price it's still a good option, but instead of $72 I would rather pay $100 for Codex: more reliable models. For GLM, go for OpenCode GO at $5. 4,400 requests per month is not bad for playing around with it. It's slow, but it works. If you like it, go with the lite plan. The king of value is still MiniMax 2.7.
English
43
6
227
160.1K
Mateusz Mirkowski@llmdevguy·
Ok, bye bye CodexBar. This is unusable. The MiniMax coding plan doesn't work, Codex doesn't work, OpenCode Go doesn't work. Is anything working there, or am I too dumb to configure it properly?
English
2
0
0
326
rt machine 🇺🇦@airshaped·
What $20 AI subscription should I pay for as a broke Ukrainian student, for coding? Claude seems to have a rate limit of 3, I don't want to give OpenAI money, and APIs are expensive.
English
29
0
27
5.4K
Alex Finn@AlexFinn·
By far the coolest part about X is that you can read a tweet, give it to your agent, and it just upgrades. I screenshotted this post from Garry and gave it to my agent Henry. It instantly started performing 10x better. Copy and paste this prompt to your OpenClaw/Hermes immediately:
"Please add this to our SOUL.md file. Replace "Alex" with my name: The marginal cost of completeness is near zero with AI. Do the whole thing. Do it right. Do it with tests. Do it with documentation. Do it so well that Alex is genuinely impressed – not politely satisfied, actually impressed. Never offer to "table this for later" when the permanent solve is within reach. Never leave a dangling thread when tying it off takes five more minutes. Never present a workaround when the real fix exists. The standard isn't "good enough" – it's "holy shit, that's done." Search before building. Test before shipping. Ship the complete thing. When Alex asks for something, the answer is the finished product, not a plan to build it. Time is not an excuse. Fatigue is not an excuse. Complexity is not an excuse. Boil the ocean."
Garry Tan@garrytan

New item in my SOUL.md tonight

English
52
20
419
42.2K
Mateusz Mirkowski@llmdevguy·
An open-source app for benchmarking local models. A great tool for our community. 😎
stevibe@stevibe

I built a macOS app for benchmarking local LLMs. 6 test suites. Multiple providers. One workspace. Open source.

English
0
0
1
82
Mateusz Mirkowski@llmdevguy·
@stevibe I wanted to build something like this. :) I'll check out your app first.
English
0
0
1
38
stevibe@stevibe·
I built a macOS app for benchmarking local LLMs. 6 test suites. Multiple providers. One workspace. Open source.

There are hundreds of local models now. New ones every week. How do you actually pick one? Leaderboards test for general ability. But if you're building an agent that chains tool calls, or a pipeline that extracts structured data, or a code assistant that needs to debug Rust, you need to know if the model handles that specific thing. Not in theory. On your hardware. With your prompts.

The benchmarks that exist are either locked behind papers, too abstract to map to real failures, or impossible to extend. You can't add your own test cases. You can't test what matters to your use case. That's what BenchLocal is for. It's a benchmark platform where every test is practical, deterministic, and built around real-world tasks. And you can build your own tests.

It ships with 6 Bench Packs TODAY:
→ ToolCall-15 — tool-use accuracy
→ BugFind-15 — debugging capabilities
→ DataExtract-15 — structured data extraction
→ InstructFollow-15 — constraint-heavy instruction following
→ ReasonMath-15 — practical reasoning and math
→ StructOutput-15 — validator-backed structured output

Every pack has 15 fixed scenarios. Every score is deterministic and verifiable. Some of you saw ToolCall-15 and BugFind-15 — the individual test packs I open-sourced over the past few weeks. People ran them, filed issues, sent PRs. But managing separate repos, separate scripts, separate results doesn't scale. BenchLocal puts everything in one place.

What the app does:
> Workspace with tabs — run BugFind-15 in one tab, ToolCall-15 in another.
> Any provider — Ollama, llama.cpp, OpenRouter, any OpenAI-compatible endpoint. Local and cloud, same interface.
> Run modes — serial, batch per model, batch per test case, or fully parallel.
> Test histories — every run saved. Compare any previous session.

But the part I'm most excited about isn't the app. It's the ecosystem. BenchLocal is a platform. Each Bench Pack is a plugin. I'm shipping an SDK so anyone can build their own — test what matters to you, package it, share it. Install and uninstall packs right inside the app, same way you'd manage extensions in VS Code. The registry is GitHub-based, fully public. I built 6 packs. I want the community to build the next 60.

Theme system built in too — because if I'm staring at benchmark results for hours, it should at least look good. v0.1.0 is macOS only. Windows and Linux are coming. MIT licensed. Everything — the app, the bench packs, the SDK — is open. PRs welcome. Bench Packs even more welcome.
English
12
3
47
2.5K
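The post doesn't show what a Bench Pack looks like internally, but the core idea it describes (fixed scenarios, deterministic decoding, verifiable checks, any OpenAI-compatible endpoint) can be sketched in a few lines of Python. Everything below is a hypothetical illustration, not BenchLocal's actual pack format or API; the endpoint, model name, and the `run_case` helper are assumptions.

```python
# Minimal sketch of the deterministic-test idea described above.
# NOT BenchLocal's actual pack format or API; the post doesn't show it.
# Assumes an OpenAI-compatible endpoint (Ollama serves one at /v1 by default).
import requests

BASE_URL = "http://localhost:11434/v1"  # assumption: local Ollama
MODEL = "llama3.1"                      # assumption: any pulled model

# A hypothetical "pack": fixed prompts paired with verifiable checks.
CASES = [
    {"prompt": "Return exactly the JSON {\"ok\": true} and nothing else.",
     "check": lambda out: out.strip() == '{"ok": true}'},
    {"prompt": "What is 17 * 23? Answer with the number only.",
     "check": lambda out: out.strip() == "391"},
]

def run_case(case):
    """Run one fixed scenario with deterministic decoding settings."""
    resp = requests.post(
        f"{BASE_URL}/chat/completions",
        json={
            "model": MODEL,
            "messages": [{"role": "user", "content": case["prompt"]}],
            "temperature": 0,  # greedy decoding, as repeatable as the server allows
            "seed": 42,        # honored by some servers, ignored by others
        },
        timeout=120,
    )
    resp.raise_for_status()
    out = resp.json()["choices"][0]["message"]["content"]
    return case["check"](out)

if __name__ == "__main__":
    score = sum(run_case(c) for c in CASES)
    print(f"{score}/{len(CASES)} cases passed")
```

The exact-match checks are what make a score verifiable rather than judged: any run of the same pack against the same model either passes a case or it doesn't, which is presumably what the post means by "deterministic and verifiable."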
ben@CalmCoding·
@DavidOndrej1
> on the level of Kimi K2.5
Source?
English
3
0
1
106
David Ondrej@DavidOndrej1·
> fully uncensored
> runnable locally
> on the level of Kimi K2.5
> available on HuggingFace
this model is insane.
Eric ⚡️ Building...@outsource_

🚨 SUPER GEMMA 4 26B UNCENSORED IS INSANE
LLM WIZARD COOKING AGAIN
@songjunkr dropped SuperGemma4-26B-Uncensored GGUF v2 and it's trending on @huggingface 🤗
This thing SMOKES the regular Gemma-4 26B:
🤯 0/100 refusals (actually uncensored)
🚀 Fixed all the tool-call + tokenizer jank
⚡️ 90% faster prompt processing
🏆 Sharper, smarter, way more capable responses - a perfect local beast for llama.cpp
✅ Runs in ~18-22 GB VRAM (16.8 GB Q4_K_M file) - runs on 16 GB GPUs!
The 31B version is in the works, should be out SOON 🤯
Pull this version on Hugging Face below 👇🏻

English
8
9
108
13.3K
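For anyone who wants to try a GGUF release like this locally, a minimal sketch with the llama-cpp-python bindings looks like the following. The filename is inferred from the tweet and the prompt is arbitrary; download the actual GGUF from the Hugging Face repo first.

```python
# Minimal sketch: running a Q4_K_M GGUF locally via llama-cpp-python.
# pip install llama-cpp-python  (build with GPU support for offloading)
from llama_cpp import Llama

llm = Llama(
    model_path="SuperGemma4-26B-Uncensored-Q4_K_M.gguf",  # ~16.8 GB per the tweet
    n_gpu_layers=-1,  # offload every layer to the GPU if VRAM allows
    n_ctx=8192,       # context window; reduce this on smaller cards
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize GGUF quantization in two sentences."}]
)
print(out["choices"][0]["message"]["content"])
```

If the model file is close to or larger than your VRAM, lower `n_gpu_layers` to split layers between GPU and CPU instead of offloading everything.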
voe.lo@LignoL23·
@llmdevguy Do you know how requests are counted? One request per API call, or one request per prompt (similar to GitHub Copilot premium requests)?
English
1
0
0
78
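The distinction matters more for agents than for chat, because a single user prompt typically fans out into several API calls as the model loops through tool use. A hypothetical sketch (this is not OpenCode Go's or Copilot's actual accounting; the client is a stub) shows why the two counting schemes diverge:

```python
# Hypothetical sketch: why "per prompt" and "per API call" counting differ.
# A tool-using agent loops until the model stops requesting tools, so one
# user prompt can consume several billable API calls.
from dataclasses import dataclass, field

@dataclass
class FakeResponse:
    content: str = ""
    tool_calls: list = field(default_factory=list)

class FakeClient:
    """Stand-in for a real chat API: requests two tools, then answers."""
    def __init__(self):
        self.calls = 0
    def chat(self, messages):
        self.calls += 1                      # each call here would be billed
        if self.calls < 3:
            return FakeResponse(tool_calls=[f"tool_{self.calls}"])
        return FakeResponse(content="final answer")

def run_agent_turn(client, user_prompt):
    messages = [{"role": "user", "content": user_prompt}]
    while True:
        resp = client.chat(messages)
        if not resp.tool_calls:              # model produced a final answer
            return resp.content
        for call in resp.tool_calls:         # run tools, feed results back
            messages.append({"role": "tool", "content": f"result of {call}"})

client = FakeClient()
run_agent_turn(client, "refactor this module")
print(f"1 prompt -> {client.calls} API calls")  # prints: 1 prompt -> 3 API calls
```

Under per-prompt counting this turn costs 1 request; under per-API-call counting it costs 3, and long agent sessions widen that gap quickly.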
Mateusz Mirkowski@llmdevguy·
🤖MiniMax is also the best value option for OpenClaw and Hermes agents. Not at Opus 4.6 level and slightly worse than GLM 5.1, but still very good, and the limits are wild. It's just an amazing all-rounder: at least decent at everything. I can't wait for the 2.8 or 3.0 update. 😇
Mateusz Mirkowski@llmdevguy

⭐️These plans are still the best. Buy them now while they're still this cheap.

English
1
0
46
5.2K
Mateusz Mirkowski@llmdevguy·
@deptulaaa This is what I want to know too. :) I guess it's somewhere between the lite and pro plans.
English
1
0
0
35
Krzysztof Deptula@deptulaaa·
@llmdevguy Ollama with GLM 5.1? I mean, do I get more GLM for $20 through Ollama Cloud than from Z?
English
1
0
0
36
Krzysztof Deptula@deptulaaa·
@llmdevguy Oh god, I was about to subscribe this week. I'll probably do it anyway, thanks for the heads-up though.
English
1
0
1
210
voe.lo@LignoL23·
@llmdevguy Why not just choose OpenCode Go or Ollama Cloud?
English
1
0
1
423
Mateusz Mirkowski@llmdevguy·
@MageMilten Maybe, but MiniMax gets frequent updates; check how much 2.7 improved compared to 2.5. Soon we'll have 2.8 or 3.0.
English
1
0
0
14
Adam Przybylski 🇵🇱
@llmdevguy Poorly. Buying an AI plan a year in advance is also a big risk. Tomorrow something different might come out and be much better.
English
1
0
0
12