Mateusz Mirkowski (@llmdevguy) - Profil Twitter

Tweet épinglé

⭐️These plans are still the best. Buy them now while they’re still this cheap. They will rise like GLM plans!!! Today I coded for 3 hours, constant refactoring, code reviews etc. Just 2% weekly usage. 2%! 45000 requests per week. The quality is really good, at least like Sonnet 4.5. Very fast. Also one of 3 best models for OpenClaw or Hermes. Mark my words. This is the last time you’ll see prices this low.

Mateusz Mirkowski@llmdevguy

OK that was fast.. GLM 5.1 was the best model in terms of quality and price. For just few days.🙈 For this price it's still good option, but instead of 72$ I would rather pay 100$ for codex. More reliable models. For GLM go for OpenCode GO for 5 usd. 4400 requests per month is not bad to play with it. It's slow, but works. If you like it go with lite. King of value stays with MiniMax 2.7.

English

34

7

281

58.6K

Mateusz Mirkowski@llmdevguy·12m

@LignoL23 Yup it's good for light usage. 70000 requests for M2.7 is good, but only 4300 for GLM is low.

English

0

1

9

voe.lo@LignoL23·18m

@llmdevguy Just looked in their docs. Overall opencode Go is still really good. Having glm, Kimi, Mimo, Minimax available for $10 per month is awesome. Only Qwen is missing.

English

1

0

1

14

Mateusz Mirkowski@llmdevguy·5h

You can buy GLM 5.1 in old, good prices on chinese website. 😇 This is last chance. 49 yuan = 7 USD 149 yuan = 22 USD Quarterly and yearly plans are cheaper. 😇

Mateusz Mirkowski@llmdevguy

OK that was fast.. GLM 5.1 was the best model in terms of quality and price. For just few days.🙈 For this price it's still good option, but instead of 72$ I would rather pay 100$ for codex. More reliable models. For GLM go for OpenCode GO for 5 usd. 4400 requests per month is not bad to play with it. It's slow, but works. If you like it go with lite. King of value stays with MiniMax 2.7.

English

61

16

501

79.5K

Mateusz Mirkowski@llmdevguy·15m

@ivanfioravanti 😂

QME

0

10

Ivan Fioravanti ᯅ@ivanfioravanti·34m

Sometimes GLM-5.1 is slow because it keeps thinking like a crazy model, look at these and have some fun! 😂 "I think I have a serious case of analysis paralysis." "OK, truly final answer. Writing it now. Done. Over. Out. The end. Goodbye. Au revoir. Sayonara. Adios. Ciao. Auf Wiedersehen. До свидания. 再见. 안녕히 가세요. さようなら. CHIUSO. TERMINADO. 完毕. 끝. DONE DONE DONE." "WRITING THE RESPONSE NOW. FOR REAL. THIS TIME. I MEAN IT. NO MORE DELIBERATION. JUST WRITING. OUTPUTTING. DONE. SENT. THE END." "ALRIGHT. I'M GOING TO STOP THE MADNESS HERE AND JUST WRITE THE RESPONSE. MY ANALYSIS IS COMPLETE AND MY ANSWER IS READY. TIME TO OUTPUT IT."

English

2

0

6

453

Mateusz Mirkowski@llmdevguy·27m

@pepeholding For monthly plan no, you will have new price.

English

1

0

15

no name@pepeholding·41m

@llmdevguy Si je l’ai eu a l’ancien prix, ça reste à vie ? C’est du vol pour 10$ 🤫

Français

1

0

26

Mateusz Mirkowski@llmdevguy·2d

OK that was fast.. GLM 5.1 was the best model in terms of quality and price. For just few days.🙈 For this price it's still good option, but instead of 72$ I would rather pay 100$ for codex. More reliable models. For GLM go for OpenCode GO for 5 usd. 4400 requests per month is not bad to play with it. It's slow, but works. If you like it go with lite. King of value stays with MiniMax 2.7.

English

43

6

227

160.1K

Mateusz Mirkowski@llmdevguy·28m

@midego1 Sad :(

0

2

Michiel - Solo Builder@midego1·43m

@llmdevguy All of them work here. Skill issue.

English

1

0

1

11

Mateusz Mirkowski@llmdevguy·6h

Ok, bye bye CodexBar. This is unusable. MiniMax coding plan doesn't work, Codex doesnt work, OpenCode go doesn't work. Is anything working there or am I so dumb, I can't configure it properly?

English

2

0

326

Mateusz Mirkowski@llmdevguy·33m

@airshaped Check ollama cloud.

Eesti

0

2

rt machine 🇺🇦@airshaped·1d

what $20 ai subscription should i pay for as a broke ukrainian student for coding claude seems to have a rate limit of 3, i don't want to give openai money, and apis are expensive

English

29

0

27

5.4K

Mateusz Mirkowski@llmdevguy·35m

@AlexFinn Will test it tomorrow. Thanks.

English

0

11

Alex Finn@AlexFinn·2h

By far the coolest part about X is you can read a tweet, give it to your agent, and then it just upgrades I screenshotted this post from Garry and gave it to my agent Henry Instantly started performing 10x better Copy and paste this prompt to your OpenClaw/Hermes immediately: "Please add this to our SOUL.md file. Replace "Alex" with my name: The marginal cost of completeness is near zero with AI. Do the whole thing. Do it right. Do it with tests. Do it with documentation. Do it so well that Alex is genuinely impressed – not politely satisfied, actually impressed. Never offer to "table this for later" when the permanent solve is within reach. Never leave a dangling thread when tying it off takes five more minutes. Never present a workaround when the real fix exists. The standard isn't "good enough" – it's "holy shit, that's done." Search before building. Test before shipping. Ship the complete thing. When Alex asks for something, the answer is the finished product, not a plan to build it. Time is not an excuse. Fatigue is not an excuse. Complexity is not an excuse. Boil the ocean."

Garry Tan@garrytan

New item in my SOUL md tonight

English

52

20

419

42.2K

Mateusz Mirkowski@llmdevguy·37m

Open source app for benchmarking local models. Great tool for our community. 😎

stevibe@stevibe

I built a macOS app for benchmarking local LLMs. 6 test suites. Multiple providers. One workspace. Open source. There are hundreds of local models now. New ones every week. How do you actually pick one? Leaderboards test for general ability. But if you're building an agent that chains tool calls, or a pipeline that extracts structured data, or a code assistant that needs to debug Rust, you need to know if the model handles that specific thing. Not in theory. On your hardware. With your prompts. The benchmarks that exist are either locked behind papers, too abstract to map to real failures, or impossible to extend. You can't add your own test cases. You can't test what matters to your use case. That's what BenchLocal is for. It's a benchmark platform where every test is practical, deterministic, and built around real-world tasks. And you can build your own tests. It ships with 6 Bench Packs TODAY: → ToolCall-15 — tool-use accuracy → BugFind-15 — debugging capabilities → DataExtract-15 — structured data extraction → InstructFollow-15 — constraint-heavy instruction following → ReasonMath-15 — practical reasoning and math → StructOutput-15 — validator-backed structured output Every pack has 15 fixed scenarios. Every score is deterministic and verifiable. Some of you saw ToolCall-15 and BugFind-15 — the individual test packs I open-sourced over the past few weeks. People ran them, filed issues, sent PRs. But managing separate repos, separate scripts, separate results doesn't scale. BenchLocal puts everything in one place. What the app does: > Workspace with tabs — run BugFind-15 in one tab, ToolCall-15 in another. > Any provider — Ollama, llama.cpp, OpenRouter, any OpenAI-compatible endpoint. Local and cloud, same interface. > Run modes — serial, batch per model, batch per test case, or fully parallel. > Test histories — every run saved. Compare any previous session. But the part I'm most excited about isn't the app. It's the ecosystem. BenchLocal is a platform. Each Bench Pack is a plugin. I'm shipping an SDK so anyone can build their own — test what matters to you, package it, share it. Install and uninstall packs right inside the app, same way you'd manage extensions in VS Code. The registry is GitHub-based, fully public. I built 6 packs. I want the community to build the next 60. Theme system built in too — because if I'm staring at benchmark results for hours, it should at least look good. v0.1.0 is macOS only. Windows and Linux are coming. MIT licensed. Everything — the app, the bench packs, the SDK — is open. PRs welcome. Bench Packs even more welcome.

English

0

1

82

Mateusz Mirkowski@llmdevguy·40m

@stevibe I wanted to build something like this. :) I will check your app before.

English

0

1

38

stevibe@stevibe·1h

I built a macOS app for benchmarking local LLMs. 6 test suites. Multiple providers. One workspace. Open source. There are hundreds of local models now. New ones every week. How do you actually pick one? Leaderboards test for general ability. But if you're building an agent that chains tool calls, or a pipeline that extracts structured data, or a code assistant that needs to debug Rust, you need to know if the model handles that specific thing. Not in theory. On your hardware. With your prompts. The benchmarks that exist are either locked behind papers, too abstract to map to real failures, or impossible to extend. You can't add your own test cases. You can't test what matters to your use case. That's what BenchLocal is for. It's a benchmark platform where every test is practical, deterministic, and built around real-world tasks. And you can build your own tests. It ships with 6 Bench Packs TODAY: → ToolCall-15 — tool-use accuracy → BugFind-15 — debugging capabilities → DataExtract-15 — structured data extraction → InstructFollow-15 — constraint-heavy instruction following → ReasonMath-15 — practical reasoning and math → StructOutput-15 — validator-backed structured output Every pack has 15 fixed scenarios. Every score is deterministic and verifiable. Some of you saw ToolCall-15 and BugFind-15 — the individual test packs I open-sourced over the past few weeks. People ran them, filed issues, sent PRs. But managing separate repos, separate scripts, separate results doesn't scale. BenchLocal puts everything in one place. What the app does: > Workspace with tabs — run BugFind-15 in one tab, ToolCall-15 in another. > Any provider — Ollama, llama.cpp, OpenRouter, any OpenAI-compatible endpoint. Local and cloud, same interface. > Run modes — serial, batch per model, batch per test case, or fully parallel. > Test histories — every run saved. Compare any previous session. But the part I'm most excited about isn't the app. It's the ecosystem. BenchLocal is a platform. Each Bench Pack is a plugin. I'm shipping an SDK so anyone can build their own — test what matters to you, package it, share it. Install and uninstall packs right inside the app, same way you'd manage extensions in VS Code. The registry is GitHub-based, fully public. I built 6 packs. I want the community to build the next 60. Theme system built in too — because if I'm staring at benchmark results for hours, it should at least look good. v0.1.0 is macOS only. Windows and Linux are coming. MIT licensed. Everything — the app, the bench packs, the SDK — is open. PRs welcome. Bench Packs even more welcome.

English

12

3

47

2.5K

Mateusz Mirkowski@llmdevguy·44m

@wwhbqrsb @deptulaaa Yup but usa people are working so it's even. 😂

English

0

19

跑马灯跑得快@wwhbqrsb·48m

@llmdevguy @deptulaaa Since Chinese people are sleeping right now, the server isn’t busy, so you can use it without any lag.

English

1

0

21

Mateusz Mirkowski@llmdevguy·56m

@CalmCoding @DavidOndrej1 Maybe 31b but not this.

English

0

23

ben@CalmCoding·1h

@DavidOndrej1 > on the level of Kimi K2.5 Source?

English

3

0

1

106

David Ondrej@DavidOndrej1·1h

> fully uncensored > runnable locally > on the level of Kimi K2.5 > available on HuggingFace this model is insane.

Eric ⚡️ Building...@outsource_

🚨 SUPER GEMMA 4 26B UNCENSORED IS INSANE LLM WIZARD COOKING AGAIN @songjunkr Dropped SuperGemma4-26B-Uncensored GGUF v2 and it’s trending on @huggingface🤗 This thing SMOKES the regular Gemma-4 26B: 🤯0/100 refusals (actually uncensored) 🚀Fixed all the tool-call + tokenizer jank ⚡️90% faster prompt processing 🏆Sharper, smarter, way more capable responses - Perfect local beast for llama.cpp ✅ Runs ~18-22 GB VRAM (16.8 GB Q4_K_M file) - Run on 16 GB GPUs! The 31B version in the works, should be out SOON 🤯 Pull this version on hugging face below 👇🏻

English

8

9

108

13.3K

Mateusz Mirkowski@llmdevguy·1h

@0xsocks It is but you need expensive hardware to run it.

English

0

1

8

︎.@0xsocks·1h

I thought this model was open sourced?

Mateusz Mirkowski@llmdevguy

You can buy GLM 5.1 in old, good prices on chinese website. 😇 This is last chance. 49 yuan = 7 USD 149 yuan = 22 USD Quarterly and yearly plans are cheaper. 😇

English

1

0

1

19

Mateusz Mirkowski@llmdevguy·1h

@1saifj No. I have it on opencode go.

English

0

6

Saif Aljanahi@1saifj·1h

@llmdevguy Did you bough the GLM low price or not? please tell us

English

1

0

9

Mateusz Mirkowski@llmdevguy·21h

This looks like some last-resort, budget Chinese plan. 😂 Unfortunately, they don’t have yearly plans, so you can’t take risky bet. 😂

Mateusz Mirkowski@llmdevguy

OK that was fast.. GLM 5.1 was the best model in terms of quality and price. For just few days.🙈 For this price it's still good option, but instead of 72$ I would rather pay 100$ for codex. More reliable models. For GLM go for OpenCode GO for 5 usd. 4400 requests per month is not bad to play with it. It's slow, but works. If you like it go with lite. King of value stays with MiniMax 2.7.

English

10

1

61

8.7K

Mateusz Mirkowski@llmdevguy·1h

@LignoL23 One prompt can be multiple requests.

English

1

0

1

56

voe.lo@LignoL23·1h

@llmdevguy Do you know how requests are counted? One request per api call, or one request per prompt (similar to GitHub copilot premium requests)

English

1

0

78

Mateusz Mirkowski@llmdevguy·1h

@davinder0110v @ollama Do you have it on paid plan? Do you like limits?

English

1

0

18

David V — e/acc@davinder0110v·1h

@llmdevguy Or get @ollama 🙄

English

1

0

14

Mateusz Mirkowski@llmdevguy·1d

🤖MiniMax is also the best value option for OpenClaw and Hermes agents. Not at Opus 4.6 level and slightly worse than GLM 5.1, but still very good and limits are wild. It's just amazing all rounder. At least decent in everything. I cant wait for 2.8 or 3.0 update. 😇

Mateusz Mirkowski@llmdevguy

⭐️These plans are still the best. Buy them now while they’re still this cheap. They will rise like GLM plans!!! Today I coded for 3 hours, constant refactoring, code reviews etc. Just 2% weekly usage. 2%! 45000 requests per week. The quality is really good, at least like Sonnet 4.5. Very fast. Also one of 3 best models for OpenClaw or Hermes. Mark my words. This is the last time you’ll see prices this low.

English

1

0

46

5.2K

Mateusz Mirkowski@llmdevguy·1h

@deptulaaa This is what I want to know too. :) I guess it's somewhere between lite and pro plan.

English

1

0

35

Krzysztof Deptula@deptulaaa·1h

@llmdevguy ollama with glm5.1? i mean, do I get more glm for 20usd through ollama cloud than from Z?

English

1

0

36

Mateusz Mirkowski@llmdevguy·1h

@deptulaaa Check ollama cloud for 20 usd.

English

1

0

188

Krzysztof Deptula@deptulaaa·1h

@llmdevguy Oh god, was about to subscribe this week. Probably I'll do it anyway, thanks for the heads-up though

English

1

0

1

210

Mateusz Mirkowski@llmdevguy·1h

@MageMilten Me too. People say it will be avaliable this month.

English

0

1

13

Adam Przybylski 🇵🇱@MageMilten·1h

@llmdevguy 👍 I'm waiting to see what deepseek v4 will show

English

1

0

144

Mateusz Mirkowski@llmdevguy·1d

⭐️These plans are still the best. Buy them now while they’re still this cheap. They will rise like GLM plans!!! Today I coded for 3 hours, constant refactoring, code reviews etc. Just 2% weekly usage. 2%! 45000 requests per week. The quality is really good, at least like Sonnet 4.5. Very fast. Also one of 3 best models for OpenClaw or Hermes. Mark my words. This is the last time you’ll see prices this low.

Mateusz Mirkowski@llmdevguy

OK that was fast.. GLM 5.1 was the best model in terms of quality and price. For just few days.🙈 For this price it's still good option, but instead of 72$ I would rather pay 100$ for codex. More reliable models. For GLM go for OpenCode GO for 5 usd. 4400 requests per month is not bad to play with it. It's slow, but works. If you like it go with lite. King of value stays with MiniMax 2.7.

English

34

7

281

58.6K

Mateusz Mirkowski@llmdevguy·1h

@DuaneAdam AI will be only more expensive. Local models for rescue. ;)

English

0

9

Duane@DuaneAdam·1h

Even companies that are allegedly distilling models are raising prices. That means even inference cost alone, minus the research and training is expensive.

Mateusz Mirkowski@llmdevguy

OK that was fast.. GLM 5.1 was the best model in terms of quality and price. For just few days.🙈 For this price it's still good option, but instead of 72$ I would rather pay 100$ for codex. More reliable models. For GLM go for OpenCode GO for 5 usd. 4400 requests per month is not bad to play with it. It's slow, but works. If you like it go with lite. King of value stays with MiniMax 2.7.

English

1

0

2

20

Mateusz Mirkowski@llmdevguy·1h

@LignoL23 Open code go has only 4300 requests per month. Ollama is OK but slow.

English

2

0

1

395