Mateusz Mirkowski
3.8K posts

Mateusz Mirkowski
@llmdevguy
Autonomous agents, agentic engineering Building & testing agentic systems Exploring local LLMs

Introducing MiniMax M3: The First Open-Weights Model to Combine Three Frontier Capabilities - Coding & Agentic Frontier: 59.0% SWE-Bench Pro, 66.0% Terminal Bench 2.1, 34.8% SWE-fficiency, 28.8% KernelBench Hard, 74.2% MCP Atlas - MiniMax Sparse Attention scales context to 1M - Natively Multimodal from Step Zero API: platform.minimax.io Token Plan: platform.minimax.io/subscribe/toke… 🚀New! MiniMax Code: code.minimax.io Weights & Tech Report in ~10 Days








🚀 Better inference efficiency, lower costs, broader access. MiMo-V2.5 Series API pricing is now permanently reduced — by up to 99% compared to previous pricing. ✨ Unified pricing across all context lengths. MiMo Token Plans have also been upgraded: • 5–8× more usable tokens at the same price • Simpler and more transparent billing rules 🎁 As a thank-you to current users, all current Token Plan credits will be fully reset. 🎧 MiMo-V2.5-TTS remains free for a limited time. ⏰ Effective May 26 at 6:00 PM PDT. These improvements are powered by continued inference optimization and serving efficiency upgrades across the MiMo stack. 🛠️ We’ll also publish a detailed technical blog on the inference optimizations later — stay tuned.

This is official account of $SUPERGEMMA We build ecosystem for open-source developers. Dev: @jun_song Open source will win.

We are making our discount permanent! 🎉 Enjoy building with DeepSeek-V4-Pro and bring your innovative ideas to life! 🚀

running Hermes locally with Qwen 3.6-35b-a3b is possible on a RTX 4060 Ti 8GB. my params are: ~~~ llama-server \ -m ~/llama-models/Qwen3.6-35B-A3B-UD-Q4_K_M.gguf \ -ngl 999 -ncmoe 30 -fa on \ --cache-type-k q8_0 --cache-type-v q8_0 \ -c 32768 -n 8192 -np 1 -t 6 \ --reasoning off \ --no-cache-prompt --checkpoint-every-n-tokens -1 \ --jinja --metrics --host 0.0.0.0 --port 8080 ~~~ biggest flaws: - context: if you are coding, a few prompts will eat it all - speed: it took 17min to create a medium-difficult .py file but it works! I'm going to test /goal feature as well, to see how Qwen handle multiple compactions and see if it can finish a goal.



