so what is it?
MiniCPM-5 1B is a 1-billion-parameter text model by @OpenBMB, part of the "Pocket Rocket" series.
it's built for one thing: running powerful AI locally on any device you own.
your laptop. your phone. even inside a browser tab.

the benchmarks are insane for 1B.
MiniCPM-5 1B vs the competition:
> 48.85 on MMLU-Pro (Qwen3.5-0.8B: 42.74)
> 70.06 on MMLU-Redux (Qwen3.5-0.8B: 61.50)
> 91.60 on MATH-500 (Qwen3.5-0.8B: 30.40)
> 40.42 on AIME-2025 (Qwen3.5-0.8B: 1.04)
> 79.53 on τ²-Bench (Qwen3.5-0.8B: 19.60)
it destroys Qwen3.5-0.8B, Qwen3-0.6B, and LFM2.5-1.2B-Thinking across the board.
knowledge. math. code. tool calling. it leads everywhere.

but here's the coolest part...
it comes with a desktop pet app. a small AI companion that lives on your screen like a pixel buddy.
I installed it on my Mac. loaded the model. and within minutes it was chatting with me right on my desktop.
no cloud. no API costs. no internet needed. just a local AI pet you actually own.

and it runs literally everywhere. here's the breakdown:
> FP16: ~2GB VRAM (GPU / MacBook / server, zero loss)
> INT8: ~1GB (laptop / edge box, near-lossless)
> INT4/Q4: ~0.5GB (phone / tablet / even a car system)
inference via llama.cpp, ollama, vLLM, SGLang, Hugging Face, and ArcLight.
ArcLight is their open-source CPU inference framework. you can run a full LLM inside a Chrome tab.
0.5GB. on a phone. let that sink in.
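
where do those sizes come from? roughly parameters × bits per weight. a minimal sketch, assuming exactly 1e9 parameters and counting weight storage only (no KV cache, activations, or runtime overhead). the helper name `weight_memory_gb` is made up for illustration:

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumption: "1B" taken at face value; the real parameter count may differ.
N_PARAMS = 1e9

for name, bits in [("FP16", 16), ("INT8", 8), ("INT4/Q4", 4)]:
    print(f"{name}: ~{weight_memory_gb(N_PARAMS, bits):.1f} GB")
```

real Q4 files usually land a bit above this estimate, since quantized formats also store per-block scale factors.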

@socialwithaayan 0.5GB numbers look clean, but sustained inference is where it gets ugly. KV cache on edge quants blows up fast with context length. Test under real prompts, not cold load, and watch nvidia-smi through the whole session
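
A rough way to see the KV cache point: the cache stores one key and one value vector per layer per token, so it grows linearly with context length. The sketch below uses a hypothetical config (24 layers, 8 KV heads, head dim 64, FP16 cache) purely for illustration. These are not MiniCPM-5's published numbers, and `kv_cache_bytes` is an invented helper name.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2, batch: int = 1) -> int:
    """Size of the KV cache in bytes for a standard transformer decoder."""
    # Leading 2 = one K tensor plus one V tensor per layer.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem * batch


# Hypothetical architecture, NOT taken from the MiniCPM-5 model card.
LAYERS, KV_HEADS, HEAD_DIM = 24, 8, 64

for ctx in (1_024, 4_096, 32_768):
    gb = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, ctx) / 1e9
    print(f"ctx={ctx:>6}: ~{gb:.2f} GB of KV cache")
```

Under these assumed values, a 32k-token session needs about 1.6 GB of cache, several times the 0.5 GB of Q4 weights, which is why a cold-load figure alone understates real memory use.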