Introducing Kernels on the Hugging Face Hub ✨

What if shipping a GPU kernel was as easy as pushing a model?
- Pre-compiled for your exact GPU, PyTorch & OS
- Multiple kernel versions coexist in one process
- torch.compile compatible
- 1.7x–2.5x speedups over PyTorch baselines
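A rough sketch of what that looks like in practice, based on the `kernels` library's public example (the repo name kernels-community/activation and the gelu_fast op come from that example; treat the details as illustrative):

import torch
from kernels import get_kernel  # pip install kernels

# Fetch a pre-compiled kernel from the Hub, matched to this GPU / PyTorch / OS.
activation = get_kernel("kernels-community/activation")

x = torch.randn((16, 16), dtype=torch.float16, device="cuda")
y = torch.empty_like(x)
activation.gelu_fast(y, x)  # fast GELU kernel writing its output into y
print(y)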

Introducing MMX-CLI — our first piece of infrastructure built not for humans, but for Agents.

Your Agent can read, think, and write. But ask it to sing, paint, or show you a world it's never seen — and it falls silent. Not because it doesn't understand, but because it has no mouth, no hands, no camera.

Today, that changes. MMX-CLI gives every Agent seven new senses — image, video, voice, music, vision, search, conversation — powered by MiniMax's full-modal stack, today's SOTA across mainstream omni-modal models.

One command: mmx. Agent-native I/O. Zero MCP glue. Runs on your existing Token Plan.

Two lines to give your Agent a voice:
npx skills add MiniMax-AI/cli -y -g
npm install -g mmx-cli

Then tell it: "you have mmx commands available." It'll learn the rest.

GitHub → github.com/MiniMax-AI/cli
Token Plan: platform.minimax.io/subscribe/toke…

~6.5–6.7 t/s for GLM 5.1 on M5 Max 128GB.

Added "Dense" model export; model load is now only 5s! Experts are streamed from SSD, so we do not pre-load them. Added a direct SSD -> slot memory path and removed prefetch... many dead-end experiments along the way.

See "Export a dense-only GGUF" and "Fast path" in tools/flashmob-sidecar/README.md.

WIP branch for Flash-MoE-SSD: github.com/Anemll/anemll-…
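For intuition, here is a minimal hypothetical sketch of the idea (not Anemll's actual code; EXPERT_BYTES, ExpertStore, and load_expert are made-up names): keep the dense weights resident, memory-map the expert file, and read each routed expert from SSD straight into a reusable slot buffer at inference time instead of pre-loading everything:

import mmap
import os
import numpy as np

EXPERT_BYTES = 8 * 1024 * 1024  # assumed fixed on-disk size per expert

class ExpertStore:
    """Memory-maps a file of expert weights; pages fault in from SSD on demand."""

    def __init__(self, path: str):
        self.fd = os.open(path, os.O_RDONLY)
        size = os.fstat(self.fd).st_size
        self.mm = mmap.mmap(self.fd, size, prot=mmap.PROT_READ)
        # One reusable "slot" buffer: SSD -> slot directly, no prefetch stage.
        self.slot = np.empty(EXPERT_BYTES // 2, dtype=np.float16)

    def load_expert(self, idx: int) -> np.ndarray:
        off = idx * EXPERT_BYTES
        # Slicing the mmap copies the bytes; the page faults do the SSD read.
        self.slot[:] = np.frombuffer(self.mm[off:off + EXPERT_BYTES], dtype=np.float16)
        return self.slot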