
Steffen Röcker
3.4K posts

Steffen Röcker
@sroecker
OG local LLaMA shill. Sr. Solution Architect @RedHat, ex particle physicist. Born @ 347 ppm CO₂. Personal account, potentially unaligned.



Vibe coded a simple TUI browser for ZIM files using @textualizeio, libzim and markdownify. @Kimi_Moonshot K2.5 did really well, only needed a bit of steering to use uv instead of pip directly.

🚨 BREAKING: Someone just open-sourced a full offline survival computer with AI, Wikipedia, and maps built in. Project N.O.M.A.D. is an open-source offline survival computer. Self-contained. Zero internet required after install. Zero telemetry. Everything runs locally on your hardware. What it includes: → Full Wikipedia archives via Kiwix → Offline maps via OpenStreetMap → Local AI models via Ollama + Open WebUI → Calculators, reference tools, resource libraries → A management UI to control everything from a browser One curl command installs the entire system on any Debian-based machine. Runs headless as a server so any device on your local network can access it. Minimum specs to run the base system: dual-core processor, 4GB RAM, 5GB storage. To run local LLMs offline, you want 32GB RAM and an NVIDIA RTX 3060 or better. No accounts. No authentication by default. No cloud dependency. No phone-home behavior. Built to function when nothing else does. The grid, the cloud, the API you depend on. None of it is guaranteed. The people building local-first systems right now are the ones who won’t be asking for help when access disappears.



Daniel Lemire, "How many branches can your CPU predict?," in Daniel Lemire's blog, March 18, 2026, lemire.me/blog/2026/03/1….


🚀 Introducing Nemotron-Cascade 2 🚀 Just 3 months after Nemotron-Cascade 1, we’re releasing Nemotron-Cascade 2: an open 30B MoE with 3B active parameters, delivering best-in-class reasoning and strong agentic capabilities. 🥇 Gold Medal-level performance on IMO 2025, IOI 2025, and ICPC World Finals 2025: • Capabilities once thought achievable only by frontier proprietary models (e.g. Gemini Deep Think) or frontier-scale open models (i.e. DeepSeek-V3.2-Speciale-671B-A37B). • Remarkably high intelligence density with 20× fewer parameters. 🏆 Best-in-class across math, code reasoning, alignment, and instruction following: • Outperforms the latest Qwen3.5-35B-A3B (2026-02-24) and even larger Qwen3.5-122B-A10B (2026-03-11). 🧠 Powered by Cascade RL + multi-domain on-policy distillation: • Significantly expand Cascade RL across a much broader range of reasoning and agentic domains than Nemotron-Cascade 1, while distilling from the strongest intermediate teacher models throughout training to recover regressions and sustain gains. 🤗 Model + SFT + RL data: 👉 huggingface.co/collections/nv… 📄 Technical report: 👉 research.nvidia.com/labs/nemotron/…

𝗞𝗦𝗲𝗿𝘃𝗲 𝘃𝟬.𝟭𝟳 𝗶𝘀 𝗹𝗶𝘃𝗲! 🚀 We are thrilled to announce KServe's most significant update yet. We’ve overhauled the architecture to move beyond traditional model serving. LLMInferenceService is now fully production-ready and built on the high-performance @_llm_d_ framework. 𝗪𝗵𝗮𝘁’𝘀 𝗶𝗻𝗰𝗹𝘂𝗱𝗲𝗱? - KV-cache aware routing and disaggregated prefill-decode to maximize throughput. - Cost-aware autoscaling designed for LLM inference workloads. - Comprehensive parallelism specification for distributed inference. - Envoy AI Gateway integration for sophisticated token-based rate limiting. - A completely restructured modular Helm chart architecture. 𝗖𝗼𝗺𝗺𝘂𝗻𝗶𝘁𝘆 𝗣𝗼𝘄𝗲𝗿 🤝 This version was made possible by 𝟯𝟴 𝗰𝗼𝗻𝘁𝗿𝗶𝗯𝘂𝘁𝗼𝗿𝘀, including 𝟮𝟭 𝗻𝗲𝘄 𝗰𝗼𝗻𝘁𝗿𝗶𝗯𝘂𝘁𝗼𝗿𝘀. Thank you for your hard work! Check out the full release notes here: github.com/kserve/kserve/… Release blog: kserve.github.io/website/blog/k…


You can hide these !commands in html comments so people don't see them when reading the skill. The command executes without the AI even knowing about it.



Final update! MMLU is saturated and has become (rightfully) less popular. What should replace it? - Other knowledge evals like GPQA, MMLU-Pro - Code evals like LiveCodeBench - Agentic evals like BFCL - Other?

Due to popular demand, I've updated this figure to include DeepSeek-V2 and Mistral Large 2. It's also more zoomed for readability.

🔥 Meet Mistral Small 4: One model to do it all. ⚡ 128 experts, 119B total parameters, 256k context window ⚡ Configurable Reasoning ⚡ Apache 2.0 ⚡ 40% faster, 3x more throughput Our first model to unify the capabilities of our flagship models into a single, versatile model.


🚀 mlx-audio v0.4.1 is out! New models: → Granite Speech 4.0 (STT and AST) → Canary STT (NVIDIA canary-1b-v2) → Moonshine STT → MMS STT → FireRedASR2-AED STT → SenseVoice STT → Fish Audio S2 Pro TTS Plus: → Native MLX DeepFilterNet speech enhancement (v1/v2/v3) → OGG, Opus & Vorbis audio format support → LID fix for ECAPA/SpeechBrain alignment Thank you all contributions for this release: @lllucas, @beshkenadze, @andimarafioti, mm65x, irachex and Kylehowells! 🚀 > uv pip install -U mlx-audio Leave us a star ⭐️ github.com/Blaizzy/mlx-au…








