Steffen Röcker

3.4K posts

@sroecker

OG local LLaMA shill. Sr. Solution Architect @RedHat, ex particle physicist. Born @ 347 ppm CO₂. Personal account, potentially unaligned.

Stuttgart, Germany · Joined March 2009
6.5K Following · 1.8K Followers
Steffen Röcker reposted
XReyRobert
XReyRobert@XReyRobert·
Running agents locally is gaining traction, but I kept wondering what my @ollama backend was actually doing underneath. So today I built peekstack to surface #ollama activity in real time. Sharing it in case others using @openclaw, Hermes Agent from @NousResearch, or similar local agent setups have run into the same visibility gap.
Steffen Röcker reposted
Skyler Miao
Skyler Miao@SkylerMiao7·
M2.7 open weights coming in ~2 weeks. Still actively iterating; just pushed a new version yesterday — noticeably better on OpenClaw.
Steffen Röcker reposted
Gökdeniz Gülmez
Gökdeniz Gülmez@ActuallyIsaak·
Well I did it. Just activated GitHub Sponsors. ❤️ If my work on ML tooling, Apple Silicon training, or open-source AI projects (like mlx-lm-lora, or the JOSIE models) has helped you, consider supporting the development. Sponsorship helps me continue building and maintaining open tools for the ML community. github.com/sponsors/Goekd…
Steffen Röcker reposted
◢
@joemccann·
If you want to understand why open source agent harnesses are going to win, read this post. Sure, @AnthropicAI is going for the walled garden @Apple approach to force users to use "their" harness, but to win they'll need to entrench enterprises with Claude Code whilst making CC more "secure", and make switching costs too expensive — from career risk to engineering leadership to literal dollars.
Can Bölük@_can1357

x.com/i/article/2021…

Steffen Röcker
Steffen Röcker@sroecker·
@banteg Impossible Cloud also has zero egress fees, at least in Europe
banteg
banteg@banteg·
so seems like the only serious contender is filecoin warm storage, which costs $2.5/tb/mo, 6x cheaper than cloudflare r2 at $15/tb/mo. but it loses on egress, filecoin costs $14/tb downloaded, while cloudflare is unbeatable at zero egress cost.
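The storage-vs-egress trade-off in the thread above reduces to a quick breakeven calculation. A minimal sketch using the prices quoted in the tweet ($2.5/TB/mo storage + $14/TB egress for Filecoin warm storage, $15/TB/mo with zero egress for Cloudflare R2); the function names are hypothetical, not any real API:

```python
def filecoin_total(stored_tb: float, downloaded_tb: float) -> float:
    """Monthly cost on Filecoin warm storage: $2.5/TB stored + $14/TB downloaded."""
    return 2.5 * stored_tb + 14.0 * downloaded_tb

def r2_total(stored_tb: float, downloaded_tb: float) -> float:
    """Monthly cost on Cloudflare R2: $15/TB stored, zero egress."""
    return 15.0 * stored_tb

# Breakeven: 2.5*S + 14*D = 15*S  =>  D = 12.5*S / 14 ≈ 0.89*S
# i.e. Filecoin stays cheaper as long as you download less than roughly
# 89% of your stored data per month.
print(filecoin_total(10, 5))   # 95.0  -> cheaper than R2
print(r2_total(10, 5))         # 150.0
print(filecoin_total(10, 10))  # 165.0 -> now R2 wins
```

So which provider wins depends entirely on the download-to-storage ratio of the workload, which is presumably why the thread treats egress as the deciding factor.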
Steffen Röcker reposted
Red Hat AI
Red Hat AI@RedHat_AI·
Big step for production LLM serving. KServe v0.17 brings LLMInferenceService to GA, built on @_llm_d_. KV-cache routing, disaggregated prefill-decode, cost-aware autoscaling. 38 contributors, 21 of them new. The community keeps delivering.
Yuan (Terry) Tang@TerryTangYuan

KServe v0.17 is live! 🚀 We are thrilled to announce KServe's most significant update yet. We've overhauled the architecture to move beyond traditional model serving. LLMInferenceService is now fully production-ready and built on the high-performance @_llm_d_ framework.
What's included?
- KV-cache aware routing and disaggregated prefill-decode to maximize throughput.
- Cost-aware autoscaling designed for LLM inference workloads.
- Comprehensive parallelism specification for distributed inference.
- Envoy AI Gateway integration for sophisticated token-based rate limiting.
- A completely restructured modular Helm chart architecture.
Community Power 🤝 This version was made possible by 38 contributors, including 21 new contributors. Thank you for your hard work!
Check out the full release notes here: github.com/kserve/kserve/…
Release blog: kserve.github.io/website/blog/k…

Steffen Röcker reposted
Red Hat AI
Red Hat AI@RedHat_AI·
LLM Compressor v0.10 is out 🚀 Compress 70B+ models without running out of memory. And do it 3.8x faster.
What's new:
- Custom disk offloading for models that don't fit in GPU memory
- Distributed GPTQ: split compression across 4 GPUs (Qwen3-30B-A3B went from 3.9h → 1h)
- GPTQ FP4 microscale (NVFP4 + MXFP4) support
@MistralAI already used it to ship their NVFP4 checkpoint. developers.redhat.com/articles/2026/…
Steffen Röcker
Steffen Röcker@sroecker·
@dhruvmt2 @francoisfleuret Came to the same conclusion, Minimax M2.5 is the most cost-effective model when you go beyond consumer hardware. We have it running on 4x A100 (FP8 with Marlin kernels), and you could even quantize it further down to W4 (to be tested). Even more so once the weights for M2.7 are out.
François Fleuret
François Fleuret@francoisfleuret·
1. What are the best open source coding / general purpose models? 2. What hardware to run them comfortably? 3. How do they compare to the flagships?
Steffen Röcker reposted
Lev Reyzin
Lev Reyzin@lreyzin·
The 2024 Nobel Prize in physics went to a computer scientist. Now, in turn, the 2025 Turing Award was given to a physicist!
Steffen Röcker reposted
Red Hat AI
Red Hat AI@RedHat_AI·
Congrats to @MistralAI on releasing Mistral Small 4 in NVFP4. This checkpoint was generated using LLM Compressor, an open source quantization toolkit that's part of @vllm_project. Red Hat AI worked directly with the Mistral team to produce it. Great to see upstream model providers choosing open source tooling for their official quantized releases. huggingface.co/mistralai/Mist…
Mistral AI for Developers@MistralDevs

🔥 Meet Mistral Small 4: One model to do it all.
⚡ 128 experts, 119B total parameters, 256k context window
⚡ Configurable Reasoning
⚡ Apache 2.0
⚡ 40% faster, 3x more throughput
Our first model to unify the capabilities of our flagship models into a single, versatile model.

Steffen Röcker reposted
AVB
AVB@neural_avb·
God bless this guy. He is pretty much one of the main, if not THE main, maintainer of most of the mlx python libraries.
> mlx-audio
> mlx-vlm
> mlx-embeddings
> mlx-video (wip)
> more...
All of these libraries ship regularly, and they all work. Absolute legend.🫡
Prince Canuma@Prince_Canuma

🚀 mlx-audio v0.4.1 is out!
New models:
→ Granite Speech 4.0 (STT and AST)
→ Canary STT (NVIDIA canary-1b-v2)
→ Moonshine STT
→ MMS STT
→ FireRedASR2-AED STT
→ SenseVoice STT
→ Fish Audio S2 Pro TTS
Plus:
→ Native MLX DeepFilterNet speech enhancement (v1/v2/v3)
→ OGG, Opus & Vorbis audio format support
→ LID fix for ECAPA/SpeechBrain alignment
Thank you to all contributors for this release: @lllucas, @beshkenadze, @andimarafioti, mm65x, irachex and Kylehowells! 🚀
> uv pip install -U mlx-audio
Leave us a star ⭐️ github.com/Blaizzy/mlx-au…

Steffen Röcker reposted
albs—
albs—@albfresco·
i love mactop and btop so i made clanktop
Steffen Röcker reposted
vipli
vipli@viplismism·
just shipped rlm (recursive language model) cli. it's based on the recursive language models paper (arXiv:2512.24601). the layman logic: instead of stuffing your entire context into one llm call and hoping it doesn't go into context rot, rlm writes code to actually process the data, slicing, chunking, running sub-queries on pieces and looping until it gets the answer. works with claude, gpt, gemini, whatever you want. run it from any project directory and it auto-loads the file tree as context, so it already knows your codebase before you even ask a question. setup takes like 30 seconds: just run npm i -g rlm-cli then rlm (first run asks for api key and you're good). it's open source, MIT licensed; if something breaks or you have ideas just open an issue. still converging and managing everything on my own for now! adding the link to the repo in the comments
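The loop the tweet describes (slice the context, run sub-queries on pieces, then recurse over the combined partial answers until everything fits in one call) can be sketched roughly like this. This is not rlm's actual code; `llm_call` is a hypothetical stand-in for a real model backend, and the character-based budget stands in for a token budget:

```python
# Minimal sketch of the recursive-language-model idea, NOT rlm's real code.
def llm_call(prompt: str) -> str:
    # Hypothetical stand-in for a real model backend; here it just
    # "summarizes" by truncating so the sketch is runnable offline.
    return prompt[:200]

CONTEXT_LIMIT = 1000  # characters, stand-in for a token budget

def recursive_query(question: str, context: str) -> str:
    # Base case: context is small enough to answer in a single call.
    if len(context) <= CONTEXT_LIMIT:
        return llm_call(f"{question}\n\n{context}")
    # Otherwise slice the context into chunks, answer a sub-query per
    # chunk, then recurse over the combined partial answers.
    chunks = [context[i:i + CONTEXT_LIMIT]
              for i in range(0, len(context), CONTEXT_LIMIT)]
    partials = [llm_call(f"{question}\n\n{chunk}") for chunk in chunks]
    return recursive_query(question, "\n".join(partials))
```

Because each level shrinks the context before recursing, the loop always terminates, and no single call ever sees more than the budget — which is the claimed defense against context rot.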