Rompel
@ukrroot
Running local AI on consumer hardware. RTX 5090 + Mac Studio M4. Benchmarks, costs, what actually works — receipts only.

Prompt Share 110 - Night Cherry Blossoms (夜桜)
Image: Nanobanana 2, made in @Hailuo_AI
Prompt: night sakura trees lined along a narrow canal, glowing pink cherry blossoms at night, water mirror reflection perfectly reflecting the trees, stone riverbank and green grass slope, perspective leading into the distance, low angle shot near the water surface, deep indigo night sky, neon pink illumination, dreamy spring atmosphere, cinematic composition, ultra detailed, long exposure photography, highly saturated colors, Japan spring scenery

Wow. @Zai_org GLM 5.2 is a marvel! It is *at least* as good as Opus 4.8 and GPT 5.5. It's super fast, inexpensive, and not too verbose. It responds with nuance and judgement, & handles long context VERY well. I've never experienced an open weights model like this before.

oMLX 0.4.4rc2 is out with early MiniMax M3 support, made possible by the awesome mlx-vlm work from @Prince_Canuma and @ivanfioravanti, tracking the upstream mlx-vlm PR. Stable 0.4.4 is planned after a short final RC test pass!

MiniMax M3 is supported with oMLX features including SSD cache, prefix cache, continuous batching, and the OpenAI-compatible API.

This release also adds stronger macOS 27 compatibility, safer native MTP batching, more robust Gemma 4 / Harmony tool-call handling, and additional cache / Memory Guard hardening.

MiniMax M3 single-request results (M3U 512G, ssd-cache on)
pp1024 332.6 tok/s, tg128 28.9 tok/s
pp4096 359.8 tok/s, tg128 20.8 tok/s
pp8192 340.2 tok/s, tg128 20.2 tok/s
pp32768 243.5 tok/s, tg128 18.9 tok/s

Continuous batching at pp1024/tg128:
1x 28.9 tok/s
2x 40.1 tok/s
4x 51.3 tok/s
8x 57.8 tok/s

github.com/jundot/omlx/re…
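
To put the continuous-batching numbers in perspective, here is a rough sketch of the arithmetic. It assumes the quoted figures are aggregate generation throughput across all concurrent requests (the post does not say so explicitly, though the 1x value matches the single-request tg128 result). The `scaling` helper and its field names are made up for illustration and are not part of oMLX.

```python
# Throughput (tok/s) quoted in the post at pp1024/tg128, keyed by concurrency.
# Assumption: each value is the total across all concurrent requests.
batch_tok_s = {1: 28.9, 2: 40.1, 4: 51.3, 8: 57.8}


def scaling(results):
    """Return speedup over 1x and the implied per-request rate for each level."""
    base = results[1]
    rows = []
    for n, total in sorted(results.items()):
        rows.append({
            "concurrency": n,
            "total_tok_s": total,
            "speedup": round(total / base, 2),          # relative to a single request
            "per_request_tok_s": round(total / n, 2),   # even split across requests
        })
    return rows


rows = scaling(batch_tok_s)
for row in rows:
    print(row)
```

Under that reading, 8 concurrent requests yield about twice the total throughput of a single request, while each individual request generates at roughly a quarter of the single-request speed.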
