Väterchen Frost

34.2K posts

@VaeterchenFrost

Literatur | Sprachakrobatik | © Eigene Fotos (außer RT) | Gedanken, Gedichte, Geschichten, Gedöns™ | German/English ✌🏻🎅🏻✨

Weliki Ustjug · Joined August 2016

1.2K Following · 1.2K Followers

Pinned Tweet

Väterchen Frost@VaeterchenFrost·Nov 10

»a day in gray«

1.2K

Väterchen Frost@VaeterchenFrost·8m

@ivanfioravanti @InsiderPresider Thanks for the heads up 🙌

Ivan Fioravanti ᯅ@ivanfioravanti·23m

@InsiderPresider That battery will drop to 0 after several hours, and the MacBook will power down even if it is plugged in.

Ivan Fioravanti ᯅ@ivanfioravanti·29m

I pushed another small optimization to the ds4 PR to enable the M5 Neural Accelerators and speed up prefill. Here are the benchmarks; these are all client-side metrics, and server-side numbers are slightly lower. A /metrics endpoint would be great. Tomorrow I'll test this with pi mono in some real coding sessions on the M5 Max, and on the M3 Ultra too.

407

Väterchen Frost@VaeterchenFrost·2d

@Prince_Canuma @adrgrondin Thanks! 😘

Prince Canuma@Prince_Canuma·2d

@VaeterchenFrost @adrgrondin Thanks! Qwen MTP is coming :) You can use mlx-vlm with vision too, and use just the language-model part by passing text instead of text+image.

103

Prince Canuma@Prince_Canuma·3d

mlx-vlm v0.5.0 is here 🚀

This is the largest release ever 🙌🏽

→ Continuous batching server + KV cache quantization
→ MTP and DFlash speculative decoding (single, batch, server)
→ Distributed inference: Qwen3.5, Kimi K2.5 & K2.6
→ Prompt caching w/ warm-disk persistence
→ Gemma 4 video (multi-video) + MTP drafter @googlegemma
→ New models: Youtu-VL, Nemotron 3 Nano Omni, SAM 3D Body
→ Server: json_schema response_format, thinking mode flag

Huge thanks to all 21 contributors and in particular the 18 new contributors, welcome aboard 🚢

Get started today:
> uv pip install -U mlx-vlm

Leave us a star ⭐️
github.com/Blaizzy/mlx-vlm

472

41.9K

Väterchen Frost retweeted

Daniel Franke@dfranke·4d

You buy a German anvil. It contains 83 moving parts and requires winding twice a day. It's forged from excellent steel, holds tolerances across all three striking faces to within three microns, includes a beautifully indexed horn-adjustment mechanism nobody asked for, and requires a proprietary 11-point spanner should you need to replace the rebound calibration bushing. It runs flawlessly for years, but one day it starts up in limp mode because the onboard anvil-management system detects that it's overdue for its 50,000-strike inspection.

You search AliExpress for a Chinese anvil, and are presented with a multitude of offerings from such household-name brands as DUKXJYIBF, HDBTGMXI, and UEJQIP. They're all priced to within a few pennies of each other, appear completely identical except for the nameplate, and obviously all came out of the same factory. You text your blacksmith friend to ask if they're legit. He tells you he got one like that from KIXJBU a few years ago, and that it's been great and a terrific deal. You thank him, but KIXJBU seems to have folded so you buy the one from UEJQIP. When it arrives, it feels suspiciously light. You scratch it and realize it's iron-plated aluminum.

You buy an American anvil. It's five times the price of the competition, but it comes from a brand that your great-grandfather used to love. It comes boxed with a warranty registration postcard, twenty pages of safety instructions, assay certificate, and a regulatory slip which lists its FCC certification and ITAR registration. It looks just like your friend's KIXJBU. There's a "Made In China" sticker on the bottom.

You buy a Russian anvil. It arrives coated in cosmoline, wrapped in newspaper from 1974, and weighing 40% more than advertised. The finish looks like it was machined with a shovel. The face is not flat, but somehow this does not matter. You drop it off a truck, accidentally leave it outside for six winters, and use it to straighten a bulldozer blade. It's fine.

You buy a Swedish anvil. It comes flat-packed in a long cardboard box with cheerful Neo-Grotesk lettering and a line drawing of a smiling man assembling it with an Allen key. The instructions contain no words, only pictograms showing the anvil face, horn, waist, feet, and 112 identical-looking fasteners. Halfway through assembly, you discover that the pritchel hole was installed upside down, but only because you used peg B17 where you should have used peg B71. Once assembled, it is clean, stable, and works better than it has any right to. You immediately wonder whether you should have bought two.

You buy a Japanese anvil. It arrives wrapped in rice paper inside a paulownia box, accompanied by a certificate bearing three generations of signatures and a photograph of the first production example being presented to the Emperor. The face has been hand-polished by a seventy-eight-year-old master whose family has made striking surfaces since the Muromachi period. You are given detailed instructions for oiling it with a cloth folded in a specific way. It is the most beautiful object you own. You never quite work up the nerve to strike it.

423

3.1K

27.3K

1.1M

Väterchen Frost retweeted

Andrés J. Colmenares@wawawiwacomics·4d

Bro! 🥐😱

161

2.8K

19.2K

Väterchen Frost retweeted

W S@WildSentences·Apr 29

2.5K

39.7K

1.1M

Väterchen Frost@VaeterchenFrost·Apr 26

@ivanfioravanti Had a similar issue with Gemma 4… not a problem for @inferencerlabs though, which just worked 💪

272

Ivan Fioravanti ᯅ@ivanfioravanti·Apr 26

Recently I've had no luck with LM Studio and MLX, so I have to revert to mlx_lm or oMLX. Here is Brooooooklyn/Qwen3.6-27B-UD-Q3_K_XL-mlx, which works perfectly on the other two, while it fails with a "The model has crashed without additional information" error on LM Studio 😢

6.6K

Väterchen Frost retweeted

islieb Krakelkiste@isliebcomics·Apr 25

148

1.1K

Väterchen Frost@VaeterchenFrost·Apr 24

think of me as a complex creature non-conforming in non-stereotypical terms unpredictable and very unpractical this is how we roll in this waking nightmare this is how we live and how we learn #poem #poetry

Väterchen Frost retweeted

shaurya@shauseth·Apr 19

schrödinger’s strait

1.5K

Väterchen Frost retweeted

Lifesabeach@Lifesab5138·Apr 19

#photography #skyscapes #overexposed

516

Väterchen Frost@VaeterchenFrost·Apr 19

@ivanfioravanti @N8Programs Thanks Ivan, and yes, benchmarking is fun, isn’t it? 🙌

230

Ivan Fioravanti ᯅ@ivanfioravanti·Apr 19

MLX: Preview of the Qwen3.5-35B-A3B 4-bit quantization Royal Rumble. JANGQ and RAM-25GB-MLX are still missing, and a second run of some quantizations is in progress. Full article later.

Quality ranking so far:
🥇 nvfp4
🥈 4bit-gs32
🥉 4bit-DWQ

Performance ranking:
🥇 mxfp4
🥈 4bit
🥉 UD-MLX-4bit

Notes:
- bf16 has lower perplexity, but overall performed worst in benchmarks 🤷🏻‍♂️
- 200 cases were executed for each benchmark
- All tests were performed with the same sampling parameters
- Benchmarking requires a LOT of time, but it's useful and fun!

103

7.6K
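For context on the perplexity note in the thread above: perplexity is conventionally the exponential of the mean per-token negative log-likelihood. The sketch below assumes natural-log token probabilities are already available from some evaluation run; the function name and the sample values are illustrative and are not taken from the benchmark in the tweet.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Return exp(mean negative log-likelihood) over the scored tokens.

    token_logprobs holds natural-log probabilities the model assigned to
    each reference token (hypothetical input, not from the tweet).
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


# Hypothetical example: four tokens, each predicted with probability 0.25.
# A model that is uniformly unsure among 4 options has perplexity of about 4.
uniform = [math.log(0.25)] * 4
print(perplexity(uniform))
```

Because perplexity only measures how well the model predicts reference text, a lower value does not have to translate into better task scores, which fits the bf16 observation in the notes.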

Väterchen Frost retweeted

Federico Italiano@FedeItaliano76·Apr 16

The stunning futurism bordering on abstraction of the Belgian avant-garde painter Félix de Boeck (1898–1995)

613

3.9K

97.5K

Väterchen Frost retweeted

The New Yorker@NewYorker·Apr 12

A cartoon by Harry Bliss, from 2015.

202

1.1K

57.5K

Väterchen Frost@VaeterchenFrost·Apr 12

@0xSero Any chance to get a Q4 of the 30% REAP? 😇

195

0xSero@0xSero·Apr 12

Strongest model on the Framework AI Ryzen 128GB

Qwen3.5-122B-REAP-q6
- 305 tokens/s prefill
- 29.2 tokens/s decode
- basically can serve 2 users at full context

I was also able to get it to make GGUFs very easily.

huggingface.co/0xSero/Qwen3.5…

349

18.3K
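To put the two throughput figures in the tweet above in perspective, a rough single-request estimate is prompt tokens divided by the prefill rate plus output tokens divided by the decode rate. The sketch assumes both rates stay constant regardless of context length, which real systems generally do not guarantee; the function name and the token counts are hypothetical, and only the two default rates come from the tweet.

```python
def estimate_latency_seconds(
    prompt_tokens: int,
    output_tokens: int,
    prefill_tps: float = 305.0,  # prefill rate quoted in the tweet
    decode_tps: float = 29.2,    # decode rate quoted in the tweet
) -> float:
    """Back-of-the-envelope latency: prefill time plus decode time."""
    if prefill_tps <= 0 or decode_tps <= 0:
        raise ValueError("throughput values must be positive")
    if prompt_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return prompt_tokens / prefill_tps + output_tokens / decode_tps


# Hypothetical request: a 3,050-token prompt and a 292-token answer.
# Each phase takes roughly 10 seconds at the quoted rates.
print(round(estimate_latency_seconds(3050, 292), 1))
```

The split shows why prefill speed matters for long prompts: at these rates, ten prompt tokens cost about as much time as one generated token.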

Väterchen Frost@VaeterchenFrost·Apr 12

@mudler_it @huggingface Oh 😥 I have some spare space in my repo if that'd help… 🫡

Ettore Di Giacinto@mudler_it·Apr 8

Just hit my @huggingface quota for uploading APEX quants 😅 If you know someone who works at @huggingface and could put me in contact to get the quota bumped, that would be reeeally appreciated! 🙏

1.2K

Väterchen Frost retweeted

islieb Krakelkiste@isliebcomics·Apr 10

106

950

Väterchen Frost@VaeterchenFrost·Apr 6

@atomtanstudio @no_stp_on_snek Doing something useful and effective with AI? Nah… 😅

Rich · Atom Tan Studio@atomtanstudio·Apr 6

@no_stp_on_snek @VaeterchenFrost You know, that isn't a bad idea. It works for nearly everything. I created a skill for Craft (similar to Notion) for OpenClaw just by pointing at their SDK and telling OpenClaw to write it. Completely seamless.

Tom Turney@no_stp_on_snek·Apr 6

ran the same benchmark with TurboQuant+ on MLX. 520 samples, same model (gemma 4 26b BF16), M5 Max 128GB.

99% answer agreement (vs 97%)
10-64% KV savings (vs 0-53%)
78% accuracy both

decode speedup is 0.79-0.99x... that's my gap. different architecture: i dequant once after prefill then run native SDPA. no fused kernel yet. trading decode speed for higher agreement and more compression.

full results and code at github.com/TheTom/turboqu… (including code snippets on how to integrate)

great work on mlx-vlm and the benchmark script. used it directly for these runs. @Prince_Canuma @ekryski @anemll @ivanfioravanti FYI

Prince Canuma@Prince_Canuma

TurboQuant: Open Evals on MLX 🔥

Yesterday I launched mlx-vlm v0.4.4 with major TurboQuant performance improvements.

Today, the open benchmark results on MM-NIAH (val, 520 samples) using Gemma 4 26B IT by @GoogleDeepMind on M3 Ultra:

→ 0 quality loss: 78% accuracy for both BL and TBQ
→ 97% answer agreement across all context lengths
→ 30–53% KV cache savings (where TBQ is active)
→ 1.16x decode speedup at ~60K context

Benchmark code 👇🏽

8.2K
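The two tweets above compare "answer agreement" and "KV savings" without spelling out how they are computed, and the benchmark script itself is not shown here. One plausible reading, sketched below as an assumption, is that agreement is the fraction of samples where the baseline and quantized runs give the same final answer, and savings is the relative reduction in KV cache size. All function names and sample values are hypothetical.

```python
from typing import Sequence


def answer_agreement(baseline: Sequence[str], quantized: Sequence[str]) -> float:
    """Fraction of samples where both runs produce the same normalized answer.

    Assumption: answers are compared after trimming and lowercasing; the real
    benchmark may normalize differently.
    """
    if len(baseline) != len(quantized) or not baseline:
        raise ValueError("need two non-empty answer lists of equal length")
    matches = sum(
        b.strip().lower() == q.strip().lower()
        for b, q in zip(baseline, quantized)
    )
    return matches / len(baseline)


def kv_savings(baseline_bytes: int, quantized_bytes: int) -> float:
    """Relative KV cache reduction: 1 - quantized / baseline."""
    if baseline_bytes <= 0:
        raise ValueError("baseline size must be positive")
    return 1.0 - quantized_bytes / baseline_bytes


# Hypothetical data: three of four answers match between the two runs.
base_answers = ["A", "b", "C", "D"]
quant_answers = ["a", "B", "c", "x"]
print(answer_agreement(base_answers, quant_answers))
print(kv_savings(1000, 470))
```

Under this reading, agreement and accuracy are independent measures: two runs can both score 78% accuracy while still disagreeing on a few individual samples, which is how equal accuracy can coexist with 97% or 99% agreement.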

Väterchen Frost@VaeterchenFrost·Apr 6

@atomtanstudio @no_stp_on_snek Command line anxiety is real 🙈

Rich · Atom Tan Studio@atomtanstudio·Apr 6

@no_stp_on_snek I know I can figure it out. I am just lazy. I blame LLMs. 😂

Väterchen Frost@VaeterchenFrost·Apr 5

@mudler_it @Alibaba_Qwen @Google APEX is great as is, already. Reduced memory footprint (=increased speed), while boosting reasoning quality and accuracy. Huge W, imho!

127

Ettore Di Giacinto@mudler_it·Apr 5

APEX quantization update - in 3 days 10 new MoE models published to HuggingFace!

Here is a full list of the new APEX GGUF quants:
- huggingface.co/mudler/Qwen3.5… (original, with full benchmarks @Alibaba_Qwen )
- huggingface.co/mudler/gemma-4… ( new MoE from @Google )
- huggingface.co/mudler/GLM-4.7… (30B, MLA attention) (@Zai_org )
- huggingface.co/mudler/Holo3-3… (VLM, with mmproj)
- huggingface.co/mudler/Qwen3.5…
- huggingface.co/mudler/gemma-4… (abliterated)
- huggingface.co/mudler/gemma-4…
- huggingface.co/mudler/Qwen3-C… (80B, 512 experts!) @Alibaba_Qwen
- huggingface.co/mudler/Mistral… (MLA, with mmproj)
- huggingface.co/mudler/LFM2-24… (hybrid conv/MoE by @liquidai )
- huggingface.co/mudler/MiniMax… (228B! @MiniMax_AI )
- huggingface.co/mudler/Qwen3.5… ( @Alibaba_Qwen )
- huggingface.co/mudler/Qwen3-C… ( @Alibaba_Qwen )
- huggingface.co/mudler/Nemotro… ( @nvidia )

Still in the pipeline:
⏳ Nemotron-3-Nano-30B + Super-120B (Mamba-2 hybrid)
⏳ Step-3.5-Flash (196B)
⏳ Qwen3.5-397B-A17B
⏳ Trinity-Large-Thinking (398B)

7 profiles each: Quality, Balanced, Compact + I-variants with diverse calibration.

Only huggingface.co/mudler/Qwen3.5… has validated benchmarks so far. A full benchmark pass with lm-evaluation-harness is coming next, followed by an optimization phase (we will re-quantize a few models).

Ettore Di Giacinto@mudler_it

APEX quantizations of more models ongoing! Meanwhile, playing with Qwen 3.5.. the impact of APEX vs Unsloth Dynamic quant on quality is clearly visible IMO, at least in some areas. I know we need more numbers before drawing conclusions, but this isn't about numbers. Just check out a simple prompt: "create an html page of a rotating cube in SVG."

Left: Unsloth Qwen3.5-35B-A3B-UD-Q8_K_XL.gguf (48.7 GB, ~32 tok/s) → flat square (?????)
Right: APEX Qwen3.5-35B-A3B-APEX-I-Quality.gguf (22.8 GB, ~53 tok/s) → ✨

2.1K
