Väterchen Frost

34.2K posts

Väterchen Frost

@VaeterchenFrost

Literature | Verbal acrobatics | © Own photos (except RTs) | Gedanken, Gedichte, Geschichten, Gedöns™ (thoughts, poems, stories, odds and ends) | German/English ✌🏻🎅🏻✨

Weliki Ustjug · Joined August 2016
1.2K Following · 1.2K Followers

Pinned Tweet
Väterchen Frost @VaeterchenFrost
»a day in gray«
[image]
1 reply · 7 retweets · 56 likes · 1.2K views
Ivan Fioravanti ᯅ @ivanfioravanti
@InsiderPresider That battery will drain to 0 after several hours, and the MacBook will power down even if it's plugged in.
2 replies · 0 retweets · 3 likes · 48 views
Ivan Fioravanti ᯅ @ivanfioravanti
I pushed another small optimization to the ds4 PR to enable M5 Neural Accelerators and speed up prefill. Here are benchmarks; these are all client-side metrics, and the server-side numbers are slightly lower. A /metrics endpoint would be great. Tomorrow I'll test this with pi mono for some real coding sessions on M5 Max, but on M3 Ultra too.
[3 images]
4 replies · 4 retweets · 25 likes · 1.7K views
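Client-side numbers like these are easy to reproduce with a few lines of mlx_lm, using time-to-first-token as a rough proxy for prefill cost. A minimal sketch under that assumption; the model id is a placeholder, not the ds4 PR build:

```python
# Hedged sketch: measure client-side prefill/decode speed with mlx_lm.
# The model id below is a placeholder, not the PR build from the tweet.
import time
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/SOME-MODEL-4bit")  # placeholder id

prompt = "some long context " * 512  # big enough that prefill dominates

# Time-to-first-token approximates prefill cost on the client side.
t0 = time.perf_counter()
generate(model, tokenizer, prompt=prompt, max_tokens=1)
prefill_s = time.perf_counter() - t0

# A longer run is dominated by decode; subtract the prefill estimate.
t0 = time.perf_counter()
generate(model, tokenizer, prompt=prompt, max_tokens=256)
total_s = time.perf_counter() - t0

n_prompt = len(tokenizer.encode(prompt))
print(f"prefill: ~{n_prompt / prefill_s:,.0f} tok/s (client side)")
print(f"decode:  ~{256 / (total_s - prefill_s):,.1f} tok/s (client side)")
```

As the tweet notes, numbers measured this way sit on the client side of any server overhead, so they will read slightly higher than server-side metrics.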
Prince Canuma @Prince_Canuma
@VaeterchenFrost @adrgrondin Thanks! Qwen MTP is coming :) You can use mlx-vlm with vision models and just use the language-model part by passing text instead of text+image.
1 reply · 0 retweets · 3 likes · 105 views
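The text-only path looks roughly like this; a minimal sketch, assuming a recent mlx-vlm where generate takes (model, processor, prompt, ...) and apply_chat_template accepts a num_images argument. The model id is a placeholder:

```python
# Hedged sketch: drive only the language-model part of a VLM in mlx-vlm
# by formatting a text-only turn and passing no images at all.
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

model_path = "mlx-community/SOME-VLM-4bit"  # placeholder id
model, processor = load(model_path)
config = load_config(model_path)

# num_images=0 formats a plain chat turn with no image tokens.
prompt = apply_chat_template(
    processor, config, "Explain MTP in one line.", num_images=0
)

# No image argument: only the language-model half of the VLM runs.
print(generate(model, processor, prompt, verbose=False))
```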
Prince Canuma @Prince_Canuma
mlx-vlm v0.5.0 is here 🚀 This is the largest release ever 🙌🏽
→ Continuous batching server + KV cache quantization
→ MTP and DFlash speculative decoding (single, batch, server)
→ Distributed inference: Qwen3.5, Kimi K2.5 & K2.6
→ Prompt caching w/ warm-disk persistence
→ Gemma 4 video (multi-video) + MTP drafter @googlegemma
→ New models: Youtu-VL, Nemotron 3 Nano Omni, SAM 3D Body
→ Server: json_schema response_format, thinking mode flag
Huge thanks to all 21 contributors and in particular the 18 new contributors, welcome aboard 🚢
Get started today:
> uv pip install -U mlx-vlm
Leave us a star ⭐️ github.com/Blaizzy/mlx-vlm
[image]
39 replies · 56 retweets · 472 likes · 42K views
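For the json_schema response_format, a minimal client sketch, assuming the server exposes an OpenAI-compatible /v1/chat/completions route; the host, port, and model name below are illustrative and not confirmed by the release notes:

```python
# Hedged sketch: structured output from an OpenAI-compatible chat endpoint.
# URL, route, and model name are assumptions made for illustration.
import json
import requests

schema = {
    "name": "caption",
    "schema": {
        "type": "object",
        "properties": {
            "caption": {"type": "string"},
            "objects": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["caption", "objects"],
    },
}

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",  # assumed default address
    json={
        "model": "your-model-here",  # placeholder
        "messages": [{"role": "user", "content": "Describe the scene."}],
        "response_format": {"type": "json_schema", "json_schema": schema},
    },
)
# The model's reply should be a JSON string conforming to the schema.
print(json.loads(resp.json()["choices"][0]["message"]["content"]))
```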
Väterchen Frost retweeted
Daniel Franke @dfranke
You buy a German anvil. It contains 83 moving parts and requires winding twice a day. It's forged from excellent steel, holds tolerances across all three striking faces to within three microns, includes a beautifully indexed horn-adjustment mechanism nobody asked for, and requires a proprietary 11-point spanner should you need to replace the rebound calibration bushing. It runs flawlessly for years, but one day it starts up in limp mode because the onboard anvil-management system detects that it's overdue for its 50,000-strike inspection.

You search AliExpress for a Chinese anvil, and are presented with a multitude of offerings from such household-name brands as DUKXJYIBF, HDBTGMXI, and UEJQIP. They're all priced to within a few pennies of each other, appear completely identical except for the nameplate, and obviously all came out of the same factory. You text your blacksmith friend to ask if they're legit. He tells you he got one like that from KIXJBU a few years ago, and that it's been great and a terrific deal. You thank him, but KIXJBU seems to have folded so you buy the one from UEJQIP. When it arrives, it feels suspiciously light. You scratch it and realize it's iron-plated aluminum.

You buy an American anvil. It's five times the price of the competition, but it comes from a brand that your great-grandfather used to love. It comes boxed with a warranty registration postcard, twenty pages of safety instructions, assay certificate, and a regulatory slip which lists its FCC certification and ITAR registration. It looks just like your friend's KIXJBU. There's a "Made In China" sticker on the bottom.

You buy a Russian anvil. It arrives coated in cosmoline, wrapped in newspaper from 1974, and weighing 40% more than advertised. The finish looks like it was machined with a shovel. The face is not flat, but somehow this does not matter. You drop it off a truck, accidentally leave it outside for six winters, and use it to straighten a bulldozer blade. It's fine.

You buy a Swedish anvil. It comes flat-packed in a long cardboard box with cheerful Neo-Grotesk lettering and a line drawing of a smiling man assembling it with an Allen key. The instructions contain no words, only pictograms showing the anvil face, horn, waist, feet, and 112 identical-looking fasteners. Halfway through assembly, you discover that the pritchel hole was installed upside down, but only because you used peg B17 where you should have used peg B71. Once assembled, it is clean, stable, and works better than it has any right to. You immediately wonder whether you should have bought two.

You buy a Japanese anvil. It arrives wrapped in rice paper inside a paulownia box, accompanied by a certificate bearing three generations of signatures and a photograph of the first production example being presented to the Emperor. The face has been hand-polished by a seventy-eight-year-old master whose family has made striking surfaces since the Muromachi period. You are given detailed instructions for oiling it with a cloth folded in a specific way. It is the most beautiful object you own. You never quite work up the nerve to strike it.
423 replies · 3.1K retweets · 27.3K likes · 1.1M views
Väterchen Frost retweeted
Andrés J. Colmenares @wawawiwacomics
Bro! 🥐😱
[image]
5 replies · 161 retweets · 2.8K likes · 19.2K views
Väterchen Frost retweeted
W S @WildSentences
[image]
82 replies · 2.5K retweets · 39.7K likes · 1.1M views
Ivan Fioravanti ᯅ @ivanfioravanti
Recently I've had no luck with LM Studio and MLX, so I have to revert to mlx_lm or oMLX. Here Brooooooklyn/Qwen3.6-27B-UD-Q3_K_XL-mlx is working perfectly on the other two, while it fails with "The model has crashed without additional information" on LM Studio 😢
[image]
11 replies · 1 retweet · 63 likes · 6.6K views
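The fallback itself is a couple of lines of mlx_lm; a minimal sketch using the checkpoint named in the tweet, with an illustrative prompt:

```python
# Hedged sketch: load the same checkpoint directly with mlx_lm when the
# LM Studio runtime crashes. The prompt is illustrative.
from mlx_lm import load, generate

model, tokenizer = load("Brooooooklyn/Qwen3.6-27B-UD-Q3_K_XL-mlx")
print(generate(model, tokenizer, prompt="Hello", max_tokens=64))
```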
Väterchen Frost retweeted
islieb Krakelkiste @isliebcomics
[image]
0 replies · 10 retweets · 148 likes · 1.1K views
Väterchen Frost @VaeterchenFrost
think of me as a complex creature
non-conforming in non-stereotypical terms
unpredictable and very unpractical
this is how we roll in this waking nightmare
this is how we live and how we learn
#poem #poetry
[image]
0 replies · 1 retweet · 4 likes · 61 views
Väterchen Frost retweeted
shaurya @shauseth
schrödinger’s strait
4 replies · 2 retweets · 45 likes · 1.5K views
Ivan Fioravanti ᯅ @ivanfioravanti
MLX: Preview of the Qwen3.5-35B-A3B 4-bit quantization Royal Rumble. JANGQ and RAM-25GB-MLX are still missing, and a second run of some quantizations is in progress. Full article later.
So far, quality ranking: 🥇 nvfp4 🥈 4bit-gs32 🥉 4bit-DWQ
Performance ranking: 🥇 mxfp4 🥈 4bit 🥉 UD-MLX-4bit
Notes:
- bf16 has lower perplexity, but overall performed worst in benchmarks 🤷🏻‍♂️
- 200 cases were executed for each benchmark
- All tests performed with the same sampling parameters
- Benchmarking requires a LOT of time, but it's useful and fun!
[4 images]
8 replies · 12 retweets · 103 likes · 7.6K views
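The methodology (same prompts, same sampling parameters, repeated across quants) can be mirrored in a small harness; a hedged sketch with mlx_lm, where the model ids and prompt list are placeholders:

```python
# Hedged sketch: run identical prompts with identical sampling parameters
# across several quantizations of one model. Ids below are placeholders.
from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

QUANTS = [
    "mlx-community/MODEL-4bit",      # placeholder
    "mlx-community/MODEL-4bit-DWQ",  # placeholder
]
PROMPTS = ["Summarize attention in two sentences."]  # toy case list

# One sampler, held fixed across every quantization under test.
sampler = make_sampler(temp=0.7, top_p=0.9)

for repo in QUANTS:
    model, tokenizer = load(repo)
    for prompt in PROMPTS:
        out = generate(model, tokenizer, prompt=prompt,
                       max_tokens=256, sampler=sampler)
        print(f"[{repo}] {out[:80]}...")
```

With a real case list (the tweet used 200 per benchmark), the outputs can then be scored however the benchmark demands; the point here is only that sampling stays constant while the quantization varies.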
Väterchen Frost retweeted
Federico Italiano @FedeItaliano76
The stunning futurism bordering on abstraction of the Belgian avant-garde painter Félix de Boeck (1898–1995)
[4 images]
10 replies · 613 retweets · 3.9K likes · 97.6K views
Väterchen Frost retweeted
The New Yorker @NewYorker
A cartoon by Harry Bliss, from 2015.
[image]
11 replies · 202 retweets · 1.1K likes · 57.5K views
0xSero @0xSero
Strongest model on the Framework AI Ryzen 128GB: Qwen3.5-122B-REAP-q6
- 305 tokens/s prefill
- 29.2 tokens/s decode
- basically can serve 2 users at full context
I was also able to get it to make GGUFs very easily. huggingface.co/0xSero/Qwen3.5…
[image]
23 replies · 19 retweets · 349 likes · 18.3K views
Ettore Di Giacinto @mudler_it
Just exhausted the @huggingface quotas for uploading APEX quants 😅 If you know someone who works at @huggingface and could put me in contact, help with bumping the quotas would be reeeally appreciated! 🙏
4 replies · 1 retweet · 24 likes · 1.2K views
Väterchen Frost retweeted
islieb Krakelkiste @isliebcomics
[image]
2 replies · 14 retweets · 106 likes · 950 views
Rich · Atom Tan Studio @atomtanstudio
@no_stp_on_snek @VaeterchenFrost You know, that isn't a bad idea. It works for nearly everything. I created a skill for Craft (similar to Notion) for OpenClaw just by pointing at their SDK and telling OpenClaw to write it. Completely seamless.
1 reply · 0 retweets · 2 likes · 65 views
Tom Turney @no_stp_on_snek
Ran the same benchmark with TurboQuant+ on MLX. 520 samples, same model (gemma 4 26b BF16), M5 Max 128GB.
- 99% answer agreement (vs 97%)
- 10-64% KV savings (vs 0-53%)
- 78% accuracy for both
- decode speedup is 0.79-0.99x... that's my gap
Different architecture: I dequant once after prefill, then run native SDPA. No fused kernel yet. Trading decode speed for higher agreement and more compression. Full results and code at github.com/TheTom/turboqu… (including code snippets on how to integrate). Great work on mlx-vlm and the benchmark script; used it directly for these runs. @Prince_Canuma @ekryski @anemll @ivanfioravanti FYI
[image]
Quoted tweet from Prince Canuma @Prince_Canuma:

TurboQuant: Open Evals on MLX 🔥 Yesterday I launched mlx-vlm v0.4.4 with major TurboQuant performance improvements. Today, the open benchmark results on MM-NIAH (val, 520 samples) using Gemma 4 26B IT by @GoogleDeepMind on M3 Ultra:
→ 0 quality loss — 78% accuracy for both BL and TBQ
→ 97% answer agreement across all context lengths
→ 30–53% KV cache savings (where TBQ is active)
→ 1.16x decode speedup at ~60K context
Benchmark code 👇🏽
2 replies · 3 retweets · 32 likes · 8.2K views
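The described architecture (store the KV cache quantized, dequantize it once after prefill, then lean on the native fused SDPA kernel rather than a custom fused one) looks roughly like this in MLX; a sketch with toy shapes, not TurboQuant's actual code:

```python
# Hedged sketch of "dequant once after prefill, then native SDPA" in MLX.
import mlx.core as mx

BITS, GROUP = 4, 64

# --- prefill: store the cache quantized (this is where the KV savings live)
k = mx.random.normal((1, 8, 1024, 128))  # toy: 8 heads, 1024 tokens, dim 128
v = mx.random.normal((1, 8, 1024, 128))
kq = mx.quantize(k, group_size=GROUP, bits=BITS)  # (w_q, scales, biases)
vq = mx.quantize(v, group_size=GROUP, bits=BITS)

# --- after prefill: dequantize ONCE, not per decode step
k_full = mx.dequantize(*kq, group_size=GROUP, bits=BITS)
v_full = mx.dequantize(*vq, group_size=GROUP, bits=BITS)

# --- decode: plain fused SDPA on the dequantized cache (no custom kernel)
q = mx.random.normal((1, 8, 1, 128))  # single decode-step query
out = mx.fast.scaled_dot_product_attention(q, k_full, v_full, scale=128 ** -0.5)
print(out.shape)  # (1, 8, 1, 128)
```

This trades decode speed (the one-off dequant plus full-precision attention) for the agreement and compression numbers the tweet reports, which matches the 0.79-0.99x speedup gap it describes.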
Ettore Di Giacinto @mudler_it
APEX quantization update - in 3 days, 10 new MoE models published to HuggingFace! Here is a full list of the new APEX GGUF quants:
- huggingface.co/mudler/Qwen3.5… (original, with full benchmarks @Alibaba_Qwen )
- huggingface.co/mudler/gemma-4… (new MoE from @Google )
- huggingface.co/mudler/GLM-4.7… (30B, MLA attention) ( @Zai_org )
- huggingface.co/mudler/Holo3-3… (VLM, with mmproj)
- huggingface.co/mudler/Qwen3.5…
- huggingface.co/mudler/gemma-4… (abliterated)
- huggingface.co/mudler/gemma-4…
- huggingface.co/mudler/Qwen3-C… (80B, 512 experts!) @Alibaba_Qwen
- huggingface.co/mudler/Mistral… (MLA, with mmproj)
- huggingface.co/mudler/LFM2-24… (hybrid conv/MoE by @liquidai )
- huggingface.co/mudler/MiniMax… (228B! @MiniMax_AI )
- huggingface.co/mudler/Qwen3.5… ( @Alibaba_Qwen )
- huggingface.co/mudler/Qwen3-C… ( @Alibaba_Qwen )
- huggingface.co/mudler/Nemotro… ( @nvidia )
Still in the pipeline:
⏳ Nemotron-3-Nano-30B + Super-120B (Mamba-2 hybrid)
⏳ Step-3.5-Flash (196B)
⏳ Qwen3.5-397B-A17B
⏳ Trinity-Large-Thinking (398B)
7 profiles each: Quality, Balanced, Compact + I-variants with diverse calibration. Only huggingface.co/mudler/Qwen3.5… has validated benchmarks so far. A full benchmark pass with lm-evaluation-harness is coming next, plus an optimization phase (we will re-quantize a few models).
Quoted tweet from Ettore Di Giacinto @mudler_it:

APEX quantizations of more models are ongoing! Meanwhile, playing with Qwen 3.5... the impact of APEX vs Unsloth Dynamic quant on quality is clearly visible IMO, at least in some areas. I know we need more numbers before drawing conclusions, but this isn't about numbers. Just check out a simple prompt: "create an html page of a rotating cube in SVG."
Left: Unsloth Qwen3.5-35B-A3B-UD-Q8_K_XL.gguf (48.7 GB, ~32 tok/s) → flat square (?????)
Right: APEX Qwen3.5-35B-A3B-APEX-I-Quality.gguf (22.8 GB, ~53 tok/s) → ✨
2 replies · 4 retweets · 29 likes · 2.1K views