Tarjei Mandt
@kernelpool
2.2K posts
Sydney, Australia · Joined August 2009
608 Following · 17.4K Followers
Tarjei Mandt reposted
N8 Programs (@N8Programs):
Recently, @awnihannun asserted that 'According to benchmarks Qwen3.5 4B is as good as GPT 4o.' This drew controversy: Is the 4B just benchmaxxed? How could a 4B be as good as GPT-4o? I tried to test this scientifically. The answer to the question is likely: yes, in most cases.
[image]
Tarjei Mandt (@kernelpool):
@awnihannun Thanks for all the great work on MLX! Good luck on what’s next!
Awni Hannun (@awnihannun):
Today is my last day at Apple. Building MLX with our amazing team and community has been an absolute pleasure. It's still early days for AI on Apple silicon. Apple makes the best consumer hardware on the planet. There's so much potential for it to be the leading platform for AI. And I'm confident MLX will continue to have a big role in that. To the future: MLX remains in the exceptionally capable hands of our team including @angeloskath, @zcbenz, @DiganiJagrit, @NasFilippova, @trebolloc (and others not on X). Follow them or @shshnkp for future updates.
[image]
Tarjei Mandt reposted
Ivan Krstić (@radian):
🔺NEW: iPhone and iPad are now the first and only generally-available devices to meet the exacting security requirements for handling classified NATO information. apple.com/newsroom/2026/…
Tarjei Mandt reposted
l33tdawg (@l33tdawg):
Over the CNY holidays, I decided to build something that imho is 'peak agentic AI' 🤣 - the world's first self-evolving CTF platform! AI agents design, validate, calibrate, and evolve security challenges autonomously. levelupctf.com Here's the full story 🧵
Tarjei Mandt (@kernelpool):
@blacktop__ Imagine being the AI and seeing the GPU getting restricted
Blacktop (@blacktop__):
Gave Claude my `ipsw` tool and my `ida-mcp-rs` and asked it what Apple adding `com.apple.developer.gpu-restricted` to `com.apple.WebKit.WebContent.EnhancedSecurity` does and here's the report it made. We are so cooked chat 🪦 gist.github.com/blacktop/f2606…
Simon (@AI_Homelab):
@ivanfioravanti @kernelpool Nice to see it checked through perplexity. Maybe we'll see more usage of DWQ quantization in MLX in the future. I think this is the first time I've actually seen a "proof" for it here on X. 👌
Ivan Fioravanti ᯅ (@ivanfioravanti):
MLX DWQ quantization works! Here's the perplexity for JoyAI-LLM_Flash, uploaded to mlx-community by @kernelpool
[image]
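Perplexity, the metric behind the comparison above, is just the exponential of the mean negative log-likelihood over the evaluated tokens. A minimal sketch with made-up per-token log-probabilities (the function is the point, the numbers are hypothetical):

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp(mean negative log-likelihood) over a token sequence."""
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)

# Hypothetical per-token log-probabilities from a quantized model's eval run
log_probs = [-0.5, -1.2, -0.3, -0.8]
print(round(perplexity(log_probs), 3))  # → 2.014
```

Lower is better; a DWQ quant whose perplexity closely tracks the full-precision model is the kind of "proof" being discussed here.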
Tarjei Mandt (@kernelpool):
@ivanfioravanti The sparse attention is slowing down the prefill; however, it can be fixed.
Ivan Fioravanti ᯅ (@ivanfioravanti):
This is what I mean. Benchmarking 64k context on M3 Ultra:
Prompt: 63976 tokens, 45.1 tokens-per-sec
Generation: 200 tokens, 12.1 tokens-per-sec
Peak memory: 471.61 GB
Total wall time: 1492s 👀
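The figures above can be sanity-checked with back-of-the-envelope arithmetic: at those throughputs, prefill alone accounts for nearly all of the wall time, which is why the replies focus on prompt processing.

```python
# Figures taken from the benchmark above (M3 Ultra, 64k context)
prompt_tokens, prompt_tps = 63976, 45.1
gen_tokens, gen_tps = 200, 12.1

prompt_s = prompt_tokens / prompt_tps   # time spent in prefill
gen_s = gen_tokens / gen_tps            # time spent generating
print(round(prompt_s), round(gen_s), round(prompt_s + gen_s))  # → 1419 17 1435
```

That leaves roughly 57s of the reported 1492s for everything else (model load, tokenization, overhead), so a faster prefill path is where almost all of the win would come from.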
Ivan Fioravanti ᯅ (@ivanfioravanti):
GLM-5 can't be run locally on Apple Silicon. Even at 4-bit quantization it's too slow. We need more GPU power and memory bandwidth for models of this size.
Sam Collinson (@_rezin_):
@kernelpool unless it’s a kiwi model and it’s heaps of spraying as an itick victer
Tarjei Mandt (@kernelpool):
That feeling when your SOTA model suggests heap spraying as an attack vector in 2026
Tarjei Mandt (@kernelpool):
@awnihannun @digitalix Generation aside, a sparse attention kernel would also help :) Currently, prompt processing slows down quite a bit for longer contexts.
Awni Hannun (@awnihannun):
@digitalix Thanks for the results, clearly we have some work to do! Also you can use `mlx_lm.benchmark` to test tensor parallel scaling while ensuring it generates the same number of tokens for each setup. It will be slightly more accurate.
Alex Ziskind (@digitalix):
GLM-5 4-bit scaling on a cluster of M3 Ultra Mac Studios, using MLX.
[image]
Tarjei Mandt reposted
Awni Hannun (@awnihannun):
GLM-5 runs with mlx-lm on a single 512GB M3 Ultra in Q4. It's quite good in my initial testing and pretty fast as well. It generated a highly functional space invaders game using 7.1k tokens at 15.4 tok/s and 419GB memory. Thanks to @ActuallyIsaak and @kernelpool for the port.
Quoted from Z.ai (@Zai_org):
Introducing GLM-5: From Vibe Coding to Agentic Engineering. GLM-5 is built for complex systems engineering and long-horizon agentic tasks. Compared to GLM-4.5, it scales from 355B params (32B active) to 744B (40B active), with pre-training data growing from 23T to 28.5T tokens.
Try it now: chat.z.ai
Weights: huggingface.co/zai-org/GLM-5
Tech Blog: z.ai/blog/glm-5
OpenRouter (previously Pony Alpha): openrouter.ai/z-ai/glm-5
Rolling out to Coding Plan Max users: z.ai/subscribe
Ivan Fioravanti ᯅ (@ivanfioravanti):
What is a good notebook for Linux? Tired of waiting for M5 Max 🤷🏻‍♂️
Ivan Fioravanti ᯅ (@ivanfioravanti):
@RickRossTN I've used a local conversion. @kernelpool created the mlx-community version. I think it stopped uploading due to a bug that is fixed in a PR.
Ivan Fioravanti ᯅ (@ivanfioravanti):
Step-3.5-Flash in action on MLX with OpenCode on a single M3 Ultra (distributed testing in progress!) to create a snake game! 🔥 6-bit quantization. Perfect tool calling. Fast & powerful coding model!
Recommended inference settings: Temperature 1.0, Top-p 0.95, Top-k 40 🧵
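The recommended settings above compose in the usual way: scale logits by temperature, keep only the k most likely tokens, then cut to the smallest nucleus whose cumulative probability reaches p, and sample from what survives. A dependency-free sketch of that math (a toy logit list, not the MLX implementation):

```python
import math, random

def sample(logits, temperature=1.0, top_k=40, top_p=0.95):
    # Temperature-scale, then keep only the top-k (logit, index) pairs.
    scaled = sorted(((l / temperature, i) for i, l in enumerate(logits)),
                    reverse=True)[:top_k]
    # Softmax over the survivors (subtract the max for numerical stability).
    m = scaled[0][0]
    exps = [(math.exp(l - m), i) for l, i in scaled]
    total = sum(e for e, _ in exps)
    probs = [(e / total, i) for e, i in exps]
    # Nucleus (top-p): smallest prefix whose cumulative mass reaches top_p.
    kept, cum = [], 0.0
    for p, i in probs:
        kept.append((p, i))
        cum += p
        if cum >= top_p:
            break
    # Renormalise and draw one token index.
    z = sum(p for p, _ in kept)
    r = random.random() * z
    for p, i in kept:
        r -= p
        if r <= 0:
            return i
    return kept[-1][1]

random.seed(0)
token = sample([2.0, 1.0, 0.5, -1.0], top_k=3, top_p=0.9)
print(token in {0, 1, 2})  # index 3 is removed by top-k, so → True
```

MLX-LM implements its samplers internally (e.g. in its sample_utils module), and CLI flag names may differ; treat this purely as a description of what the three knobs do.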
Tarjei Mandt (@kernelpool):
@ivanfioravanti Did you try a higher quant than 4bit (e.g. 6bit)? I've sometimes seen quantization affect the </think> token probability negatively.
Ivan Fioravanti ᯅ (@ivanfioravanti):
One thing about Step-3.5-Flash is certain: it thinks a lot, really a lot, to reach its conclusions and reply to the user.
Ivan Fioravanti ᯅ (@ivanfioravanti):
Adding support for model type step3p5 to MLX using Codex, MLX Skill and @RepoPrompt. Let's try! 🚀
[image]