mzba

1.4K posts


@LiMzba

Joined July 2012
234 Following · 1.3K Followers
mzba
mzba@LiMzba·
@zcbenz Kind of don't even want to open-source things anymore; it's either pure marketing or getting copied for nothing 😞
English
0
0
0
76
Cheng
Cheng@zcbenz·
But seriously speaking, this is the worst time for building anything. Your work gets copied and buried under slop and hype, traditional software monetization no longer works, and all the recent successful startup exits were just acquihires.
English
7
1
58
4.3K
Awni Hannun
Awni Hannun@awnihannun·
I joined Anthropic as a member of the technical staff. Excited to work on frontier modeling at a place with unwavering values and a generational mission.
English
208
39
2.3K
120.5K
mzba
mzba@LiMzba·
@awnihannun @atiorh A lot of great insights; strongly agree that ASR will be the frontend to LLMs 🙂
English
0
0
2
125
Ivan Fioravanti ᯅ
Ivan Fioravanti ᯅ@ivanfioravanti·
LTX 2.3 22B Distilled: M5 Max (40 GPU cores) WINS vs M3 Ultra (80 GPU cores) in generating a 5-second video: 🥇 M5 Max: 121 secs 🥈 M3 Ultra: 206 secs. I again used the Draw Things app for this test, with default settings. Guess which video is from which machine. Audio on!
English
12
2
49
3.6K
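A quick check of the relative speedup implied by the timings above (both numbers are from the post itself):

```python
# Generation times for the same 5-second LTX video, from the post above
m5_max_secs = 121    # M5 Max, 40 GPU cores
m3_ultra_secs = 206  # M3 Ultra, 80 GPU cores

# The M5 Max finishes ~1.7x faster despite having half the GPU cores
speedup = m3_ultra_secs / m5_max_secs
print(f"{speedup:.2f}x")  # 1.70x
```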
mzba
mzba@LiMzba·
@anemll With the M5, is prefill still slower than on the ANE? I tried the fixed-size ANE processing and still get slower end-to-end generation on my M2..
English
0
0
0
52
Anemll
Anemll@anemll·
@LiMzba Every time we send a big chunk of prompt to Claude or Codex, it is compute-bound, no? A lot of agentic exchanges are heavy on compute. Just a few examples.
English
1
0
1
216
Anemll
Anemll@anemll·
No surprises for ANE bandwidth on the M4 Max: it's about a quarter of the overall available bandwidth, which makes the M5 Max the fastest ANE in memory throughput, at ~150 GB/s.
Anemll tweet media
English
5
8
81
5.7K
mzba
mzba@LiMzba·
@ivanfioravanti A very good time to try something new in 2026 :)
English
0
0
2
66
mzba
mzba@LiMzba·
Very cool way to connect the agent with the browser, kind of the old way frontend developers used to bypass authentication :p
yan5xu@yan5xu

😅 Well, bb-browser (badboy browser) is here. It's truly unscrupulous, but it really works well. You can now pull information from any website with `bb-browser site`; it currently supports Reddit, Twitter, GitHub, Hacker News, Xiaohongshu, Zhihu, Bilibili, Weibo, Douban, and YouTube, with 50+ commands, and I'll keep updating it. Of course, fetching information like this is nothing new; I was inspired by @jakevin7's twitter-cli. But bb-browser's implementation is where the unscrupulous part comes in: it drives your real browser directly through a Chrome extension plus CDP. Not a headless browser, not stolen cookies, not simulated requests. You're already logged in, so it just uses your login state, running eval right in the browser console. All the login-state and auth headaches that used to make scraping painful are gone 😂. (This approach is honestly... such a cheat. I can already imagine how big-company frontend engineers would curse me if they caught me doing this, because it's genuinely hard to defend against.) I've also hidden a `guide` command in the CLI, so once you've installed the bb-browser CLI or MCP, you can just tell your Agent 'I need website XX turned into a CLI' and it will do it for you!!

English
0
0
0
397
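The trick described above (driving your real, already-logged-in browser instead of a headless one) boils down to sending CDP commands such as `Runtime.evaluate` to the browser's DevTools endpoint. A minimal sketch of building such a command follows; the message shape is standard CDP, but the expression and id handling are illustrative assumptions, not bb-browser's actual code:

```python
import json

def cdp_evaluate_message(msg_id: int, expression: str) -> str:
    """Build a Chrome DevTools Protocol Runtime.evaluate message.

    Sent over the browser's DevTools WebSocket, this runs `expression`
    in the page's JS context, with whatever login state the real,
    already-authenticated browser session has.
    """
    return json.dumps({
        "id": msg_id,
        "method": "Runtime.evaluate",
        "params": {
            "expression": expression,
            "returnByValue": True,   # serialize the result back over the wire
            "awaitPromise": True,    # resolve async expressions before returning
        },
    })

# Example: extract every post title on the current page.
msg = cdp_evaluate_message(
    1,
    "[...document.querySelectorAll('article h2')].map(e => e.textContent)",
)
```

To actually deliver the message you would connect to the WebSocket URL advertised at `http://localhost:9222/json` after starting Chrome with `--remote-debugging-port=9222`; a Chrome extension can get the same effect through the `chrome.debugger` API.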
Ivan Fioravanti ᯅ
Ivan Fioravanti ᯅ@ivanfioravanti·
LinkedIn is so terrible! It’s beyond cringe! Why do most people love to appear so dumb publicly???
English
11
0
41
2.5K
mzba
mzba@LiMzba·
@angeloskath @ivanfioravanti A side topic: is there a reason we can't do AWQ quantization for Qwen 3.5? My Macs are occupied at the moment; otherwise I'd really like to try AWQ quantization.
English
0
0
0
61
Angelos Katharopoulos
Angelos Katharopoulos@angeloskath·
@ivanfioravanti I am curious if there is something wrong with this particular 4-bit. I ran GSM8K last night with mlx_lm.evaluate and I get: bf16: 91.6, 8.5bpw: 91.1, 6.5bpw: 89.8, 5.0bpw: 86.8, 4.5bpw: 73.0. Already the 4-bit with group size 32 (5.0bpw) is so much better than ParoQuant 🤷‍♂️
English
3
0
6
645
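The bpw figures quoted above are consistent with grouped affine quantization as MLX implements it, where each group of weights stores one fp16 scale and one fp16 bias on top of the per-weight bits. A quick sanity check, assuming that layout (the default group size of 64 for the non-5.0bpw runs is an assumption here):

```python
def bits_per_weight(bits: int, group_size: int) -> float:
    """Effective storage per weight for grouped affine quantization:
    `bits` for the weight itself, plus one fp16 scale and one fp16 bias
    shared by each group of `group_size` weights."""
    return bits + (16 + 16) / group_size

# Reproduces the figures in the thread:
# 8-bit / gs 64 -> 8.5 bpw, 6-bit / gs 64 -> 6.5 bpw,
# 4-bit / gs 64 -> 4.5 bpw, 4-bit / gs 32 -> 5.0 bpw
```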
Ivan Fioravanti ᯅ
Ivan Fioravanti ᯅ@ivanfioravanti·
MLX 4bit vs MLX ParoQuant 4bit using Qwen3.5-9B 📣 As you can see below, it's no contest. I will try the same with 8bit in the next few days for a comparison. ParoQuant is my new go-to quantization below 8bit! I limited max-tokens in some cases, but the important thing is that the same limits were applied to both quantizations.
Ivan Fioravanti ᯅ tweet media
English
18
13
134
19.2K
Liu Liu
Liu Liu@liuliu·
When everyone is a micromanager now, I wish the Greatest Micromanager of All Time @JeffBezos would share his SKILL.md
English
1
1
4
417
mzba
mzba@LiMzba·
@Prince_Canuma Nice, once you've done the MLX port, I'll have a good reference MLX implementation :)
English
1
0
1
55
Prince Canuma
Prince Canuma@Prince_Canuma·
@LiMzba Both distilled and dev pipelines? I'm getting 35-40GB without LoRAs for both distilled and dev using default settings. But I have optimisations for decoding that let you run it even on an M1 16GB.
English
1
0
1
32
Prince Canuma
Prince Canuma@Prince_Canuma·
MLX-Video updates coming soon 🚀 Took a bit longer to debug the LTX-2 dev model and 2.3, but we are almost there! Note: the PR is TBD, so bear with me :) github.com/Blaizzy/mlx-vi…
English
5
4
47
2.4K
mzba
mzba@LiMzba·
@Prince_Canuma I'm still in the middle of porting, but it looks OK with the cache limit set to 2 GB; the full pipeline used around 50+ GB of memory. But I can't run the official repo with MPS on my M2 128 GB: memory usage hits around 168 GB+, so there must be something wrong there.
English
2
0
0
40
LotusDecoder
LotusDecoder@LotusDecoder·
Ran another round of comparison experiments driven by real-world testing, and I think I've roughly figured it out. The main reason: the Qwen3.5 architecture is too new and MLX's quantization lags behind. For running claude code, neither MLX 4bit nor MLX 8bit does well. You still need GGUF. My conclusion for now: for coding productivity, use GGUF quantization; for speed, use MLX quantization.
LotusDecoder tweet mediaLotusDecoder tweet mediaLotusDecoder tweet media
LotusDecoder@LotusDecoder

Never mind; the lower-tier Qwen3.5 models aren't suited to claude code productivity. MLX/Qwen3.5-35B-8bit is fast and reasonably smart, but it doesn't hold up in claude code: it degrades easily, starting around turn 13 or so. And since a single task commonly involves five or six read/write/bash tool calls, it's basically very hard to use for production.

Chinese
11
5
146
38.5K
mzba
mzba@LiMzba·
@ronaldmannak I don't use MCP anymore; it feels like an unnecessary wrapper around the API. Personally, I think the CLI approach offers the best balance between control and flexibility.
English
0
0
1
53
mzba
mzba@LiMzba·
@yan5xu I'll go take another look at the Chinese original :)
Japanese
0
0
0
21
yan5xu
yan5xu@yan5xu·
@LiMzba There are some discoveries in there, like the advantage of combining commands.
Chinese
1
0
1
87
yan5xu
yan5xu@yan5xu·
The English version of this article has already hit #3 on today's LocalLLaMA front page... So why does the Chinese version get no replies, whether on Twitter or on the WeChat official account? 🧐
yan5xu tweet media
yan5xu@yan5xu

x.com/i/article/2031…

Chinese
15
4
52
14.3K