mzba

1.4K posts


@LiMzba

Joined July 2012
234 Following · 1.3K Followers
mzba
mzba@LiMzba·
@zcbenz Kind of don't even want to open-source things anymore; it's either pure marketing or getting copied for nothing 😞
English
0
0
0
76
Cheng
Cheng@zcbenz·
But seriously speaking, this is the worst time for building anything. Your work gets copied and buried under slop and hype, traditional software monetization no longer works, and all the recent successful startup exits were just acquihires.
English
7
1
58
4.3K
Awni Hannun
Awni Hannun@awnihannun·
I joined Anthropic as a member of the technical staff. Excited to work on frontier modeling at a place with unwavering values and a generational mission.
English
208
39
2.3K
120.5K
mzba
mzba@LiMzba·
@awnihannun @atiorh A lot of great insights; strongly agree that ASR will be the frontend to LLMs 🙂
English
0
0
2
125
Ivan Fioravanti ᯅ
Ivan Fioravanti ᯅ@ivanfioravanti·
LTX 2.3 22B Distilled: M5 Max (40 GPU cores) WINS vs M3 Ultra (80 GPU cores) in generating a 5-second video: 🥇 M5 Max: 121 secs 🥈 M3 Ultra: 206 secs. I again used the Draw Things app for this test, with default settings. Guess which video is from which machine. Audio on!
English
12
2
49
3.6K
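A quick check of the relative speedup implied by the timings above (both numbers are from the post itself):

```python
# Generation times for the same 5-second LTX video, from the post above
m5_max_secs = 121    # M5 Max, 40 GPU cores
m3_ultra_secs = 206  # M3 Ultra, 80 GPU cores

# The M5 Max finishes ~1.7x faster despite having half the GPU cores
speedup = m3_ultra_secs / m5_max_secs
print(f"{speedup:.2f}x")  # 1.70x
```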
mzba
mzba@LiMzba·
@anemll With the M5, is prefill still slower than on the ANE? I tried the fixed-size ANE processing and still get slower end-to-end generation on my M2..
English
0
0
0
52
Anemll
Anemll@anemll·
@LiMzba Every time we send a big chunk of prompt to Claude or Codex, it is compute-bound, no? A lot of agentic exchanges are heavy on compute. Just a few examples.
English
1
0
1
216
Anemll
Anemll@anemll·
No surprises for ANE bandwidth on the M4 Max: it's about a quarter of the overall available bandwidth, which makes the M5 Max the fastest ANE in memory throughput, at ~150 GB/s.
Anemll tweet media
English
5
8
81
5.7K
mzba
mzba@LiMzba·
@ivanfioravanti A very good time to try something new in 2026 :)
English
0
0
2
66
mzba
mzba@LiMzba·
Very cool way to connect the agent with the browser, kind of the old way frontend developers used to bypass authentication :p
yan5xu@yan5xu

😅 Well, bb-browser (badboy browser) is here. It's truly unscrupulous, but it really works well. You can now pull information from any website with `bb-browser site`; it currently supports Reddit, Twitter, GitHub, Hacker News, Xiaohongshu, Zhihu, Bilibili, Weibo, Douban, and YouTube, with 50+ commands, and I'll keep updating it. Of course, fetching information like this is nothing new; I was inspired by @jakevin7's twitter-cli. But bb-browser's implementation is where the unscrupulous part comes in: it drives your real browser directly through a Chrome extension plus CDP. Not a headless browser, not stolen cookies, not simulated requests. You're already logged in, so it just uses your login state, running eval right in the browser console. All the login-state and auth headaches that used to make scraping painful are gone 😂. (This approach is honestly... such a cheat. I can already imagine how big-company frontend engineers would curse me if they caught me doing this, because it's genuinely hard to defend against.) I've also hidden a `guide` command in the CLI, so once you've installed the bb-browser CLI or MCP, you can just tell your Agent 'I need website XX turned into a CLI' and it will do it for you!!

English
0
0
0
397
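The trick described above (driving your real, already-logged-in browser instead of a headless one) boils down to sending CDP commands such as `Runtime.evaluate` to the browser's DevTools endpoint. A minimal sketch of building such a command follows; the message shape is standard CDP, but the expression and id handling are illustrative assumptions, not bb-browser's actual code:

```python
import json

def cdp_evaluate_message(msg_id: int, expression: str) -> str:
    """Build a Chrome DevTools Protocol Runtime.evaluate message.

    Sent over the browser's DevTools WebSocket, this runs `expression`
    in the page's JS context, with whatever login state the real,
    already-authenticated browser session has.
    """
    return json.dumps({
        "id": msg_id,
        "method": "Runtime.evaluate",
        "params": {
            "expression": expression,
            "returnByValue": True,   # serialize the result back over the wire
            "awaitPromise": True,    # resolve async expressions before returning
        },
    })

# Example: extract every post title on the current page.
msg = cdp_evaluate_message(
    1,
    "[...document.querySelectorAll('article h2')].map(e => e.textContent)",
)
```

To actually deliver the message you would connect to the WebSocket URL advertised at `http://localhost:9222/json` after starting Chrome with `--remote-debugging-port=9222`; a Chrome extension can get the same effect through the `chrome.debugger` API.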
Ivan Fioravanti ᯅ
Ivan Fioravanti ᯅ@ivanfioravanti·
LinkedIn is so terrible! It’s beyond cringe! Why do most people love to appear so dumb publicly???
English
11
0
41
2.5K
mzba
mzba@LiMzba·
@angeloskath @ivanfioravanti A side topic: is there a reason we can't do AWQ quantization for Qwen 3.5? My Macs are occupied at the moment; otherwise I'd really like to try AWQ quantization.
English
0
0
0
61
Angelos Katharopoulos
Angelos Katharopoulos@angeloskath·
@ivanfioravanti I am curious if there is something wrong with this particular 4-bit. I ran GSM8K last night with mlx_lm.evaluate and I get: bf16: 91.6, 8.5bpw: 91.1, 6.5bpw: 89.8, 5.0bpw: 86.8, 4.5bpw: 73.0. Already the 4-bit with group size 32 (5.0bpw) is so much better than ParoQuant 🤷‍♂️
English
3
0
6
645
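The bpw figures quoted above are consistent with grouped affine quantization as MLX implements it, where each group of weights stores one fp16 scale and one fp16 bias on top of the per-weight bits. A quick sanity check, assuming that layout (the default group size of 64 for the non-5.0bpw runs is an assumption here):

```python
def bits_per_weight(bits: int, group_size: int) -> float:
    """Effective storage per weight for grouped affine quantization:
    `bits` for the weight itself, plus one fp16 scale and one fp16 bias
    shared by each group of `group_size` weights."""
    return bits + (16 + 16) / group_size

# Reproduces the figures in the thread:
# 8-bit / gs 64 -> 8.5 bpw, 6-bit / gs 64 -> 6.5 bpw,
# 4-bit / gs 64 -> 4.5 bpw, 4-bit / gs 32 -> 5.0 bpw
```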
Ivan Fioravanti ᯅ
Ivan Fioravanti ᯅ@ivanfioravanti·
MLX 4bit vs MLX ParoQuant 4bit using Qwen3.5-9B 📣 As you can see below, it's no contest. I will try the same with 8bit in the next few days for a comparison. ParoQuant is my new go-to quantization below 8bit! I limited max-tokens in some cases, but the important thing is that the same limits were applied to both quantizations.
Ivan Fioravanti ᯅ tweet media
English
18
13
134
19.2K
Liu Liu
Liu Liu@liuliu·
When everyone is a micromanager now, I wish the Greatest Micromanager of All Time @JeffBezos would share his SKILL.md
English
1
1
4
417
mzba
mzba@LiMzba·
@Prince_Canuma Nice, once you've done the MLX port, I'll have a good reference MLX implementation :)
English
1
0
1
55
Prince Canuma
Prince Canuma@Prince_Canuma·
@LiMzba Both distilled and dev pipelines? I'm getting 35-40GB without LoRAs for both distilled and dev using default settings. But I have optimisations for decoding that let you run it even on an M1 16GB.
English
1
0
1
32
Prince Canuma
Prince Canuma@Prince_Canuma·
MLX-Video updates coming soon 🚀 Took a bit longer to debug the LTX-2 dev model and 2.3, but we are almost there! Note: the PR is TBD, so bear with me :) github.com/Blaizzy/mlx-vi…
English
5
4
47
2.4K
mzba
mzba@LiMzba·
@Prince_Canuma I'm still in the middle of porting, but it looks OK with the cache limit set to 2 GB; the full pipeline used around 50+ GB of memory. But I can't run the official repo with MPS on my M2 128 GB: memory usage hits around 168 GB+, so there must be something wrong there.
English
2
0
0
40
LotusDecoder
LotusDecoder@LotusDecoder·
Ran another round of comparison experiments driven by real-world testing, and I think I've roughly figured it out. The main reason: the Qwen3.5 architecture is too new and MLX's quantization lags behind. For running claude code, neither MLX 4bit nor MLX 8bit does well. You still need GGUF. My conclusion for now: for coding productivity, use GGUF quantization; for speed, use MLX quantization.
LotusDecoder tweet mediaLotusDecoder tweet mediaLotusDecoder tweet media
LotusDecoder@LotusDecoder

Never mind; the lower-tier Qwen3.5 models aren't suited to claude code productivity. MLX/Qwen3.5-35B-8bit is fast and reasonably smart, but it doesn't hold up in claude code: it degrades easily, starting around turn 13 or so. And since a single task commonly involves five or six read/write/bash tool calls, it's basically very hard to use for production.

Chinese
11
5
146
38.5K
mzba
mzba@LiMzba·
@ronaldmannak I don't use MCP anymore; it feels like an unnecessary wrapper around the API. Personally, I think the CLI approach offers the best balance between control and flexibility.
English
0
0
1
53
mzba
mzba@LiMzba·
@yan5xu I'll go take another look at the Chinese original :)
Japanese
0
0
0
21
yan5xu
yan5xu@yan5xu·
@LiMzba There are some discoveries in there, like the advantage of combining commands.
Chinese
1
0
1
87
yan5xu
yan5xu@yan5xu·
The English version of this article has already hit #3 on today's LocalLLaMA front page... So why does the Chinese version get no replies, whether on Twitter or on the WeChat official account? 🧐
yan5xu tweet media
yan5xu@yan5xu

x.com/i/article/2031…

Chinese
15
4
52
14.3K