Simon Willison
@simonw

Creator @datasetteproj, co-creator Django. PSF board. Hangs out with @natbat. He/Him. Mastodon: https://t.co/t0MrmnJW0K Bsky: https://t.co/OnWIyhX4CH

San Francisco, CA · Joined November 2006
5.6K Following · 152.1K Followers · 60.3K posts
Simon Willison@simonw·
@natolambert Did we ever get a conclusive answer as to whether their top researchers quit or were fired?
Simon Willison@simonw·
Dan found that the 2-bit quantization broke tool calling, but upgrading to 4-bit (at 4.36 tokens/second) got it working
Dan Woods@danveloper

@simonw You bet. Literally, "tool calling" became the metric that got us back to Q4. Q2 was really great conversationally and very capable, but it's like running the model at temperature 10,000 for anything predictable.
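Dan's "tool calling as the metric" approach can be sketched as a simple validity check: count how often the model emits a parseable, well-formed tool call. A minimal hypothetical sketch — the schema, tool names, and sample outputs below are mine for illustration, not from the thread:

```python
import json

def is_valid_tool_call(text: str, known_tools: set[str]) -> bool:
    """A tool call counts only if it parses as JSON, names a real tool,
    and passes its arguments as an object."""
    try:
        call = json.loads(text)
    except json.JSONDecodeError:
        return False
    return call.get("name") in known_tools and isinstance(call.get("arguments"), dict)

# Illustrative model outputs: only the first is a usable tool call.
outputs = [
    '{"name": "read_file", "arguments": {"path": "README.md"}}',  # well-formed
    '{"name": "read_file", "arguments": "README.md"}',            # args not an object
    'Sure! I will call read_file(README.md) now.',                # prose, not JSON
]
tools = {"read_file", "write_file"}
success_rate = sum(is_valid_tool_call(o, tools) for o in outputs) / len(outputs)
print(f"{success_rate:.0%}")  # 33%
```

A low-bit quant that is "great conversationally" can still fail this kind of strict structural check almost every time, which matches Dan's Q2 experience.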

Simon Willison@simonw·
Dan says he's got Qwen 3.5 397B-A17B - a 209GB on disk MoE model - running on an M3 Mac at ~5.7 tokens per second using only 5.5 GB of active memory (!) by quantizing and then streaming weights from SSD (at ~17GB/s), since MoE models only use a small subset of their weights for each token
Dan Woods@danveloper

x.com/i/article/2034…
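The arithmetic here is a bandwidth bound: tokens per second can't exceed SSD read speed divided by the bytes of weights read per token. A back-of-envelope sketch using the numbers in the tweet — the cache-hit rate is my assumption, not from the thread:

```python
# Why streaming MoE experts from SSD can work at usable speeds.
ssd_bandwidth_gb_s = 17.0   # sequential read speed of the SSD (from the tweet)
active_gb_per_token = 5.5   # expert weights touched per token (from the tweet)

# Worst case: every active expert is re-read from SSD for each token.
worst_case = ssd_bandwidth_gb_s / active_gb_per_token
print(f"worst case: {worst_case:.1f} tok/s")  # worst case: 3.1 tok/s

# If most per-token reads hit weights already resident in RAM (shared
# layers plus recently used experts), only the misses touch the SSD.
assumed_hit_rate = 0.8  # assumption for illustration
bound = ssd_bandwidth_gb_s / (active_gb_per_token * (1 - assumed_hit_rate))
print(f"with {assumed_hit_rate:.0%} hits: {bound:.1f} tok/s")  # with 80% hits: 15.5 tok/s
```

The reported ~5.7 tok/s sits between these two bounds, consistent with partial caching of hot experts.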

Simon Willison@simonw·
@danveloper Have you observed a meaningful difference between Q4 and Q2 when it comes to tool calling? Would love to see how you measure that
Dan Woods@danveloper·
And now I'm done with MoEs on this project forever. There's probably room to get to 6-8 tok/s, though even at 4 tok/s it's very usable for agentic tasks that aren't time sensitive, and Q4 weights make the agent tool calls predictably reliable. Qwen 3.5 is an excellent model.
Dan Woods@danveloper·
Some very meaningful progress on this project. After a bunch of performance experiments we've landed at 4.4 tok/s on the distribution Q4 weights. Feels pretty good since we started at 0.28 tok/s. Code and experiments are up in the GitHub repo now!
Dan Woods@danveloper

x.com/i/article/2034…
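The improvement from the starting point to the landed number works out to roughly a sixteenfold speedup:

```python
# Speedup arithmetic from the numbers in the tweet above.
baseline_tok_s = 0.28  # initial throughput
final_tok_s = 4.4      # after the performance experiments
print(f"{final_tok_s / baseline_tok_s:.1f}x")  # 15.7x
```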

Simon Willison@simonw·
@FixTechStuff1 That doesn't matter in this case because it's effectively a read-only workload - all of that read activity shouldn't hurt the SSD at all
FixTechStuff 🛠️@FixTechStuff1·
@simonw One problem with hammering your SSD like this is that SSDs have a finite number of writes. This is fine if SSDs are cheap and replaceable, but when it's hard-soldered to your Mac mini, you'll eventually have to replace the whole thing.
fallpeak@_fallpeak·
@simonw It feels misleading to report "5.5 tok/s" up top and then hide a "(with less than half the usual expert count)" multiple paragraphs away. I guess in some sense it's no more misleading than using a quant at all, but it feels different somehow
Dan Woods@danveloper·
@simonw Empirical with Opus doing the sanity checking. I’m not sure 2-bit quantization even mattered that much in the end… it was an earlier test, so I’ll probably revert that and see how it does with regular 4-bit. The k=4 was a binary search by Claude, checking the quality each time.
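The "binary search by Claude" over the expert count can be sketched as searching for the smallest k that still passes a quality check. A hypothetical reconstruction — `quality_ok` stands in for whatever eval was actually run, and the monotonicity assumption is mine:

```python
# Find the smallest active-expert count k that still passes a quality check,
# assuming quality is monotonic in k (more experts = better output).
def smallest_passing_k(lo: int, hi: int, quality_ok) -> int:
    """Binary search over the inclusive range [lo, hi]."""
    while lo < hi:
        mid = (lo + hi) // 2
        if quality_ok(mid):
            hi = mid       # mid passes: try fewer experts
        else:
            lo = mid + 1   # mid fails: need more experts
    return lo

# Toy stand-in eval: suppose at least 4 experts are needed to pass.
print(smallest_passing_k(1, 8, lambda k: k >= 4))  # 4
```

This needs only O(log n) quality evaluations, which matters when each check means running a full sanity-check suite against the model.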
Simon Willison@simonw·
@NoeFlandre SVG is a little more useful - I actually have models produce an SVG for real web features sometimes
Noé Flandre@NoeFlandre·
@simonw Why SVG pelicans and not their TikZ siblings btw?
Simon Willison@simonw·
@ClementDelangue Qwen 3.5 was shockingly good, even at tiny sizes like the 4B model (which somehow benchmarks similar to GPT-4o across many of the classic benchmarks)... and then much of the core Qwen team quit or were fired (still not clear to me which) straight after releasing it
clem 🤗@ClementDelangue·
1. What were the most important/interesting developments in AI, Hugging Face, or the world since January that I should know about?
clem 🤗@ClementDelangue·
Just sent these questions to the HF team after paternity leave - would love the community's take too 👇
Simon Willison@simonw·
@cyrusradfar @GergelyOrosz "In the end we're all communicating" Not if our AI assistant made the decision to reply to something and then wrote and posted a reply
Cyrus Radfar@cyrusradfar·
In the end we're all communicating. I'm not clear on why the effort put in matters. It's whether the message comes through. To discount the message because it was AI-supported feels unfair. Tech leaders don't write their social posts, but we don't say "they didn't write that" -- we just take it, respond or react as we like. The method of creation is irrelevant. I get that we all don't want slop, but that existed well before AI online -- especially in online "discourse."
Gergely Orosz@GergelyOrosz·
It’s not X — it’s Y

I cannot unsee how so much of the writing on this site (and online, in general) is increasingly AI-generated. It’s still pretty easy to recognize. Probably not for long tho.

Just alarming that ppl outsource even typing 3 sentences for a reply on this site…
Simon Willison@simonw·
@ryanjanssen Given how good their hosted Claude Code for web version is, I would be shocked not to see a hosted Claude Cowork from them soon
Ryan Janssen@ryanjanssen·
@simonw these are all bandaids on their main problem (that CC is natively local). I’m interested in how they’ll address the underlying need for cloud
Simon Willison@simonw·
Couldn't resist getting OpenAI Codex to render me a pelican for every combination of model and reasoning effort - I do think gpt-5.4 xhigh came out the best, the pelican has a fish in its beak!
Simon Willison@simonw·
New chapter for Agentic Engineering Patterns: I tried to distill the key details of how coding agents work under the hood that are most useful to understand in order to use them effectively simonwillison.net/guides/agentic…