Andrew Zhu

2.9K posts

Andrew Zhu banner
Andrew Zhu

Andrew Zhu

@xhinker

on Personal AI | open source contributor | startup founder member | ex-Microsoft blog: https://t.co/QjnjnNVoCD

WA, US 参加日 Ağustos 2009
250 フォロー中265 フォロワー
Andrew Zhu
Andrew Zhu@xhinker·
@LottoLabs Today, 27B finished a super complex code change, involves 7 code files, hundreds lines of code, and Qwen3.6-27B GOT id DONE, and running well. Previously, CC and Codex may fail on similar task, super super amazing
English
0
0
1
27
Andrew Zhu
Andrew Zhu@xhinker·
llama.cpp cli: llama-server \ -m /path/to/.lmstudio/models/unsloth/Qwen3.6-35B-A3B-MTP-GGUF/Qwen3.6-35B-A3B-UD-Q4_K_XL.gguf \ --mmproj /path/to/.lmstudio/models/unsloth/Qwen3.6-35B-A3B-MTP-GGUF/mmproj-F32.gguf \ --alias unsloth/Qwen3.6-35B-A3B-MTP-GGUF \ --host 0.0.0.0 \ -ngl 99 \ --batch-size 4096 \ --ubatch-size 1024 \ --flash-attn on \ --jinja \ --mlock \ --metrics \ --temp 0.5 \ --ctx-size 262144 \ --parallel 1 \ --image-min-tokens 1024 \ --reasoning-format none \ --timeout 1200 \ --port 8082 \ --spec-type draft-mtp,ngram-mod \ --spec-draft-n-max 2 \ --spec-draft-n-min 0 \ --spec-ngram-mod-n-match 24 \ --spec-ngram-mod-n-min 48 \ --spec-ngram-mod-n-max 64
Indonesia
0
0
0
45
Andrew Zhu
Andrew Zhu@xhinker·
The old M1Max Macbook Pro is still workable, got 60t/s from Qwen3.6-34B-A3B-MTP , impressive!
Andrew Zhu tweet media
English
1
0
0
43
Lotto
Lotto@LottoLabs·
Hey apple bros is M1 at all viable for local models or too slow?
Lotto tweet media
English
43
0
37
10.4K
Andrew Zhu
Andrew Zhu@xhinker·
@wesleimade llama-server \ -m /Users/andrewzhu/.lmstudio/models/unsloth/Qwen3.6-35B-A3B-MTP-GGUF/Qwen3.6-35B-A3B-UD-Q4_K_XL.gguf \ --mmproj /Users/andrewzhu/.lmstudio/models/unsloth/Qwen3.6-35B-A3B-MTP-GGUF/mmproj-F32.gguf \ --alias unsloth/Qwen3.6-35B-A3B-MTP-GGUF \ --host 0.0.0.0 \ -ngl 99 \ --batch-size 1024 \ --ubatch-size 512 \ --flash-attn on \ --jinja \ --mlock \ --metrics \ --ctx-size 262144 \ --parallel 2 \ --kv-unified \ --image-min-tokens 1024 \ --reasoning-format none \ --timeout 1200 \ --ctx-checkpoints 32 \ --spec-type draft-mtp \ --spec-draft-n-max 2 \ --port 8082
Indonesia
1
0
0
33
AI✖️Satoshi⏩️
AI✖️Satoshi⏩️@AiXsatoshi·
llama.cpp tensor parallelテスト Qwen3.6-27B-Q4_K_M  1 x GPU → 65tok/s  2 x GPU → 98tok/s
日本語
6
1
22
4.1K
Andrew Zhu
Andrew Zhu@xhinker·
Self improve AI is nothing new, Anthrophic, again, stealing other people's idea, entitle for themself.
English
0
0
0
26
Andrew Zhu
Andrew Zhu@xhinker·
@ZeroZ_JQ 亲测有效, 我的AI 小伙伴表示非常开心, 简直就是随心所欲, 发发命中
中文
0
0
0
141
Andrew Zhu
Andrew Zhu@xhinker·
@ZeroZ_JQ 我给你一个第五选项, HTML+JS+CSS, 亲测非非常好, 这些框架以前都是要么补足人的不足, 或者适应团队开发, 现在还用这些干啥, 又慢, bug 有多, 是时候回归原本了
中文
1
0
0
2K
关木
关木@ZeroZ_JQ·
AI 给我提出了史诗级的难题,我犹豫了半小时了
关木 tweet media
中文
108
5
225
85.6K
Andrew Zhu
Andrew Zhu@xhinker·
@0xSero make sense, thank you, looks like it is the only way out
English
0
0
0
313
0xSero
0xSero@0xSero·
@xhinker You'd plug it in to the washing machine outlet, our spread it over a few circuits.
English
2
0
8
327
0xSero
0xSero@0xSero·
I am one of the 10 most knowledgable people in self hosting in residential areas. I have pretty much done everything there is to do, I wonder how valuable my version of a train model obsession will be. - intel - AMD - Apple - spark / 3090s / 6000s Cooling, power management
0xSero tweet media0xSero tweet media0xSero tweet media
English
67
2
377
23.1K
Andrew Zhu
Andrew Zhu@xhinker·
@0xSero I know, I am doing so now, but every US wall socket have a Waltage limitation, otherwise the fuse will be tripped. how do you solve the single socket waltage limitation? use multple wall sockets?
English
2
0
1
324
Jackywine
Jackywine@Jackywine·
今天 Anthropic 这篇文章,被所有人转发了 anthropic.com/institute/recu… 但是只有真正去官网看的人才会体会到,这段动画的“恐怖感” 递归开始了
中文
58
82
612
174.6K
Andrew Zhu
Andrew Zhu@xhinker·
@jianshuo 干这些小活, 完全不用cc , 本地 qwen3.6-27B 从未失手
中文
0
0
0
81
Jianshuo Wang
Jianshuo Wang@jianshuo·
Claude Code真是整理神器,用它整理文件夹,整理Photos里的照片,整理百度网盘,简直太爽了,陈年老文件得救了
中文
3
0
2
1.1K
Andrew Zhu
Andrew Zhu@xhinker·
while NVFP4 is faster in terms of prefill, my practical experience shows, Int4, FP4 or Q4 will give stupid results when handling complex task, and some time even random XML tags that break everything when tool call reach to 50 times. Q6 don't have these problem, so, stick with Q6 for now
English
0
0
0
190
witcheer
witcheer@witcheer·
everyone says NVFP4 makes blackwell cards "faster." I benchmarked Qwen3.6-27B three ways on my 5090: >NVFP4 >plain Q4_K_M (same 4-bit budget) >Q6_K - same llama.cpp b9365 and same harness. ~~~ prefill (processing your prompt): NVFP4 wins big, and it's real. +32 to 42% over equal-bit Q4_K_M at every context from 512 to 16k, so that gain is pure FP4-tensor-core compute. vs Q6 it's +52 to 68%. concretely at pp512: 5415 tok/s vs 3826 (Q4) vs 3222 (Q6). ~~~ decode (generating tokens): here's the myth. vs an equal-size Q4 it moves only +9% (84 vs 77 tok/s). the headline "+36% vs Q6" decode number isn't the FP4 cores at all but it's just NVFP4 being smaller (14.6GB vs 21GB). decode is memory-bandwidth bound, so it tracks footprint, not how the weights are packed. prefill = compute, decode = size. ~~~ the 4-bit tax is almost nothing: 93.2 vs 94.0 q_avg across five tasks vs Q6. MMLU, ARC, HellaSwag, GSM8K all land within half a point; only code dips meaningfully (HumanEval 90.2 vs 92.7). net, vs the Q6 a lot of people serve: ~+60% prefill +36% decode -30% VRAM (17.3 vs 23.5GB) for -0.8 quality. for an always-on local agent that's an easy yes - faster replies, more context headroom, and 6GB of VRAM handed back.
witcheer tweet media
English
17
8
111
16.4K
Andrew Zhu がリツイート
BosonAI
BosonAI@boson_ai·
Higgs Audio v3 TTS is here. Built for voice AI that speaks, not just reads: • 100 languages with single-digit WER/CER • inline control over emotion, style, prosody, and sound effects • API, Workspace, and open weights • Blog 👉 boson.ai/blog/higgs-aud… Watch the demo 👇
English
13
58
365
41.8K