StrongEngineer_
7.2K posts

StrongEngineer_
@hotschmoe
Christian • Father of 4 • Structural Engineer • e/acc • BTB Jungle Lurker • too many labels
Desert Southwest, USA Joined October 2021
556 Following 865 Followers
Pinned Tweet
StrongEngineer_ Retweeted

@DeeperThrill I'm running local models so I can control how my children first interact with AI. It's all under my roof

Kernel: 6.18.33 (I'm on Unraid)
GuC firmware: xe/bmg_guc_70.bin version 70.65.0
The rest is in the vLLM container:
vLLM 0.23.0
vllm-xpu-env:v0230
Qwen3.6-27B int4 AutoRound (w4a16)
For Qwen3.6 you need a gdn_attention XPU kernel. You can view my GitHub, @xyster's git repos, and the localmaxxing configs.
Jonathan Staples@staples46198
@hotschmoe Do you mind me asking what your stack is? I just tried vLLM and same thing. I wedged after a 55 second sustained load. 1) Kernel version (uname -r) 2) GuC firmware version 3) oneAPI/Level-Zero version I've got to be missing something here.
StrongEngineer_ Retweeted

The doom justifies the valuation geohot.github.io//blog/jekyll/u…

@staples46198 I'm running vLLM + Pi
I've had 100k+ context with no problems, and 3x streaming running for 10+ min each with no problems
For B70s, vLLM was really the only backend that interested me; I haven't looked at anything else

Multiple harnesses: Cursor, Cline, even llama-swap. I've filed bugs with llama.cpp and the Xe kernel driver. It seems to wedge at around 4,000 tokens or ~70 seconds of sustained load.
It seems to wedge faster on MoE than on dense models, so I'm not sure whether it's more about the model, the tps, or the working time.
It wedges on both coding work and simple text generation.
gitlab.freedesktop.org/drm/xe/kernel/…
github.com/ggml-org/llama…

okay hear me out:
CPU -> PCI card spoofing VFs -> 400GB network link -> PCI card -> PCIe Gen 5 switch -> GPUs
running inference, GPUs have fast TP due to the PCIe switch, and only CPU -> GPU goes through the network
using SR-IOV, the emulation could be transparent to the host
Steeve Morin@steeve
software defined PCIe bus?

StrongEngineer_ Retweeted
StrongEngineer_ Retweeted

@staples46198 What harness? Qwen3.6-27B in Pi has had no issues for me so far

@hotschmoe I can't seem to get mine to run stable. It's fine for short burst work. Anything over ~3 minutes and it wedges. The MoE models seem to wedge faster than the dense models too.

@mattforney I just hiked Superstition Ridgeline; it ends coming down Flatiron right there

Imagine hating America your entire life because the BBC told you to only to discover amazing food, beautiful sights, and friendly people when you actually come here. This is the greatest cultural turnaround since Americans started loving the Japanese in the 80s.
American Nightmare 🇺🇸@thewakeninq
The media sold the world fear. Tourists brought back the receipts. They came expecting chaos and left praising Trump’s America. Turns out reality hits a lot harder than CNN talking points. 🇺🇸


@ivanfioravanti @pcuenq Compared to a server rack of the day they're small! Hah

GLM-5.2 8bit running on two M3 Ultra 512GB with MLX distributed? Here it is! 🚀
Decode speed: 17.9 tokens/sec 🔥
Memory used: ~ 760GB 👀
Again, keep in mind it's a preliminary PR by the super @pcuenq, still a WIP!
