StrongEngineer_

7.2K posts

@hotschmoe

Christian • Father of 4 • Structural Engineer • e/acc • BTB Jungle Lurker • too many labels

Desert Southwest, USA · Joined October 2021
556 Following · 865 Followers
Pinned Tweet
StrongEngineer_@hotschmoe·
MAKE ENGINEERS GREAT AGAIN "A good engineer gets stale very fast if he doesn't keep his hands dirty." - Wernher von Braun
StrongEngineer_ retweeted
StrongEngineer_@hotschmoe·
@DeeperThrill I'm running local models so I can control how my children first interact with AI. It's all under my roof
StrongEngineer_@hotschmoe·
btw I'm moving to Linux kernel 7.0+ to test some better GPU P2P pieces that are missing from earlier kernels, to improve TP=2+ performance
StrongEngineer_@hotschmoe·
Kernel: 6.18.33 (I'm on Unraid)
GuC firmware: xe/bmg_guc_70.bin, version 70.65.0
The rest is in the vLLM container: vLLM 0.23.0, vllm-xpu-env:v0230
Model: Qwen3.6-27B int4 AutoRound (w4a16)
For Qwen3.6 you need a gdn_attention XPU kernel. You can view my GitHub, @xyster's git repos, and the localmaxxing configs.
Jonathan Staples@staples46198

@hotschmoe Do you mind me asking what your stack is? I just tried vLLM and same thing. I wedged after a 55 second sustained load.
1) Kernel version (uname -r)
2) GuC firmware version
3) oneAPI/Level-Zero version
I've got to be missing something here.

mr-r0b0t@mr_r0b0t·
FWIW Two of these get you a very capable 32GB of VRAM 👀
mr-r0b0t tweet media
Sudo su@sudoingX·
hey, i'm sudo. and i have opinions.
StrongEngineer_@hotschmoe·
@staples46198 I'm running vLLM + Pi. I've got 100k+ context with no problems, and 3x streams running for 10min+ each with no problems. For B70s, vLLM was really the only backend that interested me; I haven't looked at anything else.
Jonathan Staples@staples46198·
Multiple harnesses: Cursor, Cline, even llama-swap. I've got a bug out to llama.cpp and the Xe kernel driver. It wedges right around ~4,000 tokens or ~70 seconds of sustained load. It seems to wedge faster on MoE than dense models, so I'm not sure if it's more about the model, the tps, or the working time. It will wedge on both coding work and simple text generation. gitlab.freedesktop.org/drm/xe/kernel/… github.com/ggml-org/llama…
StrongEngineer_@hotschmoe·
Intel Arc B70: pulls ~200 W, one easy 8-pin connection. Short, 2-slot, blower style, $950.
Unoptimized (software maturity is still miles away from CUDA, so there's headroom), quick setup in vLLM for 27B W4A16:
pp 1521.7 t/s · TTFT 92.3 ms · decode 31.36 t/s
That. Is. Usable.
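To put those three numbers in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes a deliberately simple model that is not from the post: request time is roughly prompt tokens divided by the prefill ("pp") rate plus output tokens divided by the decode rate. The function name `estimate_seconds` and the example request sizes are hypothetical. The model ignores batching, fixed overhead, and any slowdown at long context, so treat the results as ballpark figures only.

```python
# Rates quoted in the post for a 27B W4A16 model on one Intel Arc B70.
PREFILL_TPS = 1521.7  # "pp": prompt-processing throughput, tokens/s
DECODE_TPS = 31.36    # decode throughput, tokens/s


def estimate_seconds(prompt_tokens: int, output_tokens: int,
                     prefill_tps: float = PREFILL_TPS,
                     decode_tps: float = DECODE_TPS) -> float:
    """Ballpark single-request latency under a simplified two-phase model.

    Assumption (not from the post): both rates stay constant regardless of
    context length, and there is no fixed per-request overhead.
    """
    if prefill_tps <= 0 or decode_tps <= 0:
        raise ValueError("throughput rates must be positive")
    if prompt_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    prefill_time = prompt_tokens / prefill_tps  # time to ingest the prompt
    decode_time = output_tokens / decode_tps    # time to generate the reply
    return prefill_time + decode_time


if __name__ == "__main__":
    # Hypothetical request shapes: (prompt tokens, output tokens).
    for prompt, output in [(2_000, 500), (8_000, 1_000), (100_000, 1_000)]:
        secs = estimate_seconds(prompt, output)
        print(f"{prompt:>7} in / {output:>5} out: ~{secs:.1f} s")
```

Under this model the decode phase dominates for typical chat-sized prompts, which is why the decode rate is the number that decides whether the card feels usable.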
Steeve Morin@steeve·
okay hear me out: CPU -> PCI card spoofing VFs -> 400GB network link -> PCI card -> PCIe Gen 5 switch -> GPUs running inference, GPUs have fast TP due to the PCIe switch, and only CPU -> GPU goes through the network using SR-IOV, the emulation could be transparent to the host
Steeve Morin@steeve

software defined PCIe bus?

StrongEngineer_ retweeted
Steeve Morin@steeve·
brb intelmaxxing b70
Steeve Morin tweet media
StrongEngineer_@hotschmoe·
that's what we like to see boys
StrongEngineer_ tweet media
定
@de3dsoul·
Back when weekends looked like this
定 tweet media
Jonathan Staples@staples46198·
@hotschmoe I can't seem to get mine to run stable. It's fine for short burst work. Anything over ~3 minutes and it wedges. The MoE models seem to wedge faster than the dense models too.
StrongEngineer_@hotschmoe·
I just hiked Superstition Ridgeline; it ends coming down Flatiron right there
StrongEngineer_@hotschmoe·
@mattforney I just hiked Superstition Ridgeline; it ends coming down Flatiron right there
Matt Forney@mattforney·
Imagine hating America your entire life because the BBC told you to only to discover amazing food, beautiful sights, and friendly people when you actually come here. This is the greatest cultural turnaround since Americans started loving the Japanese in the 80s.
American Nightmare 🇺🇸@thewakeninq

The media sold the world fear. Tourists brought back the receipts. They came expecting chaos and left praising Trump’s America. Turns out reality hits a lot harder than CNN talking points. 🇺🇸

American Nightmare 🇺🇸
The media sold the world fear. Tourists brought back the receipts. They came expecting chaos and left praising Trump’s America. Turns out reality hits a lot harder than CNN talking points. 🇺🇸
Ivan Fioravanti ᯅ@ivanfioravanti·
GLM-5.2 8bit running on two M3 Ultra 512GB with MLX distributed? Here it is! 🚀 Decode speed: 17.9 tokens/sec 🔥 Memory used: ~ 760GB 👀 Again keep in mind it's a preliminary PR by super @pcuenq still a WIP!
Ivan Fioravanti ᯅ tweet media
Serf@TheRoyalSerf·
Explain your politics with a gif image or video