StrongEngineer_

7.2K posts

@hotschmoe

Christian • Father of 4 • Structural Engineer • e/acc • BTB Jungle Lurker • too many labels

Desert Southwest, USA · Joined October 2021

556 Following · 865 Followers

Pinned Tweet

StrongEngineer_@hotschmoe·Aug 12

MAKE ENGINEERS GREAT AGAIN "A good engineer gets stale very fast if he doesn't keep his hands dirty." - Wernher von Braun

2.5K

StrongEngineer_ Retweeted

StrongEngineer_@hotschmoe·1d

@DeeperThrill I'm running local models so I can control how my children first interact with AI. It's all under my roof

StrongEngineer_@hotschmoe·1h

btw I'm moving to Linux kernel 7.0+ to test some better GPU P2P pieces that are missing from earlier kernels, to increase TP=2+ performance

StrongEngineer_@hotschmoe·1h

Kernel: 6.18.33 (I'm on Unraid)
GuC firmware: xe/bmg_guc_70.bin version 70.65.0
The rest is in the vLLM container:
vLLM 0.23.0
vllm-xpu-env:v0230
Qwen3.6-27B int4 AutoRound (w4a16)
For Qwen3.6 you need a gdn_attention XPU kernel. You can view my GitHub, @xyster's git repos, and localmaxxing configs.

Jonathan Staples@staples46198

@hotschmoe Do you mind me asking what your stack is? I just tried vLLM and same thing. I wedged after a 55 second sustained load. 1) Kernel version (uname -r) 2) GuC firmware version 3) oneAPI/Level-Zero version I've got to be missing something here.

310
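The exchange above turns on kernel versions: 6.18.33 today, with a planned move to 7.0+. A minimal Python sketch of that comparison follows. The function names are hypothetical, the 7.0 threshold is taken from the post rather than from any Intel or kernel documentation, and the release string is only an illustrative example of `uname -r` output.

```python
import platform
import re


def parse_kernel_release(release: str) -> tuple[int, ...]:
    """Extract the leading numeric version from a release string.

    Handles strings such as '6.18.33-Unraid' or '7.0'; a missing
    patch level is treated as 0.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return tuple(int(part) for part in match.groups(default="0"))


def meets_minimum(release: str, minimum: tuple[int, ...] = (7, 0, 0)) -> bool:
    """Return True when the release is at or above the minimum version.

    The default of (7, 0, 0) is an assumption based on the post above.
    """
    return parse_kernel_release(release) >= minimum


if __name__ == "__main__":
    # platform.release() reports the same value as `uname -r` on Linux.
    current = platform.release()
    print(current, meets_minimum(current))
```

This only checks the kernel version. It does not read the GuC firmware or oneAPI/Level-Zero versions that the question also asks about.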

StrongEngineer_ Retweeted

george hotz archive@geohotarchive·10h

The doom justifies the valuation geohot.github.io//blog/jekyll/u…

233

26K

StrongEngineer_@hotschmoe·5h

@mr_r0b0t one intel b70 is $950 for 32GB VRAM

133

mr-r0b0t@mr_r0b0t·12h

FWIW Two of these get you a very capable 32GB of VRAM 👀

6.6K

StrongEngineer_@hotschmoe·5h

@sudoingX opinions?? on x, the everything app?

GIF

Sudo su@sudoingX·20h

hey, i'm sudo. and i have opinions.

2.3K

StrongEngineer_@hotschmoe·5h

@staples46198 I'm running vLLM + Pi. I've got 100k+ context with no problems, and 3x streams running for 10min+ each with no problems. For B70s, vLLM was really the only backend that interested me; I haven't looked at anything else

Jonathan Staples@staples46198·6h

Multiple harnesses - Cursor, Cline, even Llama-swap. I've got a bug out to llama.cpp and the Xe kernel driver. It seems to wedge right around ~4,000 tokens or ~70 seconds of sustained load. It seems to wedge faster on MoE than dense models, so I'm not sure if it's more about the model, the tps, or the working time. It will wedge on both coding work and simple text generation. gitlab.freedesktop.org/drm/xe/kernel/… github.com/ggml-org/llama…

StrongEngineer_@hotschmoe·1d

Intel Arc B70: pulls ~200W, one easy 8-pin connection. Short, 2-slot, blower style, $950. Unoptimized (software maturity is still miles away from CUDA, so there's headroom). Quick setup in vLLM for 27B W4A16: pp 1521.7 t/s · TTFT 92.3 ms · decode 31.36 t/s. That. is. usable.

2.6K
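To put the figures above in wall-clock terms, here is a rough Python sketch that turns the quoted prefill (pp) and decode rates into a request-time estimate. It assumes both rates stay constant regardless of context length and batch size, which real hardware will not match. The class and function names are hypothetical, and the TTFT figure is left out because the post does not say what prompt length it was measured at.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Throughput:
    """Token rates in tokens per second."""
    prefill_tps: float
    decode_tps: float


def estimate_seconds(prompt_tokens: int, output_tokens: int, rates: Throughput) -> float:
    """Estimate request time as prefill time plus decode time.

    Assumes constant rates, so treat the result as a rough
    back-of-the-envelope figure, not a benchmark.
    """
    if prompt_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    prefill = prompt_tokens / rates.prefill_tps
    decode = output_tokens / rates.decode_tps
    return prefill + decode


# Rates quoted in the post for a 27B W4A16 model on one B70.
B70_QUOTED = Throughput(prefill_tps=1521.7, decode_tps=31.36)

if __name__ == "__main__":
    # Hypothetical request: 8,000 prompt tokens, 500 generated tokens.
    print(f"{estimate_seconds(8000, 500, B70_QUOTED):.1f} s")
```

Under these assumptions decode dominates: each generated token costs roughly 49 times as long as each prompt token (1521.7 / 31.36).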

StrongEngineer_@hotschmoe·6h

@steeve ooo this is interesting

Steeve Morin@steeve·6h

okay hear me out: CPU -> PCI card spoofing VFs -> 400GB network link -> PCI card -> PCIe Gen 5 switch -> GPUs running inference, GPUs have fast TP due to the PCIe switch, and only CPU -> GPU goes through the network using SR-IOV, the emulation could be transparent to the host

Steeve Morin@steeve

software defined PCIe bus?

440

StrongEngineer_@hotschmoe·6h

@steeve

GIF

Steeve Morin@steeve·15h

how the b70 intelmaxxing is going

Steeve Morin@steeve

brb intelmaxxing b70

3.7K

StrongEngineer_ Retweeted

Steeve Morin@steeve·Jun 10

brb intelmaxxing b70

811

49.6K

StrongEngineer_@hotschmoe·6h

@steeve oooo yessss. soon

StrongEngineer_ Retweeted

Steeve Morin@steeve·7h

@hotschmoe right on

148

StrongEngineer_@hotschmoe·1d

thats what we like to see boys

1.1K

StrongEngineer_@hotschmoe·7h

@Xaraphim @de3dsoul ux too good

Phoenix𝕏@Xaraphim·11h

@de3dsoul BROOOOO the xbox blades man

787

定@de3dsoul·1d

Back when weekends looked like this

488

7.5K

192.7K

StrongEngineer_@hotschmoe·7h

@staples46198 What harness? Qwen 3.6-27b in Pi has had no issues for me so far

Jonathan Staples@staples46198·7h

@hotschmoe I can't seem to get mine to run stable. It's fine for short burst work. Anything over ~3 minutes and it wedges. The MOE models seem to wedge faster than the dense models too.

StrongEngineer_@hotschmoe·7h

I just hiked Superstition Ridgeline, ends coming down flatiron right there

StrongEngineer_@hotschmoe·7h

American Nightmare 🇺🇸@thewakeninq

The media sold the world fear. Tourists brought back the receipts. They came expecting chaos and left praising Trump’s America. Turns out reality hits a lot harder than CNN talking points. 🇺🇸

StrongEngineer_@hotschmoe·7h

@mattforney I just hiked Superstition Ridgeline, ends coming down flatiron right there

179

Matt Forney@mattforney·8h

Imagine hating America your entire life because the BBC told you to only to discover amazing food, beautiful sights, and friendly people when you actually come here. This is the greatest cultural turnaround since Americans started loving the Japanese in the 80s.

American Nightmare 🇺🇸@thewakeninq

The media sold the world fear. Tourists brought back the receipts. They came expecting chaos and left praising Trump’s America. Turns out reality hits a lot harder than CNN talking points. 🇺🇸

742

17.8K

StrongEngineer_@hotschmoe·7h

@thewakeninq I just hiked Superstition Ridgeline, ends coming down flatiron right there

450

American Nightmare 🇺🇸@thewakeninq·22h

The media sold the world fear. Tourists brought back the receipts. They came expecting chaos and left praising Trump’s America. Turns out reality hits a lot harder than CNN talking points. 🇺🇸

394

3.3K

25.6K

441.5K

StrongEngineer_@hotschmoe·7h

@ivanfioravanti @pcuenq Compared to a server rack of the day they're small! Hah

Ivan Fioravanti ᯅ@ivanfioravanti·7h

@hotschmoe @pcuenq Small mmm 🤣

331

Ivan Fioravanti ᯅ@ivanfioravanti·8h

GLM-5.2 8bit running on two M3 Ultra 512GB with MLX distributed? Here it is! 🚀 Decode speed: 17.9 tokens/sec 🔥 Memory used: ~ 760GB 👀 Again keep in mind it's a preliminary PR by super @pcuenq still a WIP!