Pinned Tweet
StrongEngineer_
7.2K posts

StrongEngineer_
@hotschmoe
Christian • Father of 4 • Structural Engineer • e/acc • BTB Jungle Lurker • too many labels
Desert Southwest, USA · Joined October 2021
556 Following · 863 Followers
StrongEngineer_ retweeted

The doom justifies the valuation geohot.github.io//blog/jekyll/u…

@staples46198 I'm running vLLM + Pi
I've had 100k+ context with no problems, and 3 concurrent streams running for 10+ min each with no problems
for B70s, vLLM was really the only backend that interested me, I haven't looked at anything else

Multiple harnesses: Cursor, Cline, even llama-swap. I've got a bug out to llama.cpp and the Xe kernel driver. It seems to wedge right around ~4,000 tokens or ~70 seconds of sustained load.
It seems to wedge faster on MoE than dense models, so I'm not sure whether it's more about the model, the tps, or the working time.
It will wedge on both coding work and simple text generation.
gitlab.freedesktop.org/drm/xe/kernel/…
github.com/ggml-org/llama…

okay hear me out:
CPU -> PCI card spoofing VFs -> 400GB network link -> PCI card -> PCIe Gen 5 switch -> GPUs
running inference, GPUs have fast TP due to the PCIe switch, and only CPU -> GPU goes through the network
using SR-IOV, the emulation could be transparent to the host
Steeve Morin@steeve
software defined PCIe bus?
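
A rough back-of-envelope check on the topology above, as a sketch under stated assumptions: it assumes the "400GB" link means a 400 Gb/s network link and that the CPU-side card sits in a PCIe Gen 5 x16 slot. The function names are illustrative only, and the figures are raw per-direction rates before protocol overhead.

```python
# Back-of-envelope bandwidth comparison; all figures are per direction.
PCIE_GEN5_GT_PER_LANE = 32.0     # GT/s per lane for PCIe 5.0
ENCODING_EFFICIENCY = 128 / 130  # 128b/130b line encoding
LANES = 16                       # assumption: x16 slot


def pcie_gen5_gbytes_per_s(lanes: int = LANES) -> float:
    """Raw PCIe Gen 5 bandwidth in GB/s, before protocol overhead."""
    return PCIE_GEN5_GT_PER_LANE * ENCODING_EFFICIENCY * lanes / 8


def network_gbytes_per_s(gbits_per_s: float = 400.0) -> float:
    """Network link bandwidth in GB/s, assuming the figure is in Gb/s."""
    return gbits_per_s / 8


pcie = pcie_gen5_gbytes_per_s()
net = network_gbytes_per_s()
print(f"PCIe Gen 5 x16: {pcie:.1f} GB/s")
print(f"400 Gb/s link:  {net:.1f} GB/s")
print(f"link / PCIe:    {net / pcie:.0%}")
```

Under those assumptions the network hop carries roughly four fifths of what a local x16 slot would, which fits the idea that only CPU -> GPU traffic crosses it while GPU-to-GPU TP traffic stays on the switch.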

StrongEngineer_ retweeted

@staples46198 What harness? Qwen 3.6-27b in Pi has had no issues for me so far

@hotschmoe I can't seem to get mine to run stable. It's fine for short burst work. Anything over ~3 minutes and it wedges. The MoE models seem to wedge faster than the dense models too.

@mattforney I just hiked Superstition Ridgeline, it ends coming down Flatiron right there

Imagine hating America your entire life because the BBC told you to only to discover amazing food, beautiful sights, and friendly people when you actually come here. This is the greatest cultural turnaround since Americans started loving the Japanese in the 80s.
American Nightmare 🇺🇸@thewakeninq
The media sold the world fear. Tourists brought back the receipts. They came expecting chaos and left praising Trump’s America. Turns out reality hits a lot harder than CNN talking points. 🇺🇸

@ivanfioravanti @pcuenq Compared to a server rack of the day they're small! Hah

GLM-5.2 8bit running on two M3 Ultra 512GB with MLX distributed? Here it is! 🚀
Decode speed: 17.9 tokens/sec 🔥
Memory used: ~ 760GB 👀
Again, keep in mind it's a preliminary PR by the super @pcuenq, still a WIP!


@sudoingX Even a new $950 Intel B70 gets usable, concurrent Qwen 3.6-27b streams
I would have killed to have this 4 years ago, but for some reason people think it means nothing now??