StrongEngineer_

7.2K posts

@hotschmoe

Christian • Father of 4 • Structural Engineer • e/acc • BTB Jungle Lurker • too many labels

Desert Southwest, USA Joined October 2021
556 Following 863 Followers
Pinned Tweet
StrongEngineer_
StrongEngineer_@hotschmoe·
MAKE ENGINEERS GREAT AGAIN "A good engineer gets stale very fast if he doesn't keep his hands dirty." - Wernher von Braun
English
2
0
27
2.5K
mr-r0b0t
mr-r0b0t@mr_r0b0t·
FWIW Two of these get you a very capable 32GB of VRAM 👀
mr-r0b0t tweet media
English
16
0
46
5.3K
Sudo su
Sudo su@sudoingX·
hey, i'm sudo. and i have opinions.
English
10
0
41
2.1K
StrongEngineer_
StrongEngineer_@hotschmoe·
@staples46198 I'm running vLLM + Pi and have got 100k+ context with no problems, and 3x streaming running for 10min+ each, no problems. For B70s, vLLM was really the only backend that interested me, I haven't looked at anything else
English
1
0
0
15
Jonathan Staples
Jonathan Staples@staples46198·
Multiple harnesses - Cursor, Cline, even Llama-swap. I've got a bug out to llama.cpp and the XE Kernel. It seems to wedge right around ~4,000 tokens or ~70 seconds of sustained load. Seems to wedge faster on MOE than Dense models - so not sure if it's more about the model or the tps or working time. It will wedge on both coding work and simple text generation. gitlab.freedesktop.org/drm/xe/kernel/… github.com/ggml-org/llama…
English
1
0
1
20
StrongEngineer_
StrongEngineer_@hotschmoe·
Intel Arc B70, pulls ~200 W, one easy 8-pin connection. short, 2-slot, blower style, $950. unoptimized (software maturity miles away from CUDA still, so there's headroom). quick setup in vLLM for 27B W4A16: pp 1521.7 t/s · TTFT 92.3 ms · decode 31.36 t/s. that. is. usable.
English
4
1
21
2.5K
Steeve Morin
Steeve Morin@steeve·
okay hear me out: CPU -> PCI card spoofing VFs -> 400GB network link -> PCI card -> PCIe Gen 5 switch -> GPUs running inference, GPUs have fast TP due to the PCIe switch, and only CPU -> GPU goes through the network using SR-IOV, the emulation could be transparent to the host
Steeve Morin@steeve

software defined PCIe bus?

English
2
0
2
320
StrongEngineer_ retweeted
Steeve Morin
Steeve Morin@steeve·
brb intelmaxxing b70
Steeve Morin tweet media
Indonesia
52
32
811
49.2K
StrongEngineer_
StrongEngineer_@hotschmoe·
that's what we like to see boys
StrongEngineer_ tweet media
English
1
0
6
1K
定
@de3dsoul·
Back when weekends looked like this
定 tweet media
English
55
478
7.3K
188.4K
Jonathan Staples
Jonathan Staples@staples46198·
@hotschmoe I can't seem to get mine to run stable. It's fine for short burst work. Anything over ~3 minutes and it wedges. The MOE models seem to wedge faster than the dense models too.
English
1
0
1
11
StrongEngineer_
StrongEngineer_@hotschmoe·
I just hiked Superstition Ridgeline, ends coming down flatiron right there
English
0
0
0
19
StrongEngineer_
StrongEngineer_@hotschmoe·
@mattforney I just hiked Superstition Ridgeline, ends coming down flatiron right there
English
0
0
1
151
Matt Forney
Matt Forney@mattforney·
Imagine hating America your entire life because the BBC told you to only to discover amazing food, beautiful sights, and friendly people when you actually come here. This is the greatest cultural turnaround since Americans started loving the Japanese in the 80s.
American Nightmare 🇺🇸@thewakeninq

The media sold the world fear. Tourists brought back the receipts. They came expecting chaos and left praising Trump’s America. Turns out reality hits a lot harder than CNN talking points. 🇺🇸

English
12
43
665
16.1K
StrongEngineer_
StrongEngineer_@hotschmoe·
@thewakeninq I just hiked Superstition Ridgeline, ends coming down flatiron right there
StrongEngineer_ tweet media
English
0
0
3
301
American Nightmare 🇺🇸
The media sold the world fear. Tourists brought back the receipts. They came expecting chaos and left praising Trump’s America. Turns out reality hits a lot harder than CNN talking points. 🇺🇸
English
367
3.1K
23.7K
402.3K
Ivan Fioravanti ᯅ
Ivan Fioravanti ᯅ@ivanfioravanti·
GLM-5.2 8bit running on two M3 Ultra 512GB with MLX distributed? Here it is! 🚀 Decode speed: 17.9 tokens/sec 🔥 Memory used: ~ 760GB 👀 Again keep in mind it's a preliminary PR by super @pcuenq still a WIP!
Ivan Fioravanti ᯅ tweet media
English
15
10
143
13.2K
Serf
Serf@TheRoyalSerf·
Explain your politics with a gif image or video
English
1.2K
20
726
421.3K
StrongEngineer_
StrongEngineer_@hotschmoe·
@sudoingX Even a new $950 Intel B70 gets usable, concurrent qwen 3.6-27b streams. I would have killed to have this 4 years ago, but for some reason people think it means nothing now??
English
0
0
0
56
Sudo su
Sudo su@sudoingX·
a used 3090, 900 to 1200 bucks, runs qwen 3.6 27b dense and does real agentic work. on your desk. offline. yours. i will never get over this, we're living through the cheapest superpower in human history and most people scroll right past it.
English
28
10
216
13K