NakeZast

707 posts

@NakeZast

Machine Learning Engineer who's passionate about Hardware, Gaming & F1

Earth · Joined August 2016
177 Following · 24 Followers
NakeZast @NakeZast ·
@Tono_Ken3 @stevibe Yikes, that's still rough... I know it's unrealistic to expect a doubling in TPS, but I was hoping it'd be closer to a 40% increase 😅
0 replies · 0 reposts · 1 like · 24 views
stevibe @stevibe ·
How slow does a 128B DENSE model run locally? Qwen3 27B and Gemma 31B are the popular dense models everyone tests. But what happens when you 4x the params? Mistral Medium 3.5 128B, side-by-side on 4x4090 vs 4x5090 vs RTX PRO 6000 vs DGX Spark:
🔴 4x4090: 12.06 tok/s decode, 680 ms TTFT
🟢 4x5090: 19.57 tok/s decode, 572 ms TTFT
🟡 PRO 6000: 18.12 tok/s decode, 538 ms TTFT
🟣 DGX Spark: 2.58 tok/s decode, 2243 ms TTFT
26 replies · 8 reposts · 159 likes · 35.3K views
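For scale, a quick back-of-envelope check of what those decode rates mean in practice (a sketch using only the numbers reported above; the 1000-token reply length is an arbitrary assumption):

```shell
# Seconds to decode a hypothetical 1000-token reply at each reported rate,
# plus the relative gain of the 4x5090 rig over the 4x4090 rig.
for entry in "4x4090:12.06" "4x5090:19.57" "PRO 6000:18.12" "DGX Spark:2.58"; do
  rig=${entry%:*}; tps=${entry##*:}
  awk -v rig="$rig" -v tps="$tps" \
    'BEGIN { printf "%s: %.1f s per 1000 tokens\n", rig, 1000 / tps }'
done
awk 'BEGIN { printf "4x5090 vs 4x4090: %.0f%% faster\n", (19.57 / 12.06 - 1) * 100 }'
```

So the 5090 rig comes out roughly 62% faster than the 4090 rig on decode, which puts the "40% increase" hope in the replies into context.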
NakeZast @NakeZast ·
Hey @gospaceport! Could you perhaps make a Proxmox 9 guide on setting up a Linux VM with the necessary passthroughs for AI workloads, rather than just LXCs? This would entail independent drivers, kernels, etc. A bit of extra overhead, but it also lets you do whatever you want.
0 replies · 0 reposts · 0 likes · 24 views
Higgs @Higgs_GG ·
Decided I am going to be giving away 2 of these physical Assassin's Creed Black Flag Launch Editions. Digital is hard right now as there are no codes from what I can see, and I do not want to give out a voucher code. Spreading good vibes amongst the @assassinscreed community is what it is all about.
[image]
23 replies · 32 reposts · 109 likes · 4.1K views
NakeZast @NakeZast ·
@Higgs_GG Yeah, the one time I really wish I was in the West 😅 Or heck, the far east in the land down under - that'd work too
0 replies · 0 reposts · 1 like · 21 views
Higgs @Higgs_GG ·
@NakeZast Fair enough, I can imagine import costs are super expensive
1 reply · 0 reposts · 1 like · 19 views
Higgs @Higgs_GG ·
Paid for an Assassin's Creed Black Flag Resynced deluxe edition for a good mate... spreading love and good vibes is what it is all about. I'm looking forward to giving someone else the chance in my giveaway to experience it as well. ✌🏻
10 replies · 3 reposts · 82 likes · 2.3K views
Loktar 🇺🇸 @loktar00 ·
lol, random thought: does ollama use ollama for their cloud service? 🤔 There's no way, is there?
4 replies · 0 reposts · 2 likes · 373 views
NakeZast @NakeZast ·
@loktar00 @unbug Ah, got it! And finally: have you tried running on a single 5090? The model should fit comfortably on a single GPU and still provide a 64k+ context window with an FP8 KV cache.
0 replies · 0 reposts · 1 like · 30 views
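The single-GPU setup being asked about could look roughly like this (a sketch only: the checkpoint name is borrowed from the configs posted later in the thread, and the memory headroom and context length are assumptions, not something anyone in the thread ran):

```shell
# Hypothetical single-5090 vLLM launch. FP8 KV cache halves cache memory
# versus FP16, which is what makes a 64k+ context plausible on one 32 GB card.
vllm serve cyankiwi/Qwen3.6-35B-A3B-AWQ-4bit \
  --gpu-memory-utilization 0.92 \
  --kv-cache-dtype fp8 \
  --max-model-len 65536
```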
Loktar 🇺🇸 @loktar00 ·
@NakeZast @unbug No, NVFP4 does well, it just depends where/who it comes from, it seems; in this case I had better luck with the 4-bit AWQ
1 reply · 0 reposts · 0 likes · 36 views
Loktar 🇺🇸 @loktar00 ·
I wish 3.6 35B was just a little better... the speeds I'm getting are insane.
[image]
26 replies · 5 reposts · 174 likes · 13.4K views
NakeZast @NakeZast ·
@loktar00 @unbug I can't even type out my full config to share it with you and show what I was trying to attempt (Twitter's character limit is shite... without paying for the subscription 😅)
0 replies · 0 reposts · 0 likes · 12 views
Loktar 🇺🇸 @loktar00 ·
Haven't done anything special with PCIe; running an AI TOP B850 with the 2 5090s, and both run at x8 PCIe 5. NVFP4 works well! I'm using both, here's my "slow" version (NVFP4) vs my "fast" one.

# Slow (NVFP4)
python3 -m vllm.entrypoints.openai.api_server \
  --download-dir /mnt/share/llm \
  --model mmangkad/Qwen3.6-35B-A3B-NVFP4 \
  --served-model-name=qwen3.6-35B-A3b \
  --gpu-memory-utilization 0.92 \
  --tool-call-parser qwen3_coder \
  --enable-auto-tool-choice \
  --tensor-parallel-size 2 \
  --reasoning-parser qwen3 \
  --block-size 32 \
  --max-num-batched-tokens 4096 \
  --language-model-only \
  --enable-prefix-caching \
  --max-num-seqs 4 \
  --enable-expert-parallel \
  --default-chat-template-kwargs '{"preserve_thinking": true}' \
  --override-generation-config '{"temperature": 1.0, "top-p": 0.95, "top-k": 20, "min-p": 0.0, "presence-penalty": 1.5, "repetition_penalty": 1.0}'

# Fast (4-bit AWQ + speculative decoding)
python3 -m vllm.entrypoints.openai.api_server \
  --download-dir /mnt/share/llm \
  --model cyankiwi/Qwen3.6-35B-A3B-AWQ-4bit \
  --served-model-name=qwen3.6-35B-A3b \
  --gpu-memory-utilization 0.92 \
  --tool-call-parser qwen3_coder \
  --enable-auto-tool-choice \
  --tensor-parallel-size 2 \
  --reasoning-parser qwen3 \
  --block-size 32 \
  --max-num-batched-tokens 4096 \
  --language-model-only \
  --max-num-seqs 4 \
  --enable-expert-parallel \
  --default-chat-template-kwargs '{"preserve_thinking": true}' \
  --speculative-config '{"method":"qwen3_next_mtp","num_speculative_tokens":4}' \
  --override-generation-config '{"temperature": 1.0, "top-p": 0.95, "top-k": 20, "min-p": 0.0, "presence-penalty": 1.5, "repetition_penalty": 1.0}'

2 replies · 0 reposts · 0 likes · 57 views
NakeZast @NakeZast ·
@Higgs_GG Only issue being, I can't find it in my region; if any is available, it goes for 50% more than the base price. Importing from the UK costs a decent chunk, but then the imports & customs situation is unknown... *sigh*
2 replies · 0 reposts · 1 like · 20 views
NakeZast @NakeZast ·
@loktar00 @unbug 3) Overall, any vLLM tips for max TPS & concurrency would be appreciated :D
1 reply · 0 reposts · 1 like · 49 views
NakeZast @NakeZast ·
@loktar00 @unbug 2) Any success with NVFP4 models? Especially the NVFP4 versions of the Qwen3.6 models? I've tried the quant from RedHatAI; it's working well for others, but on my rig I'm running into infinite reasoning tokens despite setting the reasoning parser and everything in the vLLM config 😅
1 reply · 0 reposts · 1 like · 53 views
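While debugging runaway reasoning like that, one stopgap (a sketch only, assuming the OpenAI-compatible endpoint and served model name from the configs posted elsewhere in the thread; the URL, prompt, and token cap are all illustrative) is to hard-cap output length per request so a reasoning loop can't spin forever:

```shell
# Hypothetical request against vLLM's OpenAI-compatible chat endpoint:
# max_tokens bounds the total generated output, reasoning tokens included.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "qwen3.6-35B-A3b",
        "messages": [{"role": "user", "content": "ping"}],
        "max_tokens": 2048
      }'
```

This doesn't fix the underlying parser issue, but it keeps a stuck request from burning the whole context window.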
NakeZast @NakeZast ·
@loktar00 @unbug Hey, I have the same rig! If you're able to, would you be down to answer a few technical questions regarding your configuration? I seem to be running into some difficulties myself.
1 reply · 0 reposts · 1 like · 75 views
NakeZast @NakeZast ·
@0xSero Best model to run in a single & dual 5090 scenario that provides high TPS (generation) with a good balance of intelligence? (Anything over gpt-oss 20b would be an upgrade)
0 replies · 0 reposts · 0 likes · 13 views
NakeZast @NakeZast ·
@Higgs_GG Discovered your videos recently through our small banter about Resynced - great content! Will hopefully keep watching :D
1 reply · 0 reposts · 1 like · 22 views
Higgs @Higgs_GG ·
Thanks for all the support on the Assassin's Creed Black Flag Resynced video. Positive vibes, and I appreciate it. I might get some stuff incorrect, but I'm learning more about the franchise every time. I just want to bring my own angle and opinions from a new player's perspective. ✌🏻
[image]
7 replies · 0 reposts · 30 likes · 1.3K views