NakeZast

707 posts

@NakeZast

Machine Learning Engineer who's passionate about Hardware, Gaming & F1

Earth · Joined August 2016
177 Following · 24 Followers
NakeZast @NakeZast ·
@Tono_Ken3 @stevibe Yikes, that's still rough... I know it's unrealistic to expect a doubling in TPS, but I was hoping it'd be closer to a 40% increase 😅
0 replies · 0 reposts · 1 like · 24 views
stevibe @stevibe ·
How slow does a 128B DENSE model run locally? Qwen3 27B and Gemma 31B are the popular dense models everyone tests. But what happens when you 4x the params? Mistral Medium 3.5 128B, side-by-side on 4x4090 vs 4x5090 vs RTX PRO 6000 vs DGX Spark:
🔴 4x4090: 12.06 tok/s decode, 680 ms TTFT
🟢 4x5090: 19.57 tok/s decode, 572 ms TTFT
🟡 PRO 6000: 18.12 tok/s decode, 538 ms TTFT
🟣 DGX Spark: 2.58 tok/s decode, 2243 ms TTFT
26 replies · 8 reposts · 159 likes · 35.3K views
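For scale, a quick back-of-envelope check of what those decode rates mean in practice (a sketch using only the numbers reported above; the 1000-token reply length is an arbitrary assumption):

```shell
# Seconds to decode a hypothetical 1000-token reply at each reported rate,
# plus the relative gain of the 4x5090 rig over the 4x4090 rig.
for entry in "4x4090:12.06" "4x5090:19.57" "PRO 6000:18.12" "DGX Spark:2.58"; do
  rig=${entry%:*}; tps=${entry##*:}
  awk -v rig="$rig" -v tps="$tps" \
    'BEGIN { printf "%s: %.1f s per 1000 tokens\n", rig, 1000 / tps }'
done
awk 'BEGIN { printf "4x5090 vs 4x4090: %.0f%% faster\n", (19.57 / 12.06 - 1) * 100 }'
```

So the 5090 rig comes out roughly 62% faster than the 4090 rig on decode, which puts the "40% increase" hope in the replies into context.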
NakeZast @NakeZast ·
Hey @gospaceport! Could you perhaps make a Proxmox 9 guide on setting up a Linux VM with the necessary passthroughs for AI workloads, rather than just LXCs? This would entail independent drivers, kernels, etc. A bit of extra overhead, but it also lets you do whatever you want.
0 replies · 0 reposts · 0 likes · 24 views
Higgs @Higgs_GG ·
Decided I am going to be giving away 2 of these physical Assassin's Creed Black Flag Launch Editions. Digital is hard right now as there are no codes from what I can see, and I do not want to give out a voucher code. Spreading good vibes amongst the @assassinscreed community is what it is all about.
[image]
23 replies · 32 reposts · 109 likes · 4.1K views
NakeZast @NakeZast ·
@Higgs_GG Yeah, the one time I really wish I was in the West 😅 Or heck, the far east in the land down under - that'd work too
0 replies · 0 reposts · 1 like · 21 views
Higgs @Higgs_GG ·
@NakeZast Fair enough, I can imagine import costs are super expensive
1 reply · 0 reposts · 1 like · 19 views
Higgs @Higgs_GG ·
Paid for an Assassin's Creed Black Flag Resynced deluxe edition for a good mate... spreading love and good vibes is what it is all about. I'm looking forward to giving someone else the chance in my giveaway to experience it as well. ✌🏻
10 replies · 3 reposts · 82 likes · 2.3K views
Loktar 🇺🇸 @loktar00 ·
lol, random thought: does ollama use ollama for their cloud service? 🤔 There's no way, is there?
4 replies · 0 reposts · 2 likes · 373 views
NakeZast @NakeZast ·
@loktar00 @unbug Ah, got it! And finally: have you tried running on a single 5090? The model should fit comfortably on a single GPU and still provide a 64k+ context window with an FP8 KV cache.
0 replies · 0 reposts · 1 like · 30 views
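The single-GPU setup being asked about could look roughly like this (a sketch only: the checkpoint name is borrowed from the configs posted later in the thread, and the memory headroom and context length are assumptions, not something anyone in the thread ran):

```shell
# Hypothetical single-5090 vLLM launch. FP8 KV cache halves cache memory
# versus FP16, which is what makes a 64k+ context plausible on one 32 GB card.
vllm serve cyankiwi/Qwen3.6-35B-A3B-AWQ-4bit \
  --gpu-memory-utilization 0.92 \
  --kv-cache-dtype fp8 \
  --max-model-len 65536
```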
Loktar 🇺🇸 @loktar00 ·
@NakeZast @unbug No, NVFP4 does well, it just depends where/who it comes from, it seems; in this case I had better luck with the 4-bit AWQ
1 reply · 0 reposts · 0 likes · 36 views
Loktar 🇺🇸 @loktar00 ·
I wish 3.6 35B was just a little better... the speeds I'm getting are insane.
[image]
26 replies · 5 reposts · 174 likes · 13.4K views
NakeZast @NakeZast ·
@loktar00 @unbug I can't even type out my full config to share it with you and show what I was trying to attempt (Twitter's character limit is shite... without paying for the subscription 😅)
0 replies · 0 reposts · 0 likes · 12 views
Loktar 🇺🇸 @loktar00 ·
Haven't done anything special with PCIe; running an AI TOP B850 with the 2 5090s, and both run at x8 PCIe 5. NVFP4 works well! I'm using both, here's my "slow" version (NVFP4) vs my "fast" one.

# Slow (NVFP4)
python3 -m vllm.entrypoints.openai.api_server \
  --download-dir /mnt/share/llm \
  --model mmangkad/Qwen3.6-35B-A3B-NVFP4 \
  --served-model-name=qwen3.6-35B-A3b \
  --gpu-memory-utilization 0.92 \
  --tool-call-parser qwen3_coder \
  --enable-auto-tool-choice \
  --tensor-parallel-size 2 \
  --reasoning-parser qwen3 \
  --block-size 32 \
  --max-num-batched-tokens 4096 \
  --language-model-only \
  --enable-prefix-caching \
  --max-num-seqs 4 \
  --enable-expert-parallel \
  --default-chat-template-kwargs '{"preserve_thinking": true}' \
  --override-generation-config '{"temperature": 1.0, "top-p": 0.95, "top-k": 20, "min-p": 0.0, "presence-penalty": 1.5, "repetition_penalty": 1.0}'

# Fast (4-bit AWQ + speculative decoding)
python3 -m vllm.entrypoints.openai.api_server \
  --download-dir /mnt/share/llm \
  --model cyankiwi/Qwen3.6-35B-A3B-AWQ-4bit \
  --served-model-name=qwen3.6-35B-A3b \
  --gpu-memory-utilization 0.92 \
  --tool-call-parser qwen3_coder \
  --enable-auto-tool-choice \
  --tensor-parallel-size 2 \
  --reasoning-parser qwen3 \
  --block-size 32 \
  --max-num-batched-tokens 4096 \
  --language-model-only \
  --max-num-seqs 4 \
  --enable-expert-parallel \
  --default-chat-template-kwargs '{"preserve_thinking": true}' \
  --speculative-config '{"method":"qwen3_next_mtp","num_speculative_tokens":4}' \
  --override-generation-config '{"temperature": 1.0, "top-p": 0.95, "top-k": 20, "min-p": 0.0, "presence-penalty": 1.5, "repetition_penalty": 1.0}'

2 replies · 0 reposts · 0 likes · 57 views
NakeZast @NakeZast ·
@Higgs_GG Only issue being, I can't find it in my region; if any is available, it goes for 50% more than the base price. Importing from the UK costs a decent chunk, but then the imports & customs situation is unknown... *sigh*
2 replies · 0 reposts · 1 like · 20 views
NakeZast @NakeZast ·
@loktar00 @unbug 3) Overall, any vLLM tips for max TPS & concurrency would be appreciated :D
1 reply · 0 reposts · 1 like · 49 views
NakeZast @NakeZast ·
@loktar00 @unbug 2) Any success with NVFP4 models? Especially the NVFP4 versions of the Qwen3.6 models? I've tried the quant from RedHatAI; it's working well for others, but on my rig I'm running into infinite reasoning tokens despite setting the reasoning parser and everything in the vLLM config 😅
1 reply · 0 reposts · 1 like · 53 views
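While debugging runaway reasoning like that, one stopgap (a sketch only, assuming the OpenAI-compatible endpoint and served model name from the configs posted elsewhere in the thread; the URL, prompt, and token cap are all illustrative) is to hard-cap output length per request so a reasoning loop can't spin forever:

```shell
# Hypothetical request against vLLM's OpenAI-compatible chat endpoint:
# max_tokens bounds the total generated output, reasoning tokens included.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "qwen3.6-35B-A3b",
        "messages": [{"role": "user", "content": "ping"}],
        "max_tokens": 2048
      }'
```

This doesn't fix the underlying parser issue, but it keeps a stuck request from burning the whole context window.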
NakeZast @NakeZast ·
@loktar00 @unbug Hey, I have the same rig! If you're able to, would you be down to answer a few technical questions regarding your configuration? I seem to be running into some difficulties myself.
1 reply · 0 reposts · 1 like · 75 views
NakeZast @NakeZast ·
@0xSero Best model to run in a single & dual 5090 scenario that provides high TPS (generation) with a good balance of intelligence? (Anything over gpt-oss 20b would be an upgrade)
0 replies · 0 reposts · 0 likes · 13 views
NakeZast @NakeZast ·
@Higgs_GG Discovered your videos recently through our small banter about Resynced - great content! Will hopefully keep watching :D
1 reply · 0 reposts · 1 like · 22 views
Higgs @Higgs_GG ·
Thanks for all the support on the Assassin's Creed Black Flag Resynced video. Positive vibes, and I appreciate it. I might get some stuff incorrect, but I'm learning more about the franchise every time. I just want to bring my own angle and opinions from a new player's perspective. ✌🏻
[image]
7 replies · 0 reposts · 30 likes · 1.3K views