Mia

296 posts

@MiaAI_lab

Local AI, LLMs, tech thinker & builder

Joined July 2022

190 Following, 188 Followers

Mia@MiaAI_lab·45m

I find the default DS4-Flash temperature gives it "too" much creativity for coding. It sometimes does things I didn't ask for. Going to run it with temp 0.6 and top_p 0.95 for a while and compare.
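For readers unfamiliar with the two knobs mentioned here, the following is a minimal sketch of how temperature scaling and top-p (nucleus) filtering typically reshape a next-token distribution. The logits are made up for illustration, `sampling_distribution` is a hypothetical helper, and temperature 1.0 with top_p 1.0 is only an assumed stand-in for "default"; the model's real defaults are not stated in the post.

```python
import math

def sampling_distribution(logits, temperature=0.6, top_p=0.95):
    """Return token probabilities after temperature scaling and top-p filtering."""
    # Temperature scaling: dividing logits by T < 1 sharpens the distribution.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-p filtering: keep the smallest set of most likely tokens
    # whose cumulative probability reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Renormalize over the kept tokens; all other tokens get probability 0.
    kept_mass = sum(probs[i] for i in kept)
    return [probs[i] / kept_mass if i in kept else 0.0 for i in range(len(probs))]

# Hypothetical logits for four candidate tokens (illustrative only).
logits = [2.0, 1.0, 0.5, -1.0]
default = sampling_distribution(logits, temperature=1.0, top_p=1.0)  # assumed baseline
tuned = sampling_distribution(logits, temperature=0.6, top_p=0.95)
print([round(p, 3) for p in default])
print([round(p, 3) for p in tuned])
```

With these made-up logits, the tuned settings put more mass on the top token and drop the least likely one entirely, which is the "less creative" behavior the post is after.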

Mia@MiaAI_lab·49m

@anuntrapid_auto @0xSero Yeah 4 is the way. Not going to commit to 3 Sparks if I'm not planning to get the 4th one.

cryptowish.eth@anuntrapid_auto·1h

@MiaAI_lab @0xSero 4 x is the way. TP 8 vllm/sglang

0xSero@0xSero·15h

MiniMax-M3-NVFP4 running on 4x RTX PRO 6000. Repo coming soon.

Mia@MiaAI_lab·1h

@theo @WIRED They live in the past.

Theo - t3.gg@theo·3h

@WIRED Why are you posting 4 day old articles that are no longer accurate lol

WIRED@WIRED·11h

Anthropic is releasing Claude Mythos 5 to trusted organizations and Claude Fable 5 to the public, a version it says can’t be used for cyberattacks. wired.com/story/anthropi…

Mia@MiaAI_lab·2h

@mr_r0b0t @Tech2Wild @NVIDIAAI 3 can't be used for TP though... you want it for more concurrent sessions and more kv cache?

mr-r0b0t@mr_r0b0t·8h

@Tech2Wild @NVIDIAAI 3 is a big unlock tho ngl! Bet you're already thinking about 4 and a switch tho 😛🤩 Imma try to suppress those urges for a bit

mr-r0b0t@mr_r0b0t·10h

So I did a thing 😁

Mia@MiaAI_lab·2h

@joaosump @0xSero Yes, TP works in pairs: 2, 4, etc. 3 units would be good for more concurrency and KV cache.
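One common reason an odd device count is awkward for tensor parallelism: engines such as vLLM split attention heads across devices, so the head count generally has to be divisible by the TP size. The check below is a rough sketch of that rule only. The head count of 64 is a hypothetical value, not taken from any model in this thread, and `valid_tp_sizes` is an illustrative helper, not a library function.

```python
def valid_tp_sizes(num_attention_heads, num_devices):
    """List tensor-parallel sizes up to num_devices that split the heads evenly."""
    return [tp for tp in range(1, num_devices + 1) if num_attention_heads % tp == 0]

HYPOTHETICAL_HEADS = 64  # illustrative value, not from any specific model

sizes = valid_tp_sizes(HYPOTHETICAL_HEADS, 4)
print(sizes)  # [1, 2, 4]
```

Under this assumption a third unit cannot join a single TP group, which fits the suggestion to use it for extra concurrency and KV cache instead.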

Vieirowski@joaosump·2h

@MiaAI_lab @0xSero Damn, that's nice to know, have been considering 2x and thought that was the ceiling. 3x would be an odd number though for vllm right? :(

Mia@MiaAI_lab·2h

@joaosump @0xSero No, you can go up to 3x DGX Sparks connected directly without the need for a switch. There's even a workaround to connect 4x DGX Sparks without a switch.

Vieirowski@joaosump·3h

@MiaAI_lab @0xSero Btw, wouldn't 3x sparks require an expensive switch?

Mia@MiaAI_lab·3h

@0xSero Any solution to connect to Codex app from my iPhone when using codex-shim?

Mia@MiaAI_lab·3h

@advented_ Yes, but it works remarkably well. I tried with 4 sessions, and max tok/s was around 80, which is insane considering the limited bandwidth of DGX Spark.

Advented@advented_·4h

@MiaAI_lab The way I understand it, sequences mean concurrent requests. So your recipe can serve 6 requests at 1M context concurrently. Under full/heavy load t/s can spike and drop bc the longest 1M session has to finish generating for other concurrent requests to continue

Mia@MiaAI_lab·2d

53 tok/s achieved on Step-3.7-Flash NVFP4 with MTP on 2x DGX Spark with 256k context. 🎉🥳
Elapsed time: 56.694 s
Prompt tokens: 29
Generated tokens: 3000
Total tokens: 3029
Generation tok/sec: 52.92
End-to-end tok/sec: 53.43
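The two throughput figures in this post can be reproduced from the raw numbers. This is a small sketch, assuming both rates divide by the same total elapsed time (the post does not say how the timer was split between prefill and generation).

```python
# Figures copied from the post.
elapsed_s = 56.694
prompt_tokens = 29
generated_tokens = 3000

total_tokens = prompt_tokens + generated_tokens

# Assumption: both rates use the full elapsed time as the denominator.
generation_tps = generated_tokens / elapsed_s
end_to_end_tps = total_tokens / elapsed_s

print(f"Total tokens: {total_tokens}")
print(f"Generation tok/sec: {generation_tps:.2f}")
print(f"End-to-end tok/sec: {end_to_end_tps:.2f}")
```

Both results round to the reported 52.92 and 53.43, so the posted numbers are internally consistent under that assumption.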

Mia@MiaAI_lab·4h

@AlicanKiraz0 Is this with MTP?

Alican Kiraz@AlicanKiraz0·6h

nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-NVFP4 and 4x DGX Spark 430 GB Vram 🔥 25 tok/sec 🚀

Mia@MiaAI_lab·5h

@TheAhmadOsman That's a lot of GPUs. Looks messy as hell.

Ahmad@TheAhmadOsman·Sep 8

My house has 33 GPUs.
> 21x RTX 3090s
> 4x RTX 4090s
> 4x RTX 5090s
> 4x Tenstorrent Blackhole p150a
Before AGI arrives: Acquire GPUs. Go into debt if you must. But whatever you do, secure the GPUs.

Mia@MiaAI_lab·5h

@usr_bin_roygbiv @tmaiaroto Not to mention the amount of power and heat that would come out of such a config.

Mia@MiaAI_lab·5h

@usr_bin_roygbiv @tmaiaroto Do tell how many 5090s or 3090s I would need to run DeepSeek-v4-Flash with 1M context. Meanwhile, you can do that with 2x DGX Sparks at 45 tok/s. x.com/MiaAI_lab/stat…

Mia@MiaAI_lab

Run DeepSeek v4 Flash locally on your 2x DGX Sparks easily, with 1M context github.com/MiaAI-Lab/Deep…

Tom Maiaroto@tmaiaroto·13h

Do NOT buy a DGX Spark if prefill speeds and running local AI is not important to you.

Roy@usr_bin_roygbiv

show me your actual tokens per second on a video or a screenshot not this fucking slop DO NOT buy a dgx or mac for llm hosting

Mia@MiaAI_lab·13h

@BlockedPaths I love it and with 1M context, locally, it's a no brainer IMO.

BlockedPath@BlockedPaths·13h

@MiaAI_lab Deepseek is so slept on.

Mia@MiaAI_lab·15h

DeepSeek-v4-Flash beats Step-3.7-Flash in head-to-head tool calling benchmark. Full results in: github.com/MiaAI-Lab/Deep…

Mia@MiaAI_lab·13h

@DevRico003 @TheGoldenAnvil I'm doing coding with it almost all day long. It's awesome.

DevRico003@DevRico003·13h

@MiaAI_lab @TheGoldenAnvil can you do some real coding work with DS4F on 2x Sparks? I'm considering buying a second Spark but I'm not sure yet.

TheGoldenAnchor@TheGoldenAnvil·14h

Awaiting update, seed is growing in the interim.

Mia@MiaAI_lab

Running agentic coding benchmarks on DeepSeek-v4-Flash and Step-3.7-Flash. Will post results soon.

Mia@MiaAI_lab·14h

@broadfield_dev It's awesome, it's my go-to right now. And with 1M context it can do long /goals with the Codex app.

broadfield-dev@broadfield_dev·14h

@MiaAI_lab deepseek v4 flash is great, I use it a lot in my DIY agent harness and there is nothing it can't handle given enough time. It's also not painful to give it all the time it needs to complete tasks.

Mia@MiaAI_lab·14h

@ljupc0 Can't really compare bcs Step's max context is 256k vs 1M for the DS4-Flash

Ljubomir Josifovski@ljupc0·14h

@MiaAI_lab How does the Step model fare with increasing context depth? I've tested DeepSeek and it's excellent. It takes 784K of context to drop TG to <10 tok/s (on M2 Max 96GB). github.com/ljubomirj/ds4/…

Mia@MiaAI_lab·16h

Local agentic 'Tool-Call Benchmark' comparing DeepSeek-v4-Flash and Step-3.7-Flash. Same host, same 69 scenarios, two models. Results:
DeepSeek-v4-Flash: 90/100 quality, 59 passed, 6 partial, 4 failed
Step-3.7-Flash: 87/100 quality, 55 passed, 10 partial, 4 failed 👇
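As a quick sanity check on the tallies in this post, the sketch below confirms that each model's passed, partial, and failed counts add up to the 69 scenarios, and derives the share of fully passed scenarios. The quality scores (90/100 and 87/100) are a separate metric whose formula is not given in the post, so they are not recomputed here.

```python
SCENARIOS = 69  # scenario count stated in the post

# Counts copied from the post.
results = {
    "DeepSeek-v4-Flash": {"passed": 59, "partial": 6, "failed": 4},
    "Step-3.7-Flash": {"passed": 55, "partial": 10, "failed": 4},
}

pass_rates = {}
for model, counts in results.items():
    # Every scenario should land in exactly one bucket.
    assert sum(counts.values()) == SCENARIOS
    pass_rates[model] = counts["passed"] / SCENARIOS
    print(f"{model}: {pass_rates[model]:.1%} fully passed")
```

The gap between the two models comes entirely from scenarios that moved from "passed" to "partial"; both have the same number of outright failures.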
