Mia
296 posts

Mia
@MiaAI_lab
Local AI, LLMs, tech thinker & builder
เข้าร่วม Temmuz 2022
190 กำลังติดตาม188 ผู้ติดตาม

@anuntrapid_auto @0xSero Yeah 4 is the way. No going to commit to 3 sparks if I'm not planning to get the 4th one.
English

@WIRED Why are you posting 4 day old articles that are no longer accurate lol
English

Anthropic is releasing Claude Mythos 5 to trusted organizations and Claude Fable 5 to the public, a version it says can’t be used for cyberattacks. wired.com/story/anthropi…
English

@mr_r0b0t @Tech2Wild @NVIDIAAI 3 can't be used for TP though... you want it for more concurrent sessions and more kv cache?
English

@Tech2Wild @NVIDIAAI 3 is a big unlock tho ngl!
Bet you're already thinking about 4 and a switch tho 😛🤩
Imma try to surprise those urges for a bit
English

@MiaAI_lab @0xSero Damn, that's nice to know, have been considering 2x and thought that was the ceiling. 3x would be an odd number though for vllm right? :(
English

@MiaAI_lab @0xSero Btw, wouldn't 3x sparks require an expensive switch?
English

@advented_ Yes, but it works remarkably well. I tried with 4 sessions, and max tok/s was around 80, which is insane considering the limited bandwidth of DGX Spark.
English

@MiaAI_lab They way I understand is Sequences mean concurrent requests.
So your recipe can serve 6 requests at 1M context concurrently.
Under full/heavy load t/s can spike and drop
bc the longest 1M session has to finish generating for other concurrent requests to continue
English

@usr_bin_roygbiv @tmaiaroto Not to mention the amount of power and heat that would come out of such a config.
English

@usr_bin_roygbiv @tmaiaroto Do tell how many 5090 or 3090 I would need to run DeepSeek-v4-Flash with 1M context.
Meanwhile, you can do that with 2x DGX Sparks with 45 tok/s.
x.com/MiaAI_lab/stat…
Mia@MiaAI_lab
Run DeepSeek v4 Flash locally on your 2x DGX Sparks easily, with 1M context github.com/MiaAI-Lab/Deep…
English

Do NOT buy a DGX Spark if prefill speeds and running local AI is not important to you.
Roy@usr_bin_roygbiv
show me your actual tokens per second on a video or a screenshot not this fucking slop DO NOT buy a dgx or mac for llm hosting
English

@BlockedPaths I love it and with 1M context, locally, it's a no brainer IMO.
English

DeepSeek-v4-Flash beats Step-3.7-Flash in head-to-head tool calling benchmark.
Full results in: github.com/MiaAI-Lab/Deep…

English

@DevRico003 @TheGoldenAnvil I'm doing coding with it almost all day long. It's awesome.
English

@MiaAI_lab @TheGoldenAnvil can you do some real coding work with DS4F on 2x sparks? I'm considering to buy a second spark but I'm not sure yet.
English

Awaiting update, seed is growing the the interim.
Mia@MiaAI_lab
Running agentic coding benchmarks on DeepSeek-v4-Flash and Step-3.7-Flash. Will post results soon.
English

@broadfield_dev It's awesome, it's my go-to right now. And 1M context it can do long /goals with Codex app.
English

@MiaAI_lab deepseek v4 flash is great, I use it a lot in my DIY agent harness and there is nothing it can't handle given enough time. It's also not painful to give it all the time it needs to complete tasks.
English

@MiaAI_lab How does the Step model fare with increasing context depth?
I've tested DeepSeek and it's excellent. It takes context 784K to drop the TG to <10 tok/s (on M2 Max 96gb)
#context-depth-benchmarks-via-llama-benchy" target="_blank" rel="nofollow noopener">github.com/ljubomirj/ds4/…
English











