Mia

296 posts

@MiaAI_lab

Local AI, LLMs, tech thinker & builder

Joined July 2022
190 Following · 188 Followers
Mia
Mia@MiaAI_lab·
I find the default DS4-Flash temperature gives it "too" much creativity for coding. It sometimes does things I didn't ask for. Going to run it with temp 0.6 and top_p 0.95 for a while and compare.
0
0
1
15
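For readers unfamiliar with the two knobs mentioned above, here is a minimal sketch of what temperature scaling and top_p (nucleus) filtering do to a next-token distribution. The logits are made-up toy values, the function name is hypothetical, and the baseline of temperature 1.0 is an assumption (the post does not state the model's default). Real inference servers implement this internally and may differ in detail.

```python
import math

def apply_temperature_top_p(logits, temperature=0.6, top_p=0.95):
    """Return renormalized probabilities after temperature scaling and top-p filtering.

    Hypothetical helper for illustration only.
    """
    # Lower temperature sharpens the distribution toward the most likely token.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract peak for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the smallest set of most-likely tokens whose cumulative mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    kept_total = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_total for i in kept}

toy_logits = [2.0, 1.0, 0.5, -1.0]  # made-up values, not from any model
baseline = apply_temperature_top_p(toy_logits, temperature=1.0, top_p=1.0)
tighter = apply_temperature_top_p(toy_logits, temperature=0.6, top_p=0.95)
print("baseline:", baseline)
print("temp 0.6, top_p 0.95:", tighter)
```

With these toy numbers the tighter settings put more probability on the top token and drop the least likely one entirely, which is the intuition behind using them to reduce "creative" detours in coding tasks.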
Mia
Mia@MiaAI_lab·
@anuntrapid_auto @0xSero Yeah 4 is the way. Not going to commit to 3 Sparks if I'm not planning to get the 4th one.
0
0
0
3
0xSero
0xSero@0xSero·
MiniMax-M3-NVFP4 running on 4x RTX PRO 6000 Repo coming soon.
0xSero tweet media
12
5
140
6.7K
Theo - t3.gg
Theo - t3.gg@theo·
@WIRED Why are you posting 4 day old articles that are no longer accurate lol
36
5
1.2K
17.4K
WIRED
WIRED@WIRED·
Anthropic is releasing Claude Mythos 5 to trusted organizations and Claude Fable 5 to the public, a version it says can’t be used for cyberattacks. wired.com/story/anthropi…
63
25
238
65K
Mia
Mia@MiaAI_lab·
@mr_r0b0t @Tech2Wild @NVIDIAAI 3 can't be used for TP though... you want it for more concurrent sessions and more kv cache?
0
0
0
7
mr-r0b0t
mr-r0b0t@mr_r0b0t·
@Tech2Wild @NVIDIAAI 3 is a big unlock tho ngl! Bet you're already thinking about 4 and a switch tho 😛🤩 Imma try to surprise those urges for a bit
3
0
0
95
mr-r0b0t
mr-r0b0t@mr_r0b0t·
So I did a thing 😁
mr-r0b0t tweet media
31
0
98
4.5K
Mia
Mia@MiaAI_lab·
@joaosump @0xSero Yes, TP works in pairs: 2, 4, etc. 3 units would be good for more concurrency and KV cache.
0
0
0
8
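One common reason tensor parallelism (TP) favors sizes like 2 and 4 is that the attention heads have to split evenly across devices. The sketch below only illustrates that divisibility check. The head count of 64 is a hypothetical value, not taken from any model in this thread, and real serving stacks may impose further constraints.

```python
def valid_tp_sizes(num_attention_heads, max_devices):
    """Return the device counts that split the attention heads evenly.

    Hypothetical helper: real frameworks can add other requirements
    (for example on KV heads or vocabulary size).
    """
    return [n for n in range(1, max_devices + 1) if num_attention_heads % n == 0]

ASSUMED_HEADS = 64  # hypothetical head count, for illustration only
print(valid_tp_sizes(ASSUMED_HEADS, max_devices=4))
```

Under that assumption, 3 devices cannot share the heads evenly, which matches the point above that a third unit is more useful for extra concurrent sessions and KV cache than for TP.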
Vieirowski
Vieirowski@joaosump·
@MiaAI_lab @0xSero Damn, that's nice to know, have been considering 2x and thought that was the ceiling. 3x would be an odd number though for vllm right? :(
1
0
0
17
Mia
Mia@MiaAI_lab·
@joaosump @0xSero No, you can go up to 3x DGX Sparks connected directly without the need for a switch. There's even a workaround to connect 4x DGX Sparks without a switch.
1
0
1
16
Mia
Mia@MiaAI_lab·
@0xSero Any solution to connect to Codex app from my iPhone when using codex-shim?
1
0
0
58
Mia
Mia@MiaAI_lab·
@advented_ Yes, but it works remarkably well. I tried with 4 sessions, and max tok/s was around 80, which is insane considering the limited bandwidth of DGX Spark.
0
0
1
27
Advented
Advented@advented_·
@MiaAI_lab The way I understand it, sequences mean concurrent requests. So your recipe can serve 6 requests at 1M context concurrently. Under full/heavy load t/s can spike and drop bc the longest 1M session has to finish generating before other concurrent requests can continue
1
1
2
25
Mia
Mia@MiaAI_lab·
53 tok/s achieved on Step-3.7-Flash NVFP4 with MTP on 2x DGX Spark with 256k context. 🎉🥳
Elapsed time: 56.694 s
Prompt tokens: 29
Generated tokens: 3000
Total tokens: 3029
Generation tok/sec: 52.92
End-to-end tok/sec: 53.43
Mia tweet media
6
0
39
4.6K
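The throughput figures in the post above can be reproduced from the raw numbers it reports. This is a small arithmetic check, assuming both rates divide by the same elapsed time, which is what the posted values are consistent with. The variable names are my own.

```python
# Raw figures as reported in the post.
elapsed_s = 56.694
prompt_tokens = 29
generated_tokens = 3000

total_tokens = prompt_tokens + generated_tokens

# Generation rate counts only the produced tokens.
generation_tps = generated_tokens / elapsed_s
# End-to-end rate counts prompt plus produced tokens over the same wall time.
end_to_end_tps = total_tokens / elapsed_s

print(f"Total tokens: {total_tokens}")
print(f"Generation tok/sec: {generation_tps:.2f}")
print(f"End-to-end tok/sec: {end_to_end_tps:.2f}")
```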
Alican Kiraz
Alican Kiraz@AlicanKiraz0·
nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-NVFP4 and 4x DGX Spark 430 GB Vram 🔥 25 tok/sec 🚀
Alican Kiraz tweet media
6
0
18
2.9K
Mia
Mia@MiaAI_lab·
@TheAhmadOsman That's a lot of GPUs. Looks messy as hell.
0
0
0
7
Ahmad
Ahmad@TheAhmadOsman·
My house has 33 GPUs.
> 21x RTX 3090s
> 4x RTX 4090s
> 4x RTX 5090s
> 4x Tenstorrent Blackhole p150a
Before AGI arrives: Acquire GPUs. Go into debt if you must. But whatever you do, secure the GPUs.
Ahmad tweet media
633
239
4.8K
2.1M
Mia
Mia@MiaAI_lab·
@BlockedPaths I love it and with 1M context, locally, it's a no brainer IMO.
0
0
1
34
Mia
Mia@MiaAI_lab·
DeepSeek-v4-Flash beats Step-3.7-Flash in head-to-head tool calling benchmark. Full results in: github.com/MiaAI-Lab/Deep…
Mia tweet media
9
0
31
2.3K
DevRico003
DevRico003@DevRico003·
@MiaAI_lab @TheGoldenAnvil can you do some real coding work with DS4F on 2x Sparks? I'm considering buying a second Spark but I'm not sure yet.
1
0
1
9
Mia
Mia@MiaAI_lab·
@broadfield_dev It's awesome, it's my go-to right now. And with 1M context it can do long /goals with the Codex app.
1
0
1
50
broadfield-dev
broadfield-dev@broadfield_dev·
@MiaAI_lab deepseek v4 flash is great, I use it a lot in my DIY agent harness and there is nothing it can't handle given enough time. It's also not painful to give it all the time it needs to complete tasks.
1
0
0
48
Mia
Mia@MiaAI_lab·
@ljupc0 Can't really compare bcs Step's max context is 256k vs 1M for the DS4-Flash
0
0
1
10
Ljubomir Josifovski
@MiaAI_lab How does the Step model fare with increasing context depth? I've tested DeepSeek and it's excellent. It takes a context of 784K to drop the TG to <10 tok/s (on M2 Max 96GB) github.com/ljubomirj/ds4/…
1
0
0
17
Mia
Mia@MiaAI_lab·
Local agentic 'Tool-Call Benchmark' between DeepSeek-v4-Flash and Step-3.7-Flash. Same host, same 69 scenarios, two models. Results:
DeepSeek-v4-Flash: 90/100 quality, 59 passed, 6 partial, 4 failed
Step-3.7-Flash: 87/100 quality, 55 passed, 10 partial, 4 failed 👇
4
1
18
1.3K
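As a quick sanity check on the tallies above, the passed, partial, and failed counts for each model should add up to the 69 scenarios. The sketch below verifies that and derives a strict pass rate. Note that the 90/100 and 87/100 quality scores come from the benchmark's own scoring, which is not described in the post, so they are not recomputed here.

```python
# Counts as reported in the post.
results = {
    "DeepSeek-v4-Flash": {"passed": 59, "partial": 6, "failed": 4},
    "Step-3.7-Flash": {"passed": 55, "partial": 10, "failed": 4},
}

def summarize(counts):
    """Return (total scenarios, strict pass rate as a percentage)."""
    total = sum(counts.values())
    return total, 100.0 * counts["passed"] / total

for model, counts in results.items():
    total, pass_rate = summarize(counts)
    print(f"{model}: {total} scenarios, {pass_rate:.1f}% fully passed")
```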
Mia
Mia@MiaAI_lab·
@0xSero It will take days for each, can't do that as I have no other alternatives to run other stuff
0
0
1
114
0xSero
0xSero@0xSero·
@MiaAI_lab Try terminal-bench and deepswe Will take forever tho
2
0
16
1.5K
frzlt
frzlt@DenysYaroshenko·
@MiaAI_lab what hardware are you using ?
1
0
0
50
Mia
Mia@MiaAI_lab·
Running agentic coding benchmarks on DeepSeek-v4-Flash and Step-3.7-Flash. Will post results soon.
Mia tweet media
2
0
25
1.7K
Mia
Mia@MiaAI_lab·
@0xSero Would it fit 3x DGX Sparks with full context?
2
0
0
82
0xSero
0xSero@0xSero·
@MiaAI_lab It needs some pruning, or needs to be q3
1
0
2
306