Mia

300 posts

Mia

@MiaAI_lab

Local AI, LLMs, tech thinker & builder

Se unió Temmuz 2022

190 Siguiendo191 Seguidores

Mia@MiaAI_lab·13m

@mr_r0b0t @garychanhk825 @Tech2Wild @NVIDIAAI You do know that you will get the 4th, right? Right??

English

mr-r0b0t@mr_r0b0t·14m

2x will cover many many use cases! The third one should help me train larger models, something that could easily be done more quickly by renting a B200 cloud instance. Given I still have much to learn, renting GPUs could/would become expensive very quickly, and any learning (read mistakes) would be quite costly!

English

mr-r0b0t@mr_r0b0t·13h

So I did a thing 😁

English

106

Mia@MiaAI_lab·2h

@RobbiewOnline I agree, I use it everyday. I published it yesterday. github.com/MiaAI-Lab/repo…

English

RobbiewOnline@RobbiewOnline·2h

@MiaAI_lab RepoPrompt looks like a smart way to optimize token usage when working with LLMs, something we dive deep into in the book. The cost of blindly pasting entire files adds up fast (often £1-4 per session), so selective file inclusion is key. Your XML output approach could pair well with local model routing strategies covered in Chapter 2. amazon.com/dp/B0GYDV3FXD If this was helpful, I’d appreciate a repost and a follow. I’ll be sharing more insights from the book, plus what I’ve learned from applying them in the real world, over the coming weeks.

English

Mia@MiaAI_lab·2h

@QuixiAI @deepseek_ai @StepFun_ai If you need vision then you need vision. But if it's not a must I think DS4-Flash is a better choice.

English

Eric Hartford@QuixiAI·15h

@MiaAI_lab @deepseek_ai Deepseek v4 Flash is text-only, 284B @StepFun_ai Step 3.7 Flash is a Text + Vision model, 198B The vision and the smaller size are more appealing. I choose Step 3.7 Flash.

English

Mia@MiaAI_lab·18h

DeepSeek-v4-Flash beats Step-3.7-Flash in head-to-head tool calling benchmark. Full results in: github.com/MiaAI-Lab/Deep…

English

2.5K

Mia@MiaAI_lab·2h

@M_Chimiste @QuixiAI @deepseek_ai @StepFun_ai I wish I had the compute to run MiniMax M3. For now, DeepSeek-v4-Flash is unbeatable for 2x DGX Spark setup.

English

Christian Merrill@M_Chimiste·2h

@QuixiAI @MiaAI_lab @deepseek_ai @StepFun_ai I had a lot of tool call issues with Step 3.7. I think I was using Q8 at the time in Hermes Agent. I ended up reverting to Minimax M2.7 and working on moving to M3 for the multimodal input.

English

Mia@MiaAI_lab·3h

I find default DS4-Flash temperature is giving it "too" much creativity for coding. It sometimes does thing I don't ask. Going to run it with temp 0.6 and top_p 0.95 for a while and compare.

English

108

Mia@MiaAI_lab·3h

@anuntrapid_auto @0xSero Yeah 4 is the way. No going to commit to 3 sparks if I'm not planning to get the 4th one.

English

cryptowish.eth@anuntrapid_auto·4h

@MiaAI_lab @0xSero 4 x is the way. TP 8 vllm/sglang

English

0xSero@0xSero·18h

MiniMax-M3-NVFP4 running on 4x RTX PRO 6000 Repo coming soon.

English

150

7.1K

Mia@MiaAI_lab·4h

@theo @WIRED They live in the past.

English

Theo - t3.gg@theo·6h

@WIRED Why are you posting 4 day old articles that are no longer accurate lol

English

1.7K

25.4K

WIRED@WIRED·14h

Anthropic is releasing Claude Mythos 5 to trusted organizations and Claude Fable 5 to the public, a version it says can’t be used for cyberattacks. wired.com/story/anthropi…

English

278

77.7K

Mia@MiaAI_lab·5h

@mr_r0b0t @Tech2Wild @NVIDIAAI 3 can't be used for TP though... you want it for more concurrent sessions and more kv cache?

English

mr-r0b0t@mr_r0b0t·11h

@Tech2Wild @NVIDIAAI 3 is a big unlock tho ngl! Bet you're already thinking about 4 and a switch tho 😛🤩 Imma try to surprise those urges for a bit

English

121

Mia@MiaAI_lab·5h

@joaosump @0xSero Yes, TP works in pairs, 2,4, etc. 3 units would be good more concurrencies and kv-cache.

English

Vieirowski@joaosump·5h

@MiaAI_lab @0xSero Damn, that's nice to know, have been considering 2x and thought that was the ceiling. 3x would be an odd number though for vllm right? :(

English

Mia@MiaAI_lab·5h

@joaosump @0xSero No, you go up to 3x DGX Sparks connected directly without the need for a switch. There's even a workaround to connect 4x DGX Sparks without a switch.

English

Vieirowski@joaosump·6h

@MiaAI_lab @0xSero Btw, wouldn't 3x sparks require an expensive switch?

English

Mia@MiaAI_lab·5h

@0xSero Any solution to connect to Codex app from my iPhone when using codex-shim?

English

Mia@MiaAI_lab·6h

@advented_ Yes, but it works remarkably well. I tried with 4 sessions, and max tok/s was around 80, which is insane considering the limited bandwidth of DGX Spark.

English

Advented@advented_·7h

@MiaAI_lab They way I understand is Sequences mean concurrent requests. So your recipe can serve 6 requests at 1M context concurrently. Under full/heavy load t/s can spike and drop bc the longest 1M session has to finish generating for other concurrent requests to continue

English

Mia@MiaAI_lab·2d

53 tok/s achieved on Step-3.7-Flash NVFP4 with MTP on 2x DGX Spark with 256k context. 🎉🥳 Elapsed time: 56.694 s Prompt tokens: 29 Generated tokens: 3000 Total tokens: 3029 Generation tok/sec: 52.92 End-to-end tok/sec: 53.43

English

4.6K

Mia@MiaAI_lab·7h

@AlicanKiraz0 Is this with MTP?

English

Alican Kiraz@AlicanKiraz0·9h

nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-NVFP4 and 4x DGX Spark 430 GB Vram 🔥 25 tok/sec 🚀

Polski

3.2K

Mia@MiaAI_lab·8h

@TheAhmadOsman That's a lot of GPUs. Looks messy as hell.

English

Ahmad@TheAhmadOsman·8 Eyl

My house has 33 GPUs. > 21x RTX 3090s > 4x RTX 4090s > 4x RTX 5090s > 4x Tenstorrent Blackhole p150a Before AGI arrives: Acquire GPUs. Go into debt if you must. But whatever you do, secure the GPUs.

English

634

239

4.8K

2.1M

Mia@MiaAI_lab·8h

@usr_bin_roygbiv @tmaiaroto Not to mention the amount of power and heat that would come out of such a config.

English

Mia@MiaAI_lab·8h

@usr_bin_roygbiv @tmaiaroto Do tell how many 5090 or 3090 I would need to run DeepSeek-v4-Flash with 1M context. Meanwhile, you can do that with 2x DGX Sparks with 45 tok/s. x.com/MiaAI_lab/stat…

Mia@MiaAI_lab

Run DeepSeek v4 Flash locally on your 2x DGX Sparks easily, with 1M context github.com/MiaAI-Lab/Deep…

English

Tom Maiaroto@tmaiaroto·16h

Do NOT buy a DGX Spark if prefill speeds and running local AI is not important to you.

Roy@usr_bin_roygbiv

show me your actual tokens per second on a video or a screenshot not this fucking slop DO NOT buy a dgx or mac for llm hosting

English

1.5K

Mia@MiaAI_lab·16h

@BlockedPaths I love it and with 1M context, locally, it's a no brainer IMO.

English

BlockedPath@BlockedPaths·16h

@MiaAI_lab Deepseek is so slept on.

English

Mia@MiaAI_lab·16h

@DevRico003 @TheGoldenAnvil I'm doing coding with it almost all day long. It's awesome.

English

DevRico003@DevRico003·16h

@MiaAI_lab @TheGoldenAnvil can you do some real coding work with DS4F on 2x sparks? I'm considering to buy a second spark but I'm not sure yet.

English

TheGoldenAnchor@TheGoldenAnvil·17h

Awaiting update, seed is growing the the interim.

Mia@MiaAI_lab

Running agentic coding benchmarks on DeepSeek-v4-Flash and Step-3.7-Flash. Will post results soon.

English

Mia@MiaAI_lab·17h

@broadfield_dev It's awesome, it's my go-to right now. And 1M context it can do long /goals with Codex app.

English

broadfield-dev@broadfield_dev·17h

@MiaAI_lab deepseek v4 flash is great, I use it a lot in my DIY agent harness and there is nothing it can't handle given enough time. It's also not painful to give it all the time it needs to complete tasks.

English

Descubrir

@mr_r0b0t @garychanhk825 @Tech2Wild @NVIDIAAI @RobbiewOnline @QuixiAI @deepseek_ai @StepFun_ai