Mia

308 posts

@MiaAI_lab

Local AI, LLMs, tech thinker & builder

Joined July 2022

190 Following · 192 Followers

Mia@MiaAI_lab·4m

@M_Chimiste @QuixiAI @deepseek_ai @StepFun_ai Insane. How much tok/s when in deep context?

Christian Merrill@M_Chimiste·7m

@MiaAI_lab @QuixiAI @deepseek_ai @StepFun_ai For K2.6, yes with EXO and a thunderbolt cable with RDMA.

Mia@MiaAI_lab·19h

DeepSeek-v4-Flash beats Step-3.7-Flash in head-to-head tool calling benchmark. Full results in: github.com/MiaAI-Lab/Deep…

Mia@MiaAI_lab·17m

Exactly my point. And they are going to IPO soon.

Mia@MiaAI_lab

The publicity @AnthropicAI got from the Fable 5 drama is going to create even more demand for it. There is no such thing as bad publicity if your product is good.

Mia@MiaAI_lab·24m

@M_Chimiste @QuixiAI @deepseek_ai @StepFun_ai Are you running both Mac Studios as a cluster for 1tb unified ram?

Christian Merrill@M_Chimiste·32m

@QuixiAI @MiaAI_lab @deepseek_ai @StepFun_ai If I had a better way to run K2.6 I probably would. Even though it’s slower than an Nvidia farm, it’s the best I’ve got, so I kinda still need it 😅

Mia@MiaAI_lab·33m

@RaulWesche Good to see. It's a beast, and my go-to currently. Enjoy

Raul Wesche@RaulWesche·50m

@MiaAI_lab I’ve been using your ds4-flash you posted a few days ago for dual sparks and it’s awesome

Mia@MiaAI_lab·5h

I find the default DS4-Flash temperature gives it too much creativity for coding. It sometimes does things I didn't ask for. Going to run it with temp 0.6 and top_p 0.95 for a while and compare.

Mia@MiaAI_lab·50m

@yacineMTB x.com/miaai_lab/stat…

Mia@MiaAI_lab

The publicity @AnthropicAI got from the Fable 5 drama is going to create even more demand for it. There is no such thing as bad publicity if your product is good.

kache@yacineMTB·11h

I wonder how much money anthropic is losing every day they don't have fable available. Probably a lot. I'm beginning to actually feel sorry for them..

Mia@MiaAI_lab·51m

The publicity @AnthropicAI got from the Fable 5 drama is going to create even more demand for it. There is no such thing as bad publicity if your product is good.

Mia@MiaAI_lab·52m

@Tech2Wild @mr_r0b0t @NVIDIAAI It's more addicting than games.

Tech2Wild@Tech2Wild·13h

@mr_r0b0t @NVIDIAAI 🤣🤣🤣 that shit is fucking addicting

mr-r0b0t@mr_r0b0t·14h

So I did a thing 😁

Mia@MiaAI_lab·1h

@mr_r0b0t @garychanhk825 @Tech2Wild @NVIDIAAI You do know that you will get the 4th, right? Right??

mr-r0b0t@mr_r0b0t·1h

2x will cover many many use cases! The third one should help me train larger models, something that could easily be done more quickly by renting a B200 cloud instance. Given I still have much to learn, renting GPUs could/would become expensive very quickly, and any learning (read mistakes) would be quite costly!

Mia@MiaAI_lab·3h

@RobbiewOnline I agree, I use it every day. I published it yesterday. github.com/MiaAI-Lab/repo…

RobbiewOnline@RobbiewOnline·4h

@MiaAI_lab RepoPrompt looks like a smart way to optimize token usage when working with LLMs, something we dive deep into in the book. The cost of blindly pasting entire files adds up fast (often £1-4 per session), so selective file inclusion is key. Your XML output approach could pair well with local model routing strategies covered in Chapter 2. amazon.com/dp/B0GYDV3FXD If this was helpful, I’d appreciate a repost and a follow. I’ll be sharing more insights from the book, plus what I’ve learned from applying them in the real world, over the coming weeks.

Mia@MiaAI_lab·4h

@QuixiAI @deepseek_ai @StepFun_ai If you need vision then you need vision. But if it's not a must I think DS4-Flash is a better choice.

Eric Hartford@QuixiAI·16h

@MiaAI_lab @deepseek_ai
Deepseek v4 Flash is text-only, 284B
@StepFun_ai Step 3.7 Flash is a Text + Vision model, 198B
The vision and the smaller size are more appealing. I choose Step 3.7 Flash.

Mia@MiaAI_lab·4h

@M_Chimiste @QuixiAI @deepseek_ai @StepFun_ai I wish I had the compute to run MiniMax M3. For now, DeepSeek-v4-Flash is unbeatable for 2x DGX Spark setup.

Christian Merrill@M_Chimiste·4h

@QuixiAI @MiaAI_lab @deepseek_ai @StepFun_ai I had a lot of tool call issues with Step 3.7. I think I was using Q8 at the time in Hermes Agent. I ended up reverting to Minimax M2.7 and working on moving to M3 for the multimodal input.

Mia@MiaAI_lab·5h

@anuntrapid_auto @0xSero Yeah, 4 is the way. Not going to commit to 3 Sparks if I'm not planning to get the 4th one.

cryptowish.eth@anuntrapid_auto·5h

@MiaAI_lab @0xSero 4 x is the way. TP 8 vllm/sglang

0xSero@0xSero·19h

MiniMax-M3-NVFP4 running on 4x RTX PRO 6000 Repo coming soon.

Mia@MiaAI_lab·6h

@theo @WIRED They live in the past.

Theo - t3.gg@theo·7h

@WIRED Why are you posting 4 day old articles that are no longer accurate lol

WIRED@WIRED·16h

Anthropic is releasing Claude Mythos 5 to trusted organizations and Claude Fable 5 to the public, a version it says can’t be used for cyberattacks. wired.com/story/anthropi…

Mia@MiaAI_lab·6h

@mr_r0b0t @Tech2Wild @NVIDIAAI 3 can't be used for TP though... you want it for more concurrent sessions and more kv cache?

mr-r0b0t@mr_r0b0t·12h

@Tech2Wild @NVIDIAAI 3 is a big unlock tho ngl! Bet you're already thinking about 4 and a switch tho 😛🤩 Imma try to suppress those urges for a bit

Mia@MiaAI_lab·7h

@joaosump @0xSero Yes, TP works in pairs: 2, 4, etc. 3 units would be good for more concurrent sessions and KV cache.
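
A rough sketch of why 3 units tends not to work for tensor parallelism: engines such as vLLM generally require the model's attention head count to be divisible by the TP size. The head count of 64 below is an assumed, illustrative value, not the spec of any model mentioned in this thread, and `valid_tp_sizes` is a hypothetical helper.

```python
def valid_tp_sizes(num_attention_heads: int, max_devices: int) -> list[int]:
    """Return the TP sizes (up to max_devices) that evenly divide the head count."""
    return [
        tp
        for tp in range(1, max_devices + 1)
        if num_attention_heads % tp == 0
    ]


# Assumed head count for illustration only.
ASSUMED_HEADS = 64

sizes = valid_tp_sizes(ASSUMED_HEADS, max_devices=4)
print(sizes)  # 3 is absent because 64 is not divisible by 3
```

Under this assumption a third unit cannot join the TP group, so it would only add capacity for more concurrent sessions and KV cache, which matches the post above.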

Vieirowski@joaosump·7h

@MiaAI_lab @0xSero Damn, that's nice to know, have been considering 2x and thought that was the ceiling. 3x would be an odd number though for vllm right? :(

Mia@MiaAI_lab·7h

@joaosump @0xSero No, you can go up to 3x DGX Sparks connected directly without the need for a switch. There's even a workaround to connect 4x DGX Sparks without a switch.

Vieirowski@joaosump·7h

@MiaAI_lab @0xSero Btw, wouldn't 3x sparks require an expensive switch?

Mia@MiaAI_lab·7h

@0xSero Any solution to connect to Codex app from my iPhone when using codex-shim?

Mia@MiaAI_lab·7h

@advented_ Yes, but it works remarkably well. I tried with 4 sessions, and max tok/s was around 80, which is insane considering the limited bandwidth of DGX Spark.

Advented@advented_·8h

@MiaAI_lab The way I understand it, Sequences means concurrent requests. So your recipe can serve 6 requests at 1M context concurrently. Under full/heavy load t/s can spike and drop bc the longest 1M session has to finish generating for other concurrent requests to continue

Mia@MiaAI_lab·2d

53 tok/s achieved on Step-3.7-Flash NVFP4 with MTP on 2x DGX Spark with 256k context. 🎉🥳
Elapsed time: 56.694 s
Prompt tokens: 29
Generated tokens: 3000
Total tokens: 3029
Generation tok/sec: 52.92
End-to-end tok/sec: 53.43
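
For readers who want to reproduce the two throughput figures in the post above, here is a minimal sketch. It assumes both rates are simply token counts divided by the total elapsed time, which is inferred from the posted numbers rather than taken from the benchmark script itself; `throughput` is a hypothetical helper name.

```python
def throughput(elapsed_s: float, prompt_tokens: int, generated_tokens: int) -> dict[str, float]:
    """Compute tokens-per-second figures from one timed request."""
    total_tokens = prompt_tokens + generated_tokens
    return {
        # Generated tokens only, over the whole elapsed time.
        "generation_tok_s": generated_tokens / elapsed_s,
        # Prompt plus generated tokens, over the whole elapsed time.
        "end_to_end_tok_s": total_tokens / elapsed_s,
    }


# Values copied from the post.
stats = throughput(elapsed_s=56.694, prompt_tokens=29, generated_tokens=3000)
print(f"Generation tok/sec: {stats['generation_tok_s']:.2f}")
print(f"End-to-end tok/sec: {stats['end_to_end_tok_s']:.2f}")
```

With only 29 prompt tokens the two figures are nearly identical; with a long prompt they would diverge.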
