Mia

311 posts

Mia

@MiaAI_lab

Local AI, LLMs, tech thinker & builder

Joined July 2022

191 Following · 193 Followers

Mia@MiaAI_lab·3m

@mr_r0b0t @morganlinton @NVIDIAAI Same. I was considering buying 2 RTX 6000s but it's just not justifiable. I'm honestly shocked how satisfied I am with the DGX Sparks.

mr-r0b0t@mr_r0b0t·14m

@morganlinton @NVIDIAAI tbh my RTX6000 dreams were shattered with the recent price increase, combined with this sale 😂 crazy to think I'll have 3x GB10 (all 4TB) for less than a new 6000 after the most recent increase 😶‍🌫️😶‍🌫️😶‍🌫️

mr-r0b0t@mr_r0b0t·17h

So I did a thing 😁

114

5.8K

Mia@MiaAI_lab·2h

Bro living the dream. 1TB of unified ram running Kimi k2.6.

Christian Merrill@M_Chimiste

@MiaAI_lab @QuixiAI @deepseek_ai @StepFun_ai This is how it’s currently configured with the weights being stored on dedicated M.2 drives on the side. I probably should change the configuration since I believe it’s slower with them stacked like this but it’s more convenient space wise.

Mia@MiaAI_lab·2h

@M_Chimiste @QuixiAI @deepseek_ai @StepFun_ai Insane. How much tok/s when in deep context?

Christian Merrill@M_Chimiste·2h

@MiaAI_lab @QuixiAI @deepseek_ai @StepFun_ai For K2.6, yes with EXO and a thunderbolt cable with RDMA.

Mia@MiaAI_lab·22h

DeepSeek-v4-Flash beats Step-3.7-Flash in head-to-head tool calling benchmark. Full results in: github.com/MiaAI-Lab/Deep…

2.7K

Mia@MiaAI_lab·2h

Exactly my point. And they are going to IPO soon.

Mia@MiaAI_lab

The publicity @AnthropicAI got from the Fable 5 drama is going to create even more demand for it. There is no such thing as bad publicity if your product is good.

Mia@MiaAI_lab·2h

@M_Chimiste @QuixiAI @deepseek_ai @StepFun_ai Are you running both Mac Studios as a cluster for 1tb unified ram?

Christian Merrill@M_Chimiste·2h

@QuixiAI @MiaAI_lab @deepseek_ai @StepFun_ai If I had a better way to run K2.6 I probably would. Even though it’s slower than an Nvidia farm, it’s the best I’ve got, so I kinda need it still 😅

Mia@MiaAI_lab·2h

@RaulWesche Good to see. It's a beast, and my go-to currently. Enjoy

Raul Wesche@RaulWesche·3h

@MiaAI_lab I’ve been using your ds4-flash you posted a few days ago for dual sparks and it’s awesome

Mia@MiaAI_lab·7h

I find the default DS4-Flash temperature gives it "too" much creativity for coding. It sometimes does things I don't ask for. Going to run it with temp 0.6 and top_p 0.95 for a while and compare.
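
To make the two knobs in the post above concrete, here is a minimal sketch of how temperature and top_p (nucleus) sampling are commonly combined. It is not DS4-Flash's or any inference server's actual code; the function names `nucleus` and `sample_token` and the toy logits are assumptions made for illustration only.

```python
import math
import random


def nucleus(logits, temperature=0.6, top_p=0.95):
    """Return (kept_indices, probs) after temperature scaling and top-p filtering.

    Hypothetical helper for illustration; real inference engines differ in detail.
    """
    # Temperature < 1 sharpens the distribution, > 1 flattens it.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]  # subtract max for stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the smallest set of most likely tokens whose mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return kept, probs


def sample_token(logits, temperature=0.6, top_p=0.95, rng=None):
    """Sample one token index from the filtered, renormalized distribution."""
    rng = rng or random.Random()
    kept, probs = nucleus(logits, temperature, top_p)
    threshold = rng.random() * sum(probs[i] for i in kept)
    running = 0.0
    for i in kept:
        running += probs[i]
        if threshold <= running:
            return i
    return kept[-1]  # guard against floating-point round-off


toy_logits = [2.0, 1.0, 0.0, -1.0]  # made-up values, not from any real model
kept_cool, _ = nucleus(toy_logits, temperature=0.6, top_p=0.95)
kept_warm, _ = nucleus(toy_logits, temperature=1.0, top_p=0.95)
```

With these toy logits, lowering the temperature from 1.0 to 0.6 shrinks the candidate set from three tokens to two, which is the "less creative" behavior the post is after.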

289

Mia@MiaAI_lab·3h

@yacineMTB x.com/miaai_lab/stat…

Mia@MiaAI_lab

The publicity @AnthropicAI got from the Fable 5 drama is going to create even more demand for it. There is no such thing as bad publicity if your product is good.

kache@yacineMTB·13h

I wonder how much money anthropic is losing every day they don't have fable available. Probably a lot. I'm beginning to actually feel sorry for them..

126

727

43.4K

Mia@MiaAI_lab·3h

The publicity @AnthropicAI got from the Fable 5 drama is going to create even more demand for it. There is no such thing as bad publicity if your product is good.

142

Mia@MiaAI_lab·3h

@Tech2Wild @mr_r0b0t @NVIDIAAI It's more addicting than games.

Tech2Wild@Tech2Wild·15h

@mr_r0b0t @NVIDIAAI 🤣🤣🤣 that shit is fucking addicting

181

Mia@MiaAI_lab·3h

@mr_r0b0t @garychanhk825 @Tech2Wild @NVIDIAAI You do know that you will get the 4th, right? Right??

mr-r0b0t@mr_r0b0t·3h

2x will cover many many use cases! The third one should help me train larger models, something that could easily be done more quickly by renting a B200 cloud instance. Given I still have much to learn, renting GPUs could/would become expensive very quickly, and any learning (read mistakes) would be quite costly!

Mia@MiaAI_lab·6h

@RobbiewOnline I agree, I use it every day. I published it yesterday. github.com/MiaAI-Lab/repo…

RobbiewOnline@RobbiewOnline·6h

@MiaAI_lab RepoPrompt looks like a smart way to optimize token usage when working with LLMs, something we dive deep into in the book. The cost of blindly pasting entire files adds up fast (often £1-4 per session), so selective file inclusion is key. Your XML output approach could pair well with local model routing strategies covered in Chapter 2. amazon.com/dp/B0GYDV3FXD If this was helpful, I’d appreciate a repost and a follow. I’ll be sharing more insights from the book, plus what I’ve learned from applying them in the real world, over the coming weeks.

Mia@MiaAI_lab·6h

@QuixiAI @deepseek_ai @StepFun_ai If you need vision then you need vision. But if it's not a must I think DS4-Flash is a better choice.

Eric Hartford@QuixiAI·19h

@MiaAI_lab @deepseek_ai Deepseek v4 Flash is text-only, 284B. @StepFun_ai Step 3.7 Flash is a Text + Vision model, 198B. The vision and the smaller size are more appealing. I choose Step 3.7 Flash.

1.1K

Mia@MiaAI_lab·6h

@M_Chimiste @QuixiAI @deepseek_ai @StepFun_ai I wish I had the compute to run MiniMax M3. For now, DeepSeek-v4-Flash is unbeatable for 2x DGX Spark setup.

Christian Merrill@M_Chimiste·6h

@QuixiAI @MiaAI_lab @deepseek_ai @StepFun_ai I had a lot of tool call issues with Step 3.7. I think I was using Q8 at the time in Hermes Agent. I ended up reverting to Minimax M2.7 and working on moving to M3 for the multimodal input.

Mia@MiaAI_lab·7h

@anuntrapid_auto @0xSero Yeah 4 is the way. Not going to commit to 3 Sparks if I'm not planning to get the 4th one.

cryptowish.eth@anuntrapid_auto·7h

@MiaAI_lab @0xSero 4 x is the way. TP 8 vllm/sglang

0xSero@0xSero·21h

MiniMax-M3-NVFP4 running on 4x RTX PRO 6000. Repo coming soon.

157

7.7K

Mia@MiaAI_lab·8h

@theo @WIRED They live in the past.

Theo - t3.gg@theo·9h

@WIRED Why are you posting 4 day old articles that are no longer accurate lol

29.5K

WIRED@WIRED·18h

Anthropic is releasing Claude Mythos 5 to trusted organizations and Claude Fable 5 to the public, a version it says can’t be used for cyberattacks. wired.com/story/anthropi…

294

85.1K

Mia@MiaAI_lab·8h

@mr_r0b0t @Tech2Wild @NVIDIAAI 3 can't be used for TP though... you want it for more concurrent sessions and more kv cache?

mr-r0b0t@mr_r0b0t·15h

@Tech2Wild @NVIDIAAI 3 is a big unlock tho ngl! Bet you're already thinking about 4 and a switch tho 😛🤩 Imma try to suppress those urges for a bit

154

Mia@MiaAI_lab·9h

@joaosump @0xSero Yes, TP works in pairs: 2, 4, etc. 3 units would be good for more concurrency and KV cache.
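
One common reason behind the "2, 4, not 3" rule in the reply above: tensor-parallel engines typically split attention heads evenly across devices, so the TP size has to divide the head count. The sketch below only checks that divisibility. The helper name `valid_tp_sizes` and the 64-head figure are hypothetical, and real engines may enforce further constraints.

```python
def valid_tp_sizes(num_attention_heads, num_devices):
    """List tensor-parallel sizes (up to num_devices) that split the heads evenly.

    Hypothetical helper: it models only the head-divisibility constraint.
    """
    return [
        tp
        for tp in range(1, num_devices + 1)
        if num_attention_heads % tp == 0
    ]


# Assumed head count for illustration; check the real model config instead.
HYPOTHETICAL_HEADS = 64

sizes_for_three = valid_tp_sizes(HYPOTHETICAL_HEADS, 3)
sizes_for_four = valid_tp_sizes(HYPOTHETICAL_HEADS, 4)
```

Under that assumption, three devices still top out at TP 2, so the third unit would serve extra concurrent sessions or KV cache rather than a wider TP group, while four devices allow TP 4.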

Vieirowski@joaosump·9h

@MiaAI_lab @0xSero Damn, that's nice to know, have been considering 2x and thought that was the ceiling. 3x would be an odd number though for vllm right? :(
