Mia

310 posts

Mia

@MiaAI_lab

Local AI, LLMs, tech thinker & builder

Joined July 2022

191 Following, 193 Followers

Mia@MiaAI_lab·52m

Bro's living the dream: 1 TB of unified RAM running Kimi K2.6.

Christian Merrill@M_Chimiste

@MiaAI_lab @QuixiAI @deepseek_ai @StepFun_ai This is how it's currently configured, with the weights stored on dedicated M.2 drives on the side. I probably should change the configuration since I believe it's slower with them stacked like this, but it's more convenient space-wise.

Mia@MiaAI_lab·1h

@M_Chimiste @QuixiAI @deepseek_ai @StepFun_ai Insane. How many tok/s do you get at deep context?

Christian Merrill@M_Chimiste·1h

@MiaAI_lab @QuixiAI @deepseek_ai @StepFun_ai For K2.6, yes, with EXO and a Thunderbolt cable with RDMA.

Mia@MiaAI_lab·20h

DeepSeek-v4-Flash beats Step-3.7-Flash in head-to-head tool calling benchmark. Full results in: github.com/MiaAI-Lab/Deep…

Mia@MiaAI_lab·1h

Exactly my point. And they are going to IPO soon.

Mia@MiaAI_lab

The publicity @AnthropicAI got from the Fable 5 drama is going to create even more demand for it. There is no such thing as bad publicity if your product is good.

Mia@MiaAI_lab·1h

@M_Chimiste @QuixiAI @deepseek_ai @StepFun_ai Are you running both Mac Studios as a cluster for 1 TB of unified RAM?

Christian Merrill@M_Chimiste·1h

@QuixiAI @MiaAI_lab @deepseek_ai @StepFun_ai If I had a better way to run K2.6 I probably would. But even though it's slower than an Nvidia farm, it's the best I've got, so I kinda still need it 😅

Mia@MiaAI_lab·1h

@RaulWesche Good to see. It's a beast, and currently my go-to. Enjoy!

Raul Wesche@RaulWesche·1h

@MiaAI_lab I’ve been using your ds4-flash you posted a few days ago for dual sparks and it’s awesome

Mia@MiaAI_lab·6h

I find the default DS4-Flash temperature gives it "too" much creativity for coding. It sometimes does things I didn't ask for. Going to run it with temp 0.6 and top_p 0.95 for a while and compare.
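For context on the two knobs in the post above, here is a minimal Python sketch of temperature scaling followed by nucleus (top-p) filtering, the standard way these two parameters are applied at decode time. It is not DS4-Flash's or any inference server's actual code, and the logits are made-up illustrative numbers. It shows why a lower temperature tends to curb "creativity": the distribution gets sharper, so the top_p=0.95 nucleus keeps fewer candidate tokens.

```python
import math
import random


def nucleus(logits, temperature, top_p):
    """Return (kept token indices, renormalized probabilities) after temperature + top-p."""
    # Temperature scaling, then a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the smallest set of most-likely tokens whose cumulative probability reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Renormalize over the kept set so the weights sum to 1.
    mass = sum(probs[i] for i in kept)
    return kept, [probs[i] / mass for i in kept]


def sample(logits, temperature=0.6, top_p=0.95, rng=random):
    """Draw one token index using temperature + top-p sampling."""
    kept, weights = nucleus(logits, temperature, top_p)
    return rng.choices(kept, weights=weights, k=1)[0]


# Hypothetical logits for four candidate tokens, for illustration only.
logits = [2.0, 1.0, 0.0, -5.0]
print(nucleus(logits, 1.0, 0.95)[0])  # temperature 1.0 keeps [0, 1, 2]
print(nucleus(logits, 0.6, 0.95)[0])  # temperature 0.6 keeps [0, 1]
print(sample(logits, rng=random.Random(0)))
```

With these toy logits, dropping the temperature from 1.0 to 0.6 removes the third-ranked token from the candidate pool entirely. Real models have vocabularies of many thousands of tokens, but the same mechanism applies.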

Mia@MiaAI_lab·1h

@yacineMTB x.com/miaai_lab/stat…

Mia@MiaAI_lab

The publicity @AnthropicAI got from the Fable 5 drama is going to create even more demand for it. There is no such thing as bad publicity if your product is good.

kache@yacineMTB·12h

I wonder how much money Anthropic is losing every day they don't have Fable available. Probably a lot. I'm beginning to actually feel sorry for them.

Mia@MiaAI_lab·1h

The publicity @AnthropicAI got from the Fable 5 drama is going to create even more demand for it. There is no such thing as bad publicity if your product is good.

Mia@MiaAI_lab·1h

@Tech2Wild @mr_r0b0t @NVIDIAAI It's more addicting than games.

Tech2Wild@Tech2Wild·14h

@mr_r0b0t @NVIDIAAI 🤣🤣🤣 that shit is fucking addicting

mr-r0b0t@mr_r0b0t·15h

So I did a thing 😁

Mia@MiaAI_lab·2h

@mr_r0b0t @garychanhk825 @Tech2Wild @NVIDIAAI You do know that you will get the 4th, right? Right??

mr-r0b0t@mr_r0b0t·2h

2x will cover many, many use cases! The third one should help me train larger models, something that could easily be done more quickly by renting a B200 cloud instance. Given I still have much to learn, renting GPUs could/would become expensive very quickly, and any learning (read: mistakes) would be quite costly!

Mia@MiaAI_lab·4h

@RobbiewOnline I agree, I use it every day. I published it yesterday. github.com/MiaAI-Lab/repo…

RobbiewOnline@RobbiewOnline·4h

@MiaAI_lab RepoPrompt looks like a smart way to optimize token usage when working with LLMs, something we dive deep into in the book. The cost of blindly pasting entire files adds up fast (often £1-4 per session), so selective file inclusion is key. Your XML output approach could pair well with local model routing strategies covered in Chapter 2. amazon.com/dp/B0GYDV3FXD If this was helpful, I’d appreciate a repost and a follow. I’ll be sharing more insights from the book, plus what I’ve learned from applying them in the real world, over the coming weeks.

Mia@MiaAI_lab·4h

@QuixiAI @deepseek_ai @StepFun_ai If you need vision then you need vision. But if it's not a must I think DS4-Flash is a better choice.

Eric Hartford@QuixiAI·17h

@MiaAI_lab @deepseek_ai DeepSeek v4 Flash is text-only, 284B. @StepFun_ai Step 3.7 Flash is a text + vision model, 198B. The vision and the smaller size are more appealing. I choose Step 3.7 Flash.
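A rough back-of-the-envelope sketch of why the 284B vs 198B difference matters on this class of hardware: weight memory in GB is approximately parameters (in billions) × bits per weight ÷ 8. It assumes 128 GB of unified memory per DGX Spark and ignores KV cache, activations, runtime overhead, and the extra bits real quantization formats spend on scales, so treat the numbers as a lower bound rather than a deployment guide. The helper names are hypothetical.

```python
SPARK_MEMORY_GB = 128  # assumed unified memory per DGX Spark


def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB (1e9 bytes): params x bits / 8."""
    return params_billion * bits_per_weight / 8


def fits_weights_only(params_billion: float, bits_per_weight: float, nodes: int) -> bool:
    """True if the weights alone fit in the pooled memory of `nodes` Sparks."""
    return weight_gb(params_billion, bits_per_weight) <= nodes * SPARK_MEMORY_GB


# Parameter counts as quoted in the thread.
models = {"DeepSeek v4 Flash": 284, "Step 3.7 Flash": 198}

for name, params in models.items():
    for bits in (4, 8):
        gb = weight_gb(params, bits)
        print(f"{name} @ {bits}-bit: ~{gb:.0f} GB of weights, "
              f"fits on 2x Spark: {fits_weights_only(params, bits, 2)}")
```

Under these assumptions, the 284B model's weights alone exceed two Sparks' pooled 256 GB at 8 bits (~284 GB) but fit at 4 bits (~142 GB). The 198B model fits at either width, although at 8 bits (~198 GB) it leaves much less headroom for KV cache.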

Mia@MiaAI_lab·4h

@M_Chimiste @QuixiAI @deepseek_ai @StepFun_ai I wish I had the compute to run MiniMax M3. For now, DeepSeek-v4-Flash is unbeatable for 2x DGX Spark setup.

Christian Merrill@M_Chimiste·5h

@QuixiAI @MiaAI_lab @deepseek_ai @StepFun_ai I had a lot of tool call issues with Step 3.7. I think I was using Q8 at the time in Hermes Agent. I ended up reverting to Minimax M2.7 and working on moving to M3 for the multimodal input.

Mia@MiaAI_lab·6h

@anuntrapid_auto @0xSero Yeah, 4 is the way. Not going to commit to 3 Sparks if I'm not planning to get the 4th one.

cryptowish.eth@anuntrapid_auto·6h

@MiaAI_lab @0xSero 4x is the way. TP 8, vLLM/SGLang.

0xSero@0xSero·20h

MiniMax-M3-NVFP4 running on 4x RTX PRO 6000. Repo coming soon.

Mia@MiaAI_lab·7h

@theo @WIRED They live in the past.

Theo - t3.gg@theo·8h

@WIRED Why are you posting 4 day old articles that are no longer accurate lol

WIRED@WIRED·17h

Anthropic is releasing Claude Mythos 5 to trusted organizations and Claude Fable 5 to the public, a version it says can’t be used for cyberattacks. wired.com/story/anthropi…

Mia@MiaAI_lab·7h

@mr_r0b0t @Tech2Wild @NVIDIAAI 3 can't be used for TP, though... Do you want it for more concurrent sessions and more KV cache?

mr-r0b0t@mr_r0b0t·13h

@Tech2Wild @NVIDIAAI 3 is a big unlock tho ngl! Bet you're already thinking about 4 and a switch tho 😛🤩 Imma try to suppress those urges for a bit

Mia@MiaAI_lab·7h

@joaosump @0xSero Yes, TP works in pairs (2, 4, etc.). 3 units would be good for more concurrency and KV cache.
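A minimal sketch of one common reason TP sizes come in 2, 4, 8: tensor parallelism splits each layer's attention heads across devices, and engines such as vLLM check that the model's attention-head count is divisible by the TP size. The head count of 64 below is a hypothetical example, not taken from any model in this thread. A model whose head count happens to be divisible by 3 would not hit this particular limit, so whether 3 nodes work for TP depends on the model. The helper names are illustrative.

```python
def valid_tp_sizes(num_attention_heads: int, num_nodes: int) -> list[int]:
    """TP sizes from 1..num_nodes that split the attention heads evenly."""
    return [tp for tp in range(1, num_nodes + 1) if num_attention_heads % tp == 0]


def heads_per_rank(num_attention_heads: int, tp: int) -> int:
    """Attention heads each TP rank would hold; raises if the split is uneven."""
    if num_attention_heads % tp != 0:
        raise ValueError(f"{num_attention_heads} heads cannot be split evenly across TP={tp}")
    return num_attention_heads // tp


HEADS = 64  # hypothetical head count, for illustration only
print(valid_tp_sizes(HEADS, 4))  # [1, 2, 4]: TP=3 is rejected
print(heads_per_rank(HEADS, 2))  # 32
```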

Vieirowski@joaosump·8h

@MiaAI_lab @0xSero Damn, that's nice to know. I've been considering 2x and thought that was the ceiling. 3x would be an odd number for vLLM though, right? :(

Mia@MiaAI_lab·8h

@joaosump @0xSero No, you can go up to 3x DGX Sparks connected directly, without the need for a switch. There's even a workaround to connect 4x DGX Sparks without a switch.

Vieirowski@joaosump·8h

@MiaAI_lab @0xSero Btw, wouldn't 3x sparks require an expensive switch?
