17 posts

A

@kushari

Katılım Mart 2009

256 Takip Edilen283 Takipçiler

A@kushari·13 Mar

@GNCLiveWell It was already taken care of. Thanks!

English

GNC@GNCLiveWell·13 Mar

@kushari Thank you for raising your concern so that we can help you please tell me the order number.

English

A@kushari·6 Mar

@GNCLiveWell your PayPal checkout process needs an overhaul. I’ve never seen a login with PayPal that just checks you out without showing you final pricing before asking you to confirm. So now I have an order I didn’t authorize and I don’t want it.

English

A@kushari·16 Şub

@facebook Seems like marketplace is having issues. When I click create item, the page loads, then it goes blank. Tried different browsers etc.

English

A@kushari·1 Şub

@apoorvdarshan @mweinbach Probably not much. M silicon processors are very efficient.

English

108

Apoorv@apoorvdarshan·1 Şub

@mweinbach what's the electricity bill

English

1.6K

Max Weinbach@mweinbach·1 Şub

Local Kimi K2.5 the model FINALLY downloaded

English

484

109.2K

A@kushari·1 Şub

@mweinbach @ryanvogel What’s your network look like?

English

Max Weinbach@mweinbach·1 Şub

@ryanvogel 683GB but the problem is it saturated my entire network bandwidth so every other device was practically unusable so this becomes a multi day attempt at finding when i can bottleneck everything else

English

3.8K

A@kushari·9 Kas

Here’s another scam, this time it’s targeting @Google accounts. Call and waste their scammers time. Interesting that this one came through as an iMessage.

English

A@kushari·9 Kas

New @coinbase scam number. Everyone call and waste their time plz. Will tweet every time they change numbers to discourage them from this scam.

English

246

A retweetledi

Matt Beton@MattBeton·6 Mar

Apple’s new M3 Ultra Mac Studio is a TRAINING POWERHOUSE 512GB unified memory, 800GB/s bandwidth, 43TFLOPS at fp16. With four of them ($38,000!), and you can fine-tune DeepSeek’s 671B MoE model overnight. 14k training examples, or 21.6 tok/s inference. Let’s see why: DeepSeek R1 is a 671B parameter model, trained at 8-bit - so the full model is 671GB, able to fit on just two Mac Studios. Since DeepSeek is a Mixture-of-Experts model with only 8/256 active experts, in each forwards-pass of the model we only need to activate 37B parameters. Firstly, in inference we are bound by memory bandwidth of the device. To do a single forwards-pass of the model we need to load 37GB of memory across two devices. Connecting the devices by Thunderbolt 5 (120Gbps) we get a theoretical token throughput of 800/37 = 21.6 tok/s. Now for fine-tuning - since we need to store gradients as well as model parameters, we double the memory requirement - needing 4 Mac Studios this time. With a sequence length of 512, we need to calculate roughly 37B * 6 * 512 = 113 TFLOPs per training example. As a lower bound (if we consider only one machine acting at a time, and the others wait idle), that gives us 2.6s/train example. With some clever pipeline scheduling, training throughput could potentially be quadrupled. So what does this mean? You could leave your $38,000 of 512GB M3 Ultra Mac Studios running overnight, and they could fine-tune over 14k examples. More than enough for a small-to-medium-sized fine-tuning run. Benchmarks coming soonTM

English

140

1.2K

107.8K

Keşfet

@GNCLiveWell @facebook @apoorvdarshan @mweinbach @ryanvogel @Google @coinbase @elonmusk