A
@kushari
17 posts · Joined March 2009
256 Following · 283 Followers
A @kushari ·
@GNCLiveWell It was already taken care of. Thanks!
GNC @GNCLiveWell ·
@kushari Thank you for raising your concern. So that we can help you, please tell me the order number.
A @kushari ·
@GNCLiveWell your PayPal checkout process needs an overhaul. I’ve never seen a login with PayPal that just checks you out without showing you final pricing before asking you to confirm. So now I have an order I didn’t authorize and I don’t want it.
A @kushari ·
@facebook Seems like marketplace is having issues. When I click create item, the page loads, then it goes blank. Tried different browsers etc.
Apoorv @apoorvdarshan ·
@mweinbach what's the electricity bill?
Max Weinbach @mweinbach ·
Local Kimi K2.5: the model FINALLY downloaded
[image]
Max Weinbach @mweinbach ·
@ryanvogel 683GB, but the problem is it saturated my entire network bandwidth, so every other device was practically unusable. So this becomes a multi-day attempt at finding when I can bottleneck everything else.
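(On the saturation problem above: one common workaround is to cap the download's own rate so the rest of the LAN stays usable. Here's a minimal sketch in Python, assuming a plain HTTP download; the URL, destination filename, and 50 MB/s cap are placeholder values, not the actual Kimi K2.5 source.)

```python
import time
import requests

def download_rate_limited(url: str, dest: str, max_bytes_per_sec: int = 50 * 1024 * 1024) -> None:
    """Stream a file to disk while staying under a bandwidth cap,
    so one big model download doesn't starve the rest of the LAN."""
    chunk_size = 1024 * 1024  # 1 MiB per read
    start = time.monotonic()
    downloaded = 0
    with requests.get(url, stream=True, timeout=60) as resp:
        resp.raise_for_status()
        with open(dest, "wb") as f:
            for chunk in resp.iter_content(chunk_size=chunk_size):
                f.write(chunk)
                downloaded += len(chunk)
                # If we're ahead of schedule for the allowed rate, sleep
                # until the average rate drops back under the cap.
                target_elapsed = downloaded / max_bytes_per_sec
                actual_elapsed = time.monotonic() - start
                if target_elapsed > actual_elapsed:
                    time.sleep(target_elapsed - actual_elapsed)

# Hypothetical usage: cap at ~50 MB/s so other devices stay responsive.
# download_rate_limited("https://example.com/kimi-k2.5.safetensors", "kimi-k2.5.safetensors")
```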
A @kushari ·
Here’s another scam, this time targeting @Google accounts. Call and waste the scammers’ time. Interesting that this one came through as an iMessage.
[image]
A @kushari ·
New @coinbase scam number. Everyone, please call and waste their time. I’ll tweet every time they change numbers to discourage them from this scam.
[image]
A reposted
Matt Beton @MattBeton ·
Apple’s new M3 Ultra Mac Studio is a TRAINING POWERHOUSE: 512GB unified memory, 800GB/s bandwidth, 43 TFLOPS at fp16. With four of them ($38,000!) you can fine-tune DeepSeek’s 671B MoE model overnight: 14k training examples, or 21.6 tok/s inference. Let’s see why:

DeepSeek R1 is a 671B parameter model, trained at 8-bit, so the full model is 671GB, able to fit on just two Mac Studios. Since DeepSeek is a Mixture-of-Experts model with only 8/256 active experts, each forward pass of the model only needs to activate 37B parameters.

First, in inference we are bound by the memory bandwidth of the device. To do a single forward pass of the model we need to load 37GB of memory across two devices. Connecting the devices by Thunderbolt 5 (120Gbps), we get a theoretical token throughput of 800/37 = 21.6 tok/s.

Now for fine-tuning: since we need to store gradients as well as model parameters, we double the memory requirement, needing 4 Mac Studios this time. With a sequence length of 512, we need to calculate roughly 37B * 6 * 512 = 113 TFLOPs per training example. As a lower bound (if we consider only one machine acting at a time, while the others wait idle), that gives us 2.6 s per training example. With some clever pipeline scheduling, training throughput could potentially be quadrupled.

So what does this mean? You could leave your $38,000 of 512GB M3 Ultra Mac Studios running overnight, and they could fine-tune over 14k examples. More than enough for a small-to-medium-sized fine-tuning run. Benchmarks coming soon™
[image]
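(The arithmetic in that thread is easy to reproduce. Below is a back-of-envelope script in Python; every constant comes straight from the tweet except the ~10-hour "overnight" window, which is an assumption on my part since the thread doesn't state one.)

```python
# Constants from the thread above.
PARAMS_ACTIVE = 37e9     # active parameters per forward pass (8/256 experts)
BYTES_PER_PARAM = 1      # model stored at 8-bit
MEM_BW = 800e9           # M3 Ultra memory bandwidth, bytes/s
FLOPS_FP16 = 43e12       # M3 Ultra fp16 compute, FLOP/s
SEQ_LEN = 512            # fine-tuning sequence length
OVERNIGHT_S = 10 * 3600  # assumed ~10-hour overnight window (my assumption)

# Inference is memory-bandwidth bound: each token streams the 37GB of
# active weights through memory once.
tok_per_s = MEM_BW / (PARAMS_ACTIVE * BYTES_PER_PARAM)
print(f"inference: {tok_per_s:.1f} tok/s")  # ~21.6

# Fine-tuning: ~6 FLOPs per active parameter per token (forward + backward).
flops_per_example = PARAMS_ACTIVE * 6 * SEQ_LEN
print(f"compute: {flops_per_example / 1e12:.1f} TFLOPs/example")  # ~113.7

# Lower bound: only one machine computing at a time, the others idle.
sec_per_example = flops_per_example / FLOPS_FP16
print(f"time: {sec_per_example:.1f} s/example")  # ~2.6

print(f"overnight: ~{OVERNIGHT_S / sec_per_example:,.0f} examples")  # ~13,600
```

That lower bound lands just under the thread's "over 14k"; any pipeline overlap between the four machines pushes it past that figure.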