GPU MODE

226 posts

@GPU_MODE

Your favorite GPU community

gpumode.com · Joined September 2024
12 Following · 8.2K Followers
GPU MODE retweeted
Matej Sirovatka @m_sirovatka
What’s the best model you can train in a day if someone hands you a pile of Blackwell GPUs? You can try it out yourself. On April 9 in Paris, @GPU_MODE + @verdacloud + @sestercegroup are hosting a GPU hackathon with a bunch of GPUs to run on and even more of them for the winners.
English
12
8
159
8.3K
Ethan He @EthanHe_42
My last open-source project before joining xAI is just out today. Megatron Core MoE is probably the best open framework out there to seriously train mixture of experts at scale. It achieves 1233 TFLOPS/GPU for DeepSeek-V3-685B. arxiv.org/abs/2603.07685
English
39
107
993
80.2K
GPU MODE retweeted
AMD @AMD
Join the GPU MODE Hackathon, sponsored by AMD, and push the boundaries of LLM inference performance on leading open models, optimized for AMD Instinct MI355X GPUs. Finalists will compete for the $1.1M total cash prize pool across two independent tracks, each focused on a specific model and inference stack. Learn more and get registered here: luma.com/cqq4mojz
English
4
18
112
29.7K
GPU MODE retweeted
Mark Saroufim @marksaroufim
LLMs are now superhuman at reward hacking our kernel competitions. Natalia Kokoromyti was #1 on the last problem of the NVFP4 competition for around 10 min before we scrubbed the reward hack. I know of very few humans who can write such a hack. gpumode.com/news/reward-ha…
English
7
42
422
86.5K
GPU MODE retweeted
Mark Saroufim @marksaroufim
A bit impromptu, but we're having a small GTC pregame in SF on March 11 with a goated speaker lineup: @tedzadouri, the first author of Flash Attention 4; @bfspector, flapping efficiency expert at @flappyairplanes; and Orian Leitersdorf at Decart.
English
4
3
57
6.3K
GPU MODE @GPU_MODE
Our next kernel competition is now open for submissions! A $1.1M cash prize competition sponsored by AMD on optimizing DeepSeek-R1-0528 and GPT-OSS-120B on MI355X. Registration: luma.com/cqq4mojz
English
4
19
177
26.4K
David P @Lat3ntG3nius
@a1zhang @GPU_MODE $1.1M to write GPU kernels is funny because 18 months ago this was the kind of job that got you $300k TC at NVIDIA and nobody outside HPC knew what you did. Now it's a spectator sport.
English
1
0
9
582
alex zhang @a1zhang
$1,100,000 in cash to write GPU kernels for the latest @GPU_MODE competition holy fuck bro
English
7
24
406
28.5K
GPU MODE retweeted
Ted Zadouri @tedzadouri
Asymmetric hardware scaling is here. Blackwell tensor cores are now so fast, exp2 and shared memory are the wall. FlashAttention-4 changes the algorithm & pipeline so that softmax & SMEM bandwidth no longer dictate speed. Attn reaches ~1600 TFLOPs, pretty much at matmul speed! joint work w/ Markus Hoehnerbach, Jay Shah(@ultraproduct), Timmy Liu, Vijay Thakkar (@__tensorcore__ ), Tri Dao (@tri_dao) 1/
English
6
132
780
219.7K
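GPU MODE @GPU_MODE
Is Boss Huang's "the more you buy, the more you save" actually legit? On March 7, GPU Mode holds its 100th session, a no-holds-barred teardown. Dylan Patel, Boss Huang's personal disciple, will use the brand-new InferenceX benchmark from @SemiAnalysis to break down, live, the real inference performance ceilings of the GB300 NVL72 (assuming you can buy one) and the H100, and expose what the "nuke factory" is hiding. By the way, InferenceX confirms it: Old Huang's knife work really is sharper than Mama Su's chips 🔪 A private masterclass from the compute arms dealer is here.
Chinese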
10
1
74
40.4K
GPU MODE @GPU_MODE
We're celebrating our 100th lecture with @SemiAnalysis_ tomorrow at 9am PST, and we'll be talking about arguably the most important OSS benchmark suite out today: InferenceX. Time flies. Thank you to everyone who stuck around for this long 🥲
English
4
23
224
19.3K
0xm℮r @0xmer_
@GPU_MODE This is why I tune in every week. I love you guys.
English
1
0
2
844
GPU MODE @GPU_MODE
There's more fun stuff you can see here: gpumode.com/news/jane-stre… We treat all our events with the same seriousness it'd take to ship a small indie game.
English
0
0
15
1.7K
GPU MODE retweeted
Matt Beton @MattBeton
It was a pleasure to speak at @GPU_MODE today. I spoke about some of my work at @exolabs, discussing ML on consumer devices and how to do distributed inference and training on Apple Silicon with MLX. youtube.com/watch?v=sV0PJC…
English
3
18
167
11.3K
GPU MODE @GPU_MODE
Would be pretty excited to host @reinerpope for a talk on MatX
English
3
0
37
2.9K