Paul

112 posts

@edchangy

ML and stuff

Joined June 2011

725 Following · 134 Followers

Paul retweeted

Dan Alistarh@DAlistarh·25 Mar

Speedrunning GPT-2 is now routine thanks to @karpathy. But can we speedrun GPT3-175B? We attempted to match accuracy on a <$10K budget; while we didn't quite reach it, our first results show that quality data, engineering, and native FP4 can get close. Details in 🧵

Paul retweeted

Matej Sirovatka@m_sirovatka·13 Mar

What’s the best model you can train in a day if someone hands you a pile of Blackwell GPUs? You can try it out yourself: on April 9 in Paris, @GPU_MODE + @verdacloud + @sestercegroup are hosting a GPU hackathon with a bunch of GPUs to run on and even more of them for the winners.

Paul@edchangy·12 Oct

@iamgrigorev github.com/IST-DASLab/llmq does some of that

George Grigorev@iamgrigorev·11 Oct

I am thinking of writing the next blog post about these topics:
 - Optimizing training throughput with FP8 (I will show how to write FP8 kernels)
 - How to implement DDP
 - How to implement FSDP, with distributed Muon
 - How to implement TP
 - Gradient accumulation
 - Gradient checkpointing
I think applying “How to scale your model” to real consumer GPUs connected over PCIe, and writing that from scratch in pure PyTorch, would be really useful.

Paul retweeted

Dan Alistarh@DAlistarh·6 Oct

🚀 We are releasing state-of-the-art post-training quantization (PTQ) algorithms for Microscaling FP4, together with kernels:
 - First study focused on MXFP4/NVFP4 PTQ for LLMs
 - New Micro-Rotated (MR) format and GPTQ algorithm
 - QuTLASS GPU kernels with up to 3.6x speedups

Paul@edchangy·18 Sep

@antferdom @JackMonas @DataCrunch_io @scannell_aidan @rmwu36 @yizhao0313 @gpupoorftw @JackMonas, I'm just checking the submission deadline: the ICCV workshop page says 20.09, but is it now 27.09?

Antonio J. Dominguez@antferdom·18 Sep

@JackMonas @DataCrunch_io A truly important challenge for understanding the problems of video world models in depth! Explicitly mentioning the main team contributors as well: @Scannell_aidan on the compression track, and @Rmwu36 and @Yizhao0313 on sampling. Led by @edchangy and @gpupoorftw

Jack Monas@JackMonas·18 Sep

New leader for the Compression track in the ICCV 1X World Model Challenge! Submission from @DataCrunch_io @antferdom. Final deadline to submit solutions is Sep. 27 AoE.

Daniel Ho@itsdanielho

We at @1x_tech with @JackMonas are excited to announce the ICCV phase of our 1X World Model Challenge: huggingface.co/spaces/1x-tech… Participate in the Compression and Sampling tracks for a $8k prize pool & train generative models for cool robot results like: 1x.tech/discover/redwo…

Paul retweeted

WaveSpeedAI@wavespeed_ai·30 Jul

👇Retweet this tweet, private message me your WaveSpeedAI account, and I'll give you guys $3 credits!🤩Only the first 100 people can get it

WaveSpeedAI@wavespeed_ai

🚨 WAN 2.2 PRICE DROP — WORLDWIDE EXCLUSIVE 🔥 The best model just got MASSIVELY cheaper on WaveSpeedAI — now with full API support! 🎬 480p: $0.25 → $0.15 (⚡ 40% off!) 🎥 720p: $0.40 → $0.30 🎯 Fastest. Cheapest. Most powerful. Try it now: wavespeed.ai/collections/wa… #AI #TextToVideo #GenAI #WaveSpeedAI #Wan2_2 #AIvideo

Paul retweeted

Verda (formerly DataCrunch)@verdacloud·25 Jun

❗️ We just expanded our capacity of B200 SXM6 180GB servers – available in our Cloud Platform. The best thing is… With DataCrunch, you can deploy the Blackwell platform without approvals. Just sign in and select the instance type: cloud.datacrunch.io/?utm_source=x&…

Paul@edchangy·30 May

Also pretty cool to see the open source community building on each other's work!

Paul@edchangy·30 May

Link to blog: datacrunch.io/blog/multi-hea…

Paul@edchangy·30 May

A new paper just dropped from Tri Dao(🐐)'s lab! arxiv.org/abs/2505.21487 Here is my hot take!

Paul@edchangy·30 May

The paper also suggests Group Tied Attention (GTA), which works in the opposite direction and draws inspiration from MLA, incorporating those techniques into GQA.

Paul@edchangy·30 May

Well, the paper suggests a hybrid method. What about using MLA and adding groups?

Paul@edchangy·30 May

First of all, a confession! In the blog titled 'Multi-Head Latent Attention: Benefits in Memory and Computation', we didn't tell the whole story: the benchmarking was done on a single GPU. In reality, DeepSeek V3-style models need to be parallelized across GPUs.

Paul@edchangy·30 May

Instead, one must make a copy of the latent component across GPUs, which feels wasteful.
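
To make the "wasteful" part concrete, here is a rough back-of-the-envelope sketch of per-GPU KV cache size per token per layer under tensor parallelism. Everything in it is an assumption for illustration: the function names are made up, the default sizes (512-dim latent, 64-dim RoPE part, 128-dim heads, 2-byte elements) are just example values, and the "grouped latent" accounting is a simplified guess at what adding groups to MLA could buy, not the paper's exact method.

```python
# Illustrative accounting only: all names and default sizes are assumptions.

def gqa_bytes_per_token(n_kv_heads, head_dim, tp, bytes_per_elem=2):
    """Per-GPU KV cache bytes (per token, per layer) for GQA.

    KV heads are split across the tp ranks; each rank keeps at least one.
    The factor 2 accounts for storing both K and V.
    """
    heads_per_rank = max(1, n_kv_heads // tp)
    return 2 * heads_per_rank * head_dim * bytes_per_elem


def mla_bytes_per_token(latent_dim, rope_dim, tp, bytes_per_elem=2):
    """Per-GPU KV cache bytes (per token, per layer) for MLA.

    The single latent is shared by all heads, so it cannot be split by
    head: every rank holds a full copy and tp does not reduce the size.
    """
    return (latent_dim + rope_dim) * bytes_per_elem


def grouped_latent_bytes_per_token(latent_dim, rope_dim, n_groups, tp,
                                   bytes_per_elem=2):
    """Hypothetical 'MLA with groups': the latent is cut into n_groups
    slices and the slices are sharded across ranks. The RoPE part is
    assumed to stay replicated on every rank.
    """
    groups_per_rank = max(1, n_groups // tp)
    per_group_dim = latent_dim // n_groups
    return (groups_per_rank * per_group_dim + rope_dim) * bytes_per_elem


if __name__ == "__main__":
    for tp in (1, 2, 4, 8):
        print(
            f"tp={tp}: "
            f"GQA={gqa_bytes_per_token(8, 128, tp)} B, "
            f"MLA={mla_bytes_per_token(512, 64, tp)} B, "
            f"grouped={grouped_latent_bytes_per_token(512, 64, 8, tp)} B"
        )
```

Under these assumptions the MLA number stays flat as tp grows, so the aggregate cache across GPUs scales with tp, while the sharded variants shrink per GPU.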

Paul retweeted

Verda (formerly DataCrunch)@verdacloud·29 May

🆕 Inference API for FLUX.1 Kontext [max] & [pro] are now available on DataCrunch! We are an infrastructure partner of @bfl_ml for Kontext, a suite of generative flow matching models for text-to-image and image-to-image editing. Learn more: datacrunch.io/managed-endpoi…

Paul retweeted

Verda (formerly DataCrunch)@verdacloud·26 May

🚨 Summer Inference by Symposium AI is happening next Wednesday, June 4, at 16:00-22:00. 🇫🇮 This event will bring together 250 AI engineers, researchers, and founders under one roof in Helsinki. 🔗 You can still grab one of the last remaining seats: lu.ma/x5hhj79x
