Dylan Lim
@dylan__lim
11 posts
cs @stanford

Stanford, CA · Joined April 2024
411 Following · 261 Followers
Pinned Tweet
Dylan Lim retweeted
Stuart Sul @stuart_sul
(1/7) We're releasing ThunderKittens 2.0! Faster kernels, cleaner code, industry contributions, and new state-of-the-art BF16 / MXFP8 / NVFP4 GEMMs that match or surpass cuBLAS! Alongside this release, we’re equally excited to share some insights we learned while squeezing every last TFLOP out of Blackwell: (with @hazyresearch & generously supported by @cursor_ai)
13 replies · 88 reposts · 544 likes · 59.1K views
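ThunderKittens is a CUDA/C++ embedded DSL, and the release above is about tensor-core GEMMs on Blackwell. For orientation only, here is a minimal plain-CUDA BF16 GEMM with shared-memory tiling and FP32 accumulation; it is a toy sketch of the problem shape, not ThunderKittens code, and nowhere near cuBLAS performance (real Blackwell kernels use tensor cores, async copies, and deep software pipelines).

```cuda
#include <cuda_bf16.h>

#define TILE 16

// Toy BF16 GEMM: C[M,N] = A[M,K] * B[K,N], accumulated in FP32.
// Launch: dim3 block(TILE, TILE); dim3 grid((N+TILE-1)/TILE, (M+TILE-1)/TILE);
__global__ void bf16_gemm(const __nv_bfloat16* A, const __nv_bfloat16* B,
                          float* C, int M, int N, int K) {
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    for (int t = 0; t < K; t += TILE) {
        // Stage one tile of A and one tile of B into shared memory,
        // zero-padding out-of-range elements at the matrix edges.
        As[threadIdx.y][threadIdx.x] = (row < M && t + threadIdx.x < K)
            ? __bfloat162float(A[row * K + t + threadIdx.x]) : 0.0f;
        Bs[threadIdx.y][threadIdx.x] = (col < N && t + threadIdx.y < K)
            ? __bfloat162float(B[(t + threadIdx.y) * N + col]) : 0.0f;
        __syncthreads();

        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();
    }
    if (row < M && col < N) C[row * N + col] = acc;
}
```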
Dylan Lim retweeted
Flapping Airplanes @flappyairplanes
Announcing Flapping Airplanes! We’ve raised $180M from GV, Sequoia, and Index to assemble a new guard in AI: one that imagines a world where models can think at human level without ingesting half the internet.
339 replies · 256 reposts · 3.6K likes · 2.1M views
Dylan Lim retweeted
Stuart Sul @stuart_sul
(1/6) GPU networking is the remaining AI efficiency bottleneck, and the underlying hardware is changing fast! We’re happy to release ParallelKittens, an update to ThunderKittens that lets you easily write fast computation-communication overlapped multi-GPU kernels, along with new kernels for data, tensor, sequence, and expert parallelism! Here’s a photo of overlapped kittens, along with things you should care about when optimizing multi-GPU kernels. (With @simran_s_arora, @bfspector, and @hazyresearch. Generously supported by @cursor_ai and @togethercompute)
9 replies · 60 reposts · 517 likes · 155.4K views
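Setting the ParallelKittens API itself aside (not reproduced here), the core technique the thread describes, overlapping computation with inter-GPU communication, can be sketched with plain CUDA streams. A hypothetical chunked pipeline: while the compute stream works on chunk c, the communication stream ships the already-finished chunk c-1 to a peer GPU.

```cuda
#include <cuda_runtime.h>

// Stand-in for real per-chunk work.
__global__ void compute(float* buf, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] = buf[i] * 2.0f + 1.0f;
}

// Hypothetical overlap sketch: dev0_chunks / dev1_chunks are arrays of
// device pointers on GPU 0 and GPU 1. ParallelKittens fuses this pattern
// inside single kernels; here it is expressed with two streams.
// (Enable P2P first via cudaDeviceEnablePeerAccess where supported.)
void pipeline(float* dev0_chunks[], float* dev1_chunks[],
              int nchunks, int chunk_elems) {
    cudaStream_t comp, comm;
    cudaSetDevice(0);
    cudaStreamCreate(&comp);
    cudaStreamCreate(&comm);

    for (int c = 0; c < nchunks; ++c) {
        // Compute chunk c while chunk c-1 is in flight to GPU 1.
        compute<<<(chunk_elems + 255) / 256, 256, 0, comp>>>(
            dev0_chunks[c], chunk_elems);
        if (c > 0)
            cudaMemcpyPeerAsync(dev1_chunks[c - 1], 1, dev0_chunks[c - 1], 0,
                                chunk_elems * sizeof(float), comm);
        cudaStreamSynchronize(comp);  // chunk c finished; ship it next iteration
    }
    // Send the final chunk, then drain the communication stream.
    cudaMemcpyPeerAsync(dev1_chunks[nchunks - 1], 1, dev0_chunks[nchunks - 1], 0,
                        chunk_elems * sizeof(float), comm);
    cudaStreamSynchronize(comm);
    cudaStreamDestroy(comp);
    cudaStreamDestroy(comm);
}
```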
Dylan Lim retweeted
Andrej Karpathy @karpathy
So so so cool. Llama 1B batch one inference in one single CUDA kernel, deleting synchronization boundaries imposed by breaking the computation into a series of kernels called in sequence. The *optimal* orchestration of compute and memory is only achievable in this way.
Benjamin F Spector @bfspector

(1/5) We’ve never enjoyed watching people chop Llamas into tiny pieces. So, we’re excited to be releasing our Low-Latency-Llama Megakernel! We run the whole forward pass in a single kernel. Megakernels are faster & more humane. Here’s how to treat your Llamas ethically: (Joint with @jordanjuravsky, @stuart_sul, @OwenDugan, @dylan__lim, @realDanFu, @simran_s_arora, and @HazyResearch)

62 replies · 244 reposts · 2K likes · 267.2K views
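To unpack the synchronization boundaries Karpathy is pointing at: each kernel launch in a sequence is an implicit grid-wide barrier plus launch overhead, and intermediates round-trip through global memory between launches. A megakernel keeps the whole computation inside one launch. Below is a minimal sketch of that idea using cooperative groups, where grid.sync() stands in for a kernel boundary; this is not the Hazy Research megakernel (which schedules work with an on-GPU interpreter), just the underlying primitive.

```cuda
#include <cuda_runtime.h>
#include <cooperative_groups.h>
namespace cg = cooperative_groups;

// Two dependent stages in ONE launch. Traditionally these would be two
// kernels with a full launch/teardown between them; here grid.sync()
// replaces that boundary. Stage 2 reads stage-1 output written by OTHER
// blocks, which is exactly why a grid-wide barrier is required.
__global__ void two_stage_megakernel(float* x, float* tmp, int n) {
    cg::grid_group grid = cg::this_grid();
    // Grid-stride loops let a modest, co-resident grid cover any n.
    for (int i = grid.thread_rank(); i < n; i += grid.size())
        tmp[i] = x[i] * 2.0f;                    // stage 1 (formerly kernel A)
    grid.sync();                                  // replaces a kernel boundary
    for (int i = grid.thread_rank(); i < n; i += grid.size())
        x[i] = tmp[i] + tmp[n - 1 - i];          // stage 2 (formerly kernel B)
}

int main() {
    int n = 1 << 20;
    float *x, *tmp;
    cudaMalloc(&x, n * sizeof(float));
    cudaMalloc(&tmp, n * sizeof(float));

    // Cooperative launch requires the whole grid to be resident at once,
    // so size it from occupancy rather than from n.
    int sms, per_sm, threads = 256;
    cudaDeviceGetAttribute(&sms, cudaDevAttrMultiProcessorCount, 0);
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(&per_sm,
        two_stage_megakernel, threads, 0);
    void* args[] = { &x, &tmp, &n };
    cudaLaunchCooperativeKernel((void*)two_stage_megakernel,
                                dim3(sms * per_sm), dim3(threads), args, 0, 0);
    cudaDeviceSynchronize();
    cudaFree(x); cudaFree(tmp);
}
```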
Dylan Lim @dylan__lim
Had a super fun time building this out - always love working on distributed ML systems. Big thanks to @pearvc for awarding us the best startup prize at Stanford TreeHacks!
Aksh Garg @AkshGarg03

(1/5) @CKT_Conner, @dill_pkl, @emilyzsh, and I are excited to introduce Shard - a proof-of-concept for an infinitely scalable distributed system composed of consumer hardware for training and running ML models! Features:
- Data + Pipeline Parallel for handling arbitrarily large models
- Algorithmic load balancing for throughput optimization
- Fault tolerance for unreliable machines

1 reply · 1 repost · 11 likes · 2.3K views
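As a loose illustration of the pipeline parallelism Shard lists, here is a hypothetical single-GPU reduction of the idea using CUDA streams and events: two stages run on separate streams, chained per microbatch, so stage 2 of microbatch m overlaps stage 1 of microbatch m+1. In Shard's setting the hand-off would be a network transfer between consumer machines rather than an event, but the scheduling pattern is the same. All names here are illustrative, not Shard's API.

```cuda
#include <cuda_runtime.h>

__global__ void stage1(const float* in, float* mid, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) mid[i] = in[i] * 0.5f;      // stand-in for layer block 1
}
__global__ void stage2(const float* mid, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = mid[i] + 3.0f;     // stand-in for layer block 2
}

// in/mid/out each hold micro_batches contiguous slices of n floats.
void run_pipeline(float* in, float* mid, float* out,
                  int micro_batches, int n) {
    cudaStream_t s1, s2;
    cudaStreamCreate(&s1); cudaStreamCreate(&s2);
    int threads = 256, blocks = (n + threads - 1) / threads;

    for (int m = 0; m < micro_batches; ++m) {
        cudaEvent_t done1;
        cudaEventCreateWithFlags(&done1, cudaEventDisableTiming);
        stage1<<<blocks, threads, 0, s1>>>(in + m * n, mid + m * n, n);
        cudaEventRecord(done1, s1);
        // Stage 2 waits only on ITS microbatch, so stage 1 of the next
        // microbatch runs concurrently on s1.
        cudaStreamWaitEvent(s2, done1, 0);
        stage2<<<blocks, threads, 0, s2>>>(mid + m * n, out + m * n, n);
        cudaEventDestroy(done1);  // safe: destruction is deferred until done
    }
    cudaStreamSynchronize(s2);
    cudaStreamDestroy(s1); cudaStreamDestroy(s2);
}
```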
Dylan Lim @dylan__lim
@AkshGarg03 AI Financial Advisory Service: 1) Advisor Devin personalizes investment strategies. 2) Risk Manager Devin assesses and mitigates financial risks. 3) Market Analyst Devin forecasts market trends using AI.
0 replies · 0 reposts · 1 like · 1.1K views
Aksh Garg @AkshGarg03
we're collecting project ideas for D3N to try out!! have interesting devin projects you want to try out? reply below and we'll select the most upvoted/interesting ideas to send to devin. Bonus points if the ideas are naturally distributed or parallelizable
7 replies · 1 repost · 11 likes · 9.3K views