Will Bui

85 posts

@will_ea

vLLM contributor. https://t.co/fibbSz0kUc doing things at the edge of stability

Joined May 2022
402 Following · 189 Followers
Pinned Tweet
Flapping Airplanes @flappyairplanes
(4/5) One thing we’ve built is a “kittens” virtual machine that takes over the whole GPU and allows new kinds of co-optimization. We can go past the traditional sequential kernel model – for example, fusing entire training runs into a single kernel and even weirder stuff.
[image attached]
27 replies · 55 reposts · 664 likes · 230.8K views
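(For intuition: "fusing an entire training run into a single kernel" is, at the control-flow level, about replacing many sequential kernel launches with one persistent loop that keeps intermediates on chip. A toy sketch in plain Python/NumPy, purely illustrative and not their actual system; every name below is made up. The real win on a GPU comes from skipping launch overhead and the memory round-trips between ops:)

```python
import numpy as np

# One "training step" written as three separate ops. In the traditional
# model each op is its own kernel launch, and h/g round-trip through
# global memory between launches.
def step_unfused(w, x, y, lr):
    h = np.tanh(x @ w)                       # launch 1: forward
    g = x.T @ ((h - y) * (1.0 - h * h))      # launch 2: backward (grad of 0.5*||h-y||^2)
    return w - lr * g                        # launch 3: optimizer update

# "Megakernel" shape: one persistent loop owns the whole run, so the
# intermediates live in registers/shared memory (here: local variables)
# and there is a single launch boundary for the entire training run.
def train_fused(w, xs, ys, lr):
    for x, y in zip(xs, ys):
        h = np.tanh(x @ w)
        g = x.T @ ((h - y) * (1.0 - h * h))
        w = w - lr * g
    return w

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 4))
xs = rng.normal(size=(100, 16, 8))
ys = np.tanh(xs @ rng.normal(size=(8, 4)))   # targets from a hidden teacher
w = train_fused(w, xs, ys, lr=0.05)
```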
Flapping Airplanes @flappyairplanes
(1/5) Great to be at @sequoia to give a sneak peek of one of our research directions! TL;DR one path to data-efficiency may be to “abuse GPUs like they’ve never been abused before”
13 replies · 66 reposts · 950 likes · 143.3K views
Matej Sirovatka @m_sirovatka
I have now officially become one of the vLLM contributors, after long months of hard work. It has been a long and hard journey; I would like to thank my family, friends, and my company for supporting me along the way. nah jk, my 3-line PR just got merged 🫡
16 replies · 2 reposts · 315 likes · 10.8K views
himanshu @retr0sushi_
how does one develop a taste for research? explain to me like i am a guy entering college with unlimited energy and enthusiasm
37 replies · 1 repost · 175 likes · 17.7K views
Anne Ouyang @anneouyang
TIL the Huawei Ascend linear algebra kernels library is called "CATLASS" 🐱
[image attached]
8 replies · 11 reposts · 149 likes · 12.4K views
Luxia 🔮 @slLuxia
@will_ea @LLMenjoyer yeah, it's orchestration-level more than the raw ops; but it matters because that's double the layer-routing overhead in practice, and it's where the bulk of the slowdown is reported from in the original paper, if i remember right (and why they do blocks vs all layers)
1 reply · 0 reposts · 1 like · 41 views
Will Bui @will_ea
@slLuxia @LLMenjoyer The package exposes lower-level phase 1/phase 2 ops that users can use to compose the routing however they want. I believe what you’re referring to is the experimental API/example, which is still rough around the edges and mainly meant for research/prototyping.
1 reply · 0 reposts · 2 likes · 72 views
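(A guess at what composing those lower-level ops could look like; the package's real API isn't shown in this thread, so every name below is invented. The point is only that two primitives, a scoring pass and a mixing pass, can be called twice per block, pre-attention and pre-MLP as in the paper's fig. 2, or once per block for the coarser wiring:)

```python
import torch

# Invented stand-ins for "phase 1 / phase 2" lower-level ops: phase 1
# scores the current hidden state against cached earlier-layer states;
# phase 2 mixes the cached states using those scores.
def phase1_scores(h, cache, wq, wk):
    q = h @ wq                                        # (B, T, Dk)
    k = torch.stack([c @ wk for c in cache], dim=0)   # (L, B, T, Dk)
    return torch.softmax(torch.einsum('btd,lbtd->btl', q, k), dim=-1)

def phase2_mix(scores, cache):
    v = torch.stack(list(cache), dim=0)               # (L, B, T, D)
    return torch.einsum('btl,lbtd->btd', scores, v)

# Routing twice per block (pre-attention and pre-MLP). Calling the pair
# once per block instead treats each block as a single layer.
def block(h, cache, attn, mlp, wq, wk):
    h = attn(phase2_mix(phase1_scores(h, cache, wq, wk), cache)) + h
    cache.append(h)
    h = mlp(phase2_mix(phase1_scores(h, cache, wq, wk), cache)) + h
    cache.append(h)
    return h
```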
Luxia 🔮 @slLuxia
@will_ea @LLMenjoyer hmm; it seems like you drop some of the routing: in the original paper they route twice per block, pre-MLP and pre-attention, but your repo treats each block as a single layer. you can see it in fig. 2 and section 2 of the moonshot paper. is this intended?
1 reply · 0 reposts · 1 like · 72 views
Luxia 🔮 @slLuxia
@will_ea @LLMenjoyer it should be agnostic to block size / allow arbitrary blocks, yeah? i find maintaining the predicted interlayer circuitry gives the best perf for attnres
2 replies · 0 reposts · 1 like · 956 views
Will Bui @will_ea
@ricklamers indeed. i'm personally very excited to contribute to the ai infra space
0 replies · 0 reposts · 6 likes · 61 views
Kimi.ai @Kimi_Moonshot
Introducing Attention Residuals: rethinking depth-wise aggregation. Residual connections have long relied on fixed, uniform accumulation. Inspired by the duality of time and depth, we introduce Attention Residuals, replacing standard depth-wise recurrence with learned, input-dependent attention over preceding layers.
🔹 Enables networks to selectively retrieve past representations, naturally mitigating dilution and hidden-state growth.
🔹 Introduces Block AttnRes, partitioning layers into compressed blocks to make cross-layer attention practical at scale.
🔹 Serves as an efficient drop-in replacement, demonstrating a 1.25x compute advantage with negligible (<2%) inference latency overhead.
🔹 Validated on the Kimi Linear architecture (48B total, 3B activated parameters), delivering consistent downstream performance gains.
🔗 Full report: github.com/MoonshotAI/Att…
[image attached]
337 replies · 2.1K reposts · 13.6K likes · 5M views
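(To make the mechanism concrete, here's a minimal sketch of the core idea as described in the tweet, not Moonshot's code; `AttentionResidual`, `d_key`, and the rest are invented names. Each layer's update attends over preceding layers' hidden states instead of summing them uniformly; Block AttnRes would apply the same attention over compressed per-block summaries rather than every layer:)

```python
import torch
import torch.nn as nn

class AttentionResidual(nn.Module):
    """Sketch: replace the fixed residual sum over depth with learned,
    input-dependent attention over preceding layers' hidden states."""
    def __init__(self, d_model: int, d_key: int = 64):
        super().__init__()
        self.wq = nn.Linear(d_model, d_key, bias=False)
        self.wk = nn.Linear(d_model, d_key, bias=False)

    def forward(self, h, history):
        # h: (B, T, D) current activation; history: list of earlier
        # hidden states (or compressed block summaries), each (B, T, D).
        past = torch.stack(history, dim=2)               # (B, T, L, D)
        q = self.wq(h).unsqueeze(2)                      # (B, T, 1, Dk)
        k = self.wk(past)                                # (B, T, L, Dk)
        w = torch.softmax((q * k).sum(-1) / k.shape[-1] ** 0.5, dim=-1)
        return (w.unsqueeze(-1) * past).sum(dim=2)       # (B, T, D)

# A plain residual stack is the special case where the weights are uniform
# over history; here the network learns which depths to retrieve from.
res = AttentionResidual(d_model=512)
hist = [torch.randn(2, 16, 512) for _ in range(4)]
h = torch.randn(2, 16, 512)
out = h + res(h, hist)
```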
Will Bui @will_ea
@LLMenjoyer this is actually pretty fire now that i think more about it...
0 replies · 0 reposts · 4 likes · 80 views
Adam Mainz @MainzOnX
@A_K_Nain Yeah, no one tells you that it's actually not the fun part 😂 I spent 25% of my time last week writing backwards ops, and it'll probably be the same next week. It's no fun.
3 replies · 0 reposts · 16 likes · 871 views
Aakash Kumar Nain @A_K_Nain
I hate writing kernels. We shouldn't have to do this, at least not frequently. I hope the next generation of compilers is much better and smarter.
5 replies · 0 reposts · 42 likes · 3.6K views
Will Bui @will_ea
@blelbach I spent a couple weeks just writing kernels
0 replies · 0 reposts · 0 likes · 66 views
Bryce, the CUDA Colonel @blelbach
I have not worked hours this long since the start of my career. So much to do these days. So many possibilities that are now unlocked.
4 replies · 0 reposts · 47 likes · 4.1K views
Will Bui @will_ea
@henrylhtsang I mean, you need a decent amount of prerequisite knowledge to start any substantial kernel work. The actual kernels may not look like much, but getting to that point means narrowing down a bunch of possible approaches to an optimal one, and that's where experience helps.
0 replies · 0 reposts · 0 likes · 195 views
henry tsang @henrylhtsang
one impression of working with a new grad: they can seem... slow? like slow to finish a task, and slow to turn things around in general. is that what my TL thought of me when I first started?
4 replies · 0 reposts · 18 likes · 2.5K views
rita kozlov 🐀 @ritakozlov
i've picked up the pen so many times to write about being a woman in tech and every time i chicken out because there's this catch-22: to talk about being a woman in tech, you need to have credibility. and once you start talking about it as a woman, you lose said credibility.

so i'm going to mortgage some of my credibility to get this off my chest, as someone who has both had a pretty successful career in tech, and leads a team with a lot of women on it: every woman you work with has had the most insane shit happen to her — on an almost daily basis. shit that makes you look at the camera and go "how did i end up here". from wild remarks about appearance to stalking and trauma dumping, and just constant dismissal from so many directions (employees, customers...). shit that you never tell anyone because they wouldn't believe you...

i recently learned that like 97% of my followers on here are men. so my challenge to you is just to sit with that for a moment. you don't need to do anything about it (other than try not to be that person). but you should be aware that that's what every woman you work with deals with
61 replies · 123 reposts · 1.2K likes · 75.8K views