Will

82 posts

@will_ea

doing things at the edge of stability. vLLM contributor. senior cs undergrad. https://t.co/fibbSz0kUc.

Joined May 2022
391 Following · 138 Followers
Pinned Tweet
Will@will_ea·
@Birchlabs @LLMenjoyer The benchmarking script and output here verified gradient correctness: github.com/catswe/flash-a…. I also ran some private internal tests verifying output correctness. More thorough tests will be added in the future.
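Roughly what such a gradient-correctness check looks like, as a minimal PyTorch sketch; `custom_attn` is a hypothetical stand-in for the repo's fused op (the actual script lives behind the truncated link and isn't reproduced here), and plain PyTorch attention serves as the reference:

```python
import torch

def reference_attn(q, k, v):
    # Plain PyTorch attention as the ground truth.
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    return torch.softmax(scores, dim=-1) @ v

def check_gradients(custom_attn, shape=(2, 4, 128, 64), atol=1e-4):
    # Push the same leaf tensors through both implementations and
    # compare the gradients that flow back to q, k, and v.
    q, k, v = (torch.randn(shape, dtype=torch.float64, requires_grad=True)
               for _ in range(3))
    grad_out = torch.randn(shape, dtype=torch.float64)
    ref = torch.autograd.grad(reference_attn(q, k, v), (q, k, v), grad_out)
    cus = torch.autograd.grad(custom_attn(q, k, v), (q, k, v), grad_out)
    for name, a, b in zip("qkv", ref, cus):
        assert torch.allclose(a, b, atol=atol), f"grad mismatch on {name}"
```

torch.autograd.gradcheck offers a stricter finite-difference variant of the same test.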
Luxia 🔮@slLuxia·
@will_ea @LLMenjoyer yeah it's orchestration level more than the raw ops; but it is important, because that's double the layer-routing overhead in practice, and if i remember right that's where the bulk of the slowdown is reported in the original paper (and why they do blocks vs all layers)
Will@will_ea·
@slLuxia @LLMenjoyer The package exposes lower-level phase 1/phase 2 ops that users can use to compose the routing however they want. I believe what you’re referring to is the experimental API/example, which is still rough around the edges and mainly meant for research/prototyping.
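To make "compose the routing however they want" concrete: the package's real signatures aren't shown in the thread, so `phase1`/`phase2` below are hypothetical placeholders sketching the kind of composition meant:

```python
import torch

def phase1(hidden, past):
    # Hypothetical phase 1: score the current hidden state against
    # cached outputs of preceding layers. hidden: (B, D), past: (L, B, D).
    return torch.einsum("bd,lbd->bl", hidden, past)

def phase2(scores, past):
    # Hypothetical phase 2: mix the cached outputs with the
    # input-dependent weights.
    weights = torch.softmax(scores, dim=-1)
    return torch.einsum("bl,lbd->bd", weights, past)

def my_block(hidden, past, attn, mlp):
    # User-chosen placement: route once per block here, but nothing
    # stops a caller from invoking the ops again before the MLP.
    hidden = hidden + phase2(phase1(hidden, past), past)
    hidden = hidden + attn(hidden)
    return hidden + mlp(hidden)
```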
Luxia 🔮@slLuxia·
@will_ea @LLMenjoyer hmm; it seems like you drop some routing. in the original paper they route 2x per block, pre-MLP & pre-attention, but your repo treats each block as a single layer. you can see it in fig 2 and section 2 of the moonshot paper. is this intended?
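The difference being asked about, sketched; `route` is a placeholder for the cross-layer mixing step, not the repo's actual API:

```python
def block_single_route(x, past, attn, mlp, route):
    # One routing call per block, which is how the repo reads here.
    x = x + route(x, past)
    x = x + attn(x)
    return x + mlp(x)

def block_double_route(x, past, attn, mlp, route):
    # Two routing calls per block, pre-attention and pre-MLP, as
    # described for fig 2 / section 2 of the paper.
    x = x + route(x, past)
    x = x + attn(x)
    x = x + route(x, past)
    return x + mlp(x)
```

Twice the routing calls per block is exactly the "double the layer routing overhead" mentioned earlier in the thread.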
Luxia 🔮@slLuxia·
@will_ea @LLMenjoyer should be agnostic to block size/allow arbitrary blocks yeah? i find maintaining predicted interlayer circuitry gives the best perf for attnres
Will@will_ea·
@ricklamers indeed. i'm personally very excited to contribute to the ai infra space
Kimi.ai@Kimi_Moonshot·
Introducing Attention Residuals: rethinking depth-wise aggregation. Residual connections have long relied on fixed, uniform accumulation. Inspired by the duality of time and depth, we introduce Attention Residuals, replacing standard depth-wise recurrence with learned, input-dependent attention over preceding layers.
🔹 Enables networks to selectively retrieve past representations, naturally mitigating dilution and hidden-state growth.
🔹 Introduces Block AttnRes, partitioning layers into compressed blocks to make cross-layer attention practical at scale.
🔹 Serves as an efficient drop-in replacement, demonstrating a 1.25x compute advantage with negligible (<2%) inference latency overhead.
🔹 Validated on the Kimi Linear architecture (48B total, 3B activated parameters), delivering consistent downstream performance gains.
🔗 Full report: github.com/MoonshotAI/Att…
Kimi.ai tweet media
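As described in the announcement, the core mechanism swaps the fixed x = x + f(x) accumulation for input-dependent attention over earlier hidden states. A minimal PyTorch sketch of that idea, not Moonshot's code; the projections and shapes are assumptions:

```python
import torch
import torch.nn as nn

class AttnResidual(nn.Module):
    # Sketch: attend over the outputs of preceding layers with
    # input-dependent weights instead of summing them uniformly.
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)

    def forward(self, x, history):
        # x: (batch, dim); history: list of hidden states from earlier
        # layers (or just the current block, for Block AttnRes).
        past = torch.stack(history, dim=1)   # (batch, layers, dim)
        q = self.q(x).unsqueeze(1)           # (batch, 1, dim)
        k = self.k(past)                     # (batch, layers, dim)
        scores = (q * k).sum(-1) / past.shape[-1] ** 0.5
        weights = torch.softmax(scores, dim=-1)
        # Learned, input-dependent aggregation replaces the fixed sum.
        return (weights.unsqueeze(-1) * past).sum(dim=1)
```

Per the Block AttnRes bullet, bounding `history` to a compressed per-block cache is what keeps the cost from growing with total depth.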
Will@will_ea·
@LLMenjoyer this is actually pretty fire now that i think more about it...
Will@will_ea·
@MainzOnX @A_K_Nain i love writing kernels... unless i'm racing against a deadline
Adam Mainz@MainzOnX·
@A_K_Nain Yeah, no one tells you that it's actually not the fun part 😂 I spent 25% of my time last week writing backwards ops, and next week will probably be the same. It's no fun.
Aakash Kumar Nain@A_K_Nain·
I hate writing kernels. We shouldn't be doing this, at least not frequently. I hope the next-gen compilers are much better and smarter.
Will@will_ea·
@blelbach I spent a couple weeks just writing kernels
Bryce, the CUDA Colonel@blelbach·
I have not worked hours this long since the start of my career. So much to do these days. So many possibilities that are now unlocked.
Will@will_ea·
@henrylhtsang I mean, you need a decent amount of prerequisite knowledge to start any substantial kernel work. The actual kernels may not seem like much, but getting to that point means narrowing down a bunch of ways to do it optimally, and that's where experience helps.
henry tsang@henrylhtsang·
one impression of working with a new grad: they can seem... slow? like slow to finish a task and slow to turn things around in general. is that what my TL thought of me when I first started?
rita kozlov 🐀@ritakozlov·
i've picked up the pen so many times to write about being a woman in tech and every time i chicken out because there's this catch-22: to talk about being a woman in tech, you need to have credibility. and once you start talking about it as a woman, you lose said credibility. so i'm going to mortgage some of my credibility to get this off my chest, as someone who has both had a pretty successful career in tech, and leads a team with a lot of women on it:

every woman you work with has had the most insane shit happen to her — on an almost daily basis. shit that makes you look at the camera and go "how did i end up here". from wild remarks about appearance to stalking and trauma dumping, and just constant dismissal from so many directions (employees, customers...). shit that you never tell anyone because they wouldn't believe you...

i recently learned that like 97% of my followers on here are men. so my challenge to you is just to sit with that for a moment. you don't need to do anything about it (other than try not to be that person). but you should be aware that that's what every woman you work with deals with
Will@will_ea·
@valigo May as well use Claude/GPT/Gemini to do the resume screening for you. Insane world.
Valentin Ignatev@valigo·
If you still don't hate recruiters enough, or if you are still not convinced that hr is the most incompetent, harmful, and entitled industry in tech, just watch this video. But TLDR is "I have 20 seconds to find 5 keywords in your CV, but nobody taught me how to use Ctrl+F"
Valentin Ignatev tweet media
Will@will_ea·
GPT 5.5 is goated at kernels. Far better than Opus 4.7. The AI-generated kernels thesis is increasingly looking viable. Btw, I love what you guys are building @makora_ai
Will@will_ea·
@LLMenjoyer i aspire to be a memer just like you
llm_enjoyer@LLMenjoyer·
reading the deepseek v4 report be like
Will@will_ea·
@samsja19 I aspire to do pretraining one day
samsja@samsja19·
crazy how cracked cat are at distributed training
samsja tweet media