Vishal
@KyrieBlunders

1.7K posts

23, wannabe mle @synclabs_so, fuck it we ball

Joined August 2019
1.1K Following · 277 Followers

Pinned Tweet
Vishal @KyrieBlunders
Joined @synclabs_so as a Data Scientist last week, people here are cracked af!

Vishal @KyrieBlunders
@stochasticchasm mind blown, i never thought PTX writing would get easier

Aditya @UghDitya
I love looking at people’s fridge magnets (use this as a thread to show yours)

Vishal @KyrieBlunders
@moonboyknm I kinda worked on CUTLASS, so it was very intuitive when I moved to CuTeDSL

Vishal @KyrieBlunders
Finally finished writing this blog!! Tried to connect the “what is this?” to “oh, this is how it actually works” gap. If CuTeDSL ever felt like abstract math cosplay, this might help. Would love to hear your thoughts on this!
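
(Context for what the blog is about, not from the tweet itself: the object CuTeDSL keeps talking about is the layout, a shape/stride pair that maps a logical coordinate to a flat memory index. A minimal plain-Python sketch of that idea; the function name here is mine, this is not the CuTeDSL API.)

    # A CuTe-style layout is a (shape, stride) pair mapping a logical
    # coordinate to a flat index via a dot product with the strides.
    # Plain-Python sketch of the idea; not the actual CuTeDSL API.

    def layout_index(coord, shape, stride):
        """Map an n-D coordinate to a linear index: sum(c_i * stride_i)."""
        assert all(0 <= c < s for c, s in zip(coord, shape))
        return sum(c * d for c, d in zip(coord, stride))

    # CuTe notation writes a 4x8 row-major matrix as the layout (4,8):(8,1)
    # and the same matrix stored column-major as (4,8):(1,4).
    print(layout_index((2, 3), (4, 8), (8, 1)))  # 2*8 + 3*1 = 19
    print(layout_index((2, 3), (4, 8), (1, 4)))  # 2*1 + 3*4 = 14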

Vishal @KyrieBlunders
@_jazhaz_ cool, waiting for your review my good sir!

Jax @_jazhaz_
@KyrieBlunders was waiting for FA2. I'll drop a review after reading it

Jino Rohit @jino_rohit
i have 24 hours and i want to learn CUDA instructions on H100.

Vishal @KyrieBlunders
the culture at @PrimeIntellect seems a little too good to be true lol, majority of the employees are chronically online ngl

Vishal @KyrieBlunders
@jino_rohit oh god no, nah I haven't finished the whole video, I went through the individual slides posted on the website

Grok @grok
This is a snippet from an LLM inference optimization paper (likely from an inference compiler team). They're explaining a "dual-kernel" hack for attention computation to keep it "batch-invariant" (works the same for any batch size, crucial for real-time serving where sequences finish unevenly). Normal split-KV method wastes GPU SMs on partial waves, so they use two kernels: one stays in a single SM for full throughput, the other spans SMs but matches it bitwise exactly via shared memory tricks. Keeps latency low without overhead. Insanely nerdy GPU wizardry 💀
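
(A rough numpy illustration of the batch-invariance point Grok is describing; my sketch, not anything from the paper: float addition is non-associative, so if the number of KV splits varies with free SMs or batch size, the merged result changes bitwise. Pinning the reduction layout is what the two-kernel trick preserves.)

    # Split-KV attention merges per-chunk partial softmax sums. Because
    # float addition is non-associative, changing the number of splits
    # changes the output bitwise. numpy sketch of the reduction only,
    # not the real kernels.
    import numpy as np

    rng = np.random.default_rng(0)
    scores = rng.standard_normal(4096, dtype=np.float32)  # one query's logits
    values = rng.standard_normal(4096, dtype=np.float32)

    def split_kv_attend(scores, values, num_splits):
        m = scores.max()            # global max for a stable softmax
        num = np.float32(0.0)
        den = np.float32(0.0)
        for s, v in zip(np.array_split(scores, num_splits),
                        np.array_split(values, num_splits)):
            e = np.exp(s - m)
            num += e @ v            # partials merged in split order
            den += e.sum()
        return num / den

    a = split_kv_attend(scores, values, num_splits=4)
    b = split_kv_attend(scores, values, num_splits=13)
    print(a == b)   # typically False: different reduction order
    print(split_kv_attend(scores, values, 4) == a)  # True: fixed layout, bitwise stable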

Joe Fioti @joefioti
what in the hell

Vishal @KyrieBlunders
wrote FA2 using CuTeDSL, a blog for that is coming soon. CuTeDSL is good, but I am not lol. It's an improvement for me over CUDA since I use Python daily
Quoting Vishal @KyrieBlunders:
Me and @s_gowindone tried to derive Flash Attention from scratch, starting from vanilla attention and asking: “why is this so slow?” This blog walks through the full intuition step by step. Would love to hear your thoughts on this!
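
(For readers who want the punchline of that derivation in code: the core move is an online softmax over KV tiles, so the full N x N score matrix never gets materialized. A numpy sketch of the idea; my paraphrase, not the blog's code.)

    # Vanilla attention materializes all N x N scores; the flash trick
    # streams over KV tiles, keeping a running max m, running softmax
    # denominator l, and an unnormalized accumulator that get rescaled
    # whenever the running max grows. numpy sketch only.
    import numpy as np

    def flash_attention(Q, K, V, tile=128):
        N, d = Q.shape
        scale = 1.0 / np.sqrt(d)
        m = np.full(N, -np.inf)      # running row max
        l = np.zeros(N)              # running softmax denominator
        acc = np.zeros((N, d))       # running unnormalized output
        for j in range(0, K.shape[0], tile):
            S = (Q @ K[j:j + tile].T) * scale      # this tile's scores
            m_new = np.maximum(m, S.max(axis=1))
            alpha = np.exp(m - m_new)              # rescale old stats
            p = np.exp(S - m_new[:, None])
            l = alpha * l + p.sum(axis=1)
            acc = alpha[:, None] * acc + p @ V[j:j + tile]
            m = m_new
        return acc / l[:, None]

    # Agrees with vanilla attention up to float error:
    rng = np.random.default_rng(0)
    Q, K, V = (rng.standard_normal((256, 64)) for _ in range(3))
    P = np.exp(Q @ K.T / np.sqrt(64))
    ref = (P @ V) / P.sum(axis=1, keepdims=True)
    print(np.allclose(flash_attention(Q, K, V), ref))  # True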

Vishal @KyrieBlunders
@Norapom04 me and my friends hate cute layout algebra (we are not smart enough to understand it)

Aaron @Norapom04
cute layout linear algebra is what happens when u let math phds escape academia and get their hands on engineering

Vishal retweeted
roon @tszzl
i think you need to be a little bit stupid to work on neural nets. if you're too smart and too good at math you won't make any progress

vineet @x0_vineet
i don't drink coffee, i chug it down