Vishal

1.7K posts

Vishal

@KyrieBlunders

23, wannabe mle @synclabs_so, fuck it we ball

Beigetreten Ağustos 2019

1.1K Folgt277 Follower

Angehefteter Tweet

Vishal@KyrieBlunders·4 Nis

Joined @synclabs_so as a Data Scientist last week, people here are cracked af!

English

Vishal@KyrieBlunders·23h

@stochasticchasm mind blown, i never thought PTX writing would get easier

English

142

stochasm@stochasticchasm·23h

@KyrieBlunders what does this mean

English

1.4K

Vishal@KyrieBlunders·1d

how i feel rn

Patrick C Toulme@PatrickToulme

Launching pyptx — a Python DSL for writing NVIDIA PTX kernels. One PTX instruction = one Python call. Write pure PTX in Python. Direct Hopper + Blackwell support: wgmma, TMA, tcgen05, mbarriers. JAX + PyTorch integration. Includes GEMM, grouped GEMM, RMSNorm, SwiGLU, and a PTX→Python transpiler pip install pyptx[torch] pip install pyptx[jax] github.com/patrick-toulme…

English

3.1K

Vishal@KyrieBlunders·2d

@UghDitya

QME

124

Aditya@UghDitya·2d

I love looking at people’s fridge magnets (use this as a thread to show yours)

English

147

9.5K

Vishal@KyrieBlunders·2d

@moonboyknm since I kinda worked on cutlass, so it was very intuitive when i moved to cutedsl

English

Kratik@moonboyknm·2d

@KyrieBlunders What did you like about CuteDSL?

English

Vishal@KyrieBlunders·2d

Finally finished writing this blog!! tried to connect the “what is this?” to “oh this is how it actually works” gap if CuTeDSL ever felt like abstract math cosplay, this might help would love to hear your thought regarding this!

English

150

Vishal@KyrieBlunders·2d

@moonboyknm both has its pros and cons

English

Kratik@moonboyknm·2d

@KyrieBlunders Is it better than triton?

English

Vishal@KyrieBlunders·2d

@_jazhaz_ cool, waiting for your review my good sir!

English

Jax@_jazhaz_·2d

@KyrieBlunders was waiting for FA2 I'll drop a review after reading it

English

Vishal@KyrieBlunders·2d

link: kyrieblunders.bearblog.dev/fa2_using_cute…

English

Vishal@KyrieBlunders·3d

@jino_rohit good luck to you too!

English

Jino Rohit@jino_rohit·3d

@KyrieBlunders ah okay i see i see, good luck

English

Jino Rohit@jino_rohit·3d

i have 24 hours and i want to learn cuda instructions on h100.

English

105

Vishal@KyrieBlunders·3d

the culture at @PrimeIntellect seems a little too good to be true lol majority of the employees are chronically online ngl

English

Vishal@KyrieBlunders·3d

@jino_rohit oh god no, nah I haven't finished the whole video, I went through the individual slides posted on the website

English

Jino Rohit@jino_rohit·3d

@KyrieBlunders have you finished the whole vid? crazy

English

157

Vishal@KyrieBlunders·3d

@grok @joefioti damn this is insane

English

423

Grok@grok·3d

This is a snippet from an LLM inference optimization paper (likely from an inference compiler team). They're explaining a "dual-kernel" hack for attention computation to keep it "batch-invariant" (works the same for any batch size, crucial for real-time serving where sequences finish unevenly). Normal split-KV method wastes GPU SMs on partial waves, so they use two kernels: one stays in a single SM for full throughput, the other spans SMs but matches it bitwise exactly via shared memory tricks. Keeps latency low without overhead. Insanely nerdy GPU wizardry 💀

English

445

Joe Fioti@joefioti·3d

what in the hell

English

160

23.3K

Vishal@KyrieBlunders·4d

@ExpressGradient that’s awesome!

English

Praneeth@ExpressGradient·5d

@KyrieBlunders this is nice, im also learning CuTe DSL

English

Vishal@KyrieBlunders·5d

wrote fa2 using cuteDSL, a blog for that is coming soon cuteDSL is good, but I am not lol. It's an improvement for me over cuda since I use python daily

Vishal@KyrieBlunders

Me and @s_gowindone tried to derive Flash Attention from scratch, starting from vanilla attention and asking: “why is this so slow?” This blog walks through the full intuition step by step would love to hear your thoughts regarding this

English

3.1K

Vishal@KyrieBlunders·5d

@Norapom04 me and my friends hate cute layout algebra (we are not smart enough to those understand those)

English

Aaron@Norapom04·5d

cute layout linear algebra is what happens when u let math phds escape academia and get their hands on engineering

English

2.1K

Vishal retweetet

roon@tszzl·13 Tem

i think you need to be a little bit stupid to work on neural nets. if you're too smart and too good at math you won't make any progress

English

1.3K

Vishal@KyrieBlunders·5d

@marksaroufim congrats mark! this is incredible

English

Mark Saroufim@marksaroufim·5d

A lot of the stack we use today was built for an earlier paradigm. If we want new kinds of research, we’ll need new libraries, new abstractions, and new ways of running the lab. I couldn’t be more excited about our founding team. We’re going to any% the frontier.

Core Automation@CoreAutoAI

Today we're announcing Core Automation Our objective: systems that optimize and automate work, starting with research itself.

English

394

89.8K

Vishal@KyrieBlunders·6d

@x0_vineet that’s literally me

English

vineet@x0_vineet·20 Nis

i don't drink coffee, i chug it down

English

Entdecken

@stochasticchasm @UghDitya @moonboyknm @_jazhaz_ @jino_rohit @PrimeIntellect @grok @joefioti