shikhar @encapsulated007

squeezing FLOPS

0xCAFFEINE · Joined November 2020
667 Following · 1.4K Followers · 6.4K posts
tender @tenderizzation
successor to the attention block needs to be called the introspection block atp
shikhar @encapsulated007
@MainzOnX very hard to deny one and accept the other. excited for both
Adam Mainz @MainzOnX
@encapsulated007 We probably did and I can happily write something up here! You into tips and tricks on the DSL or optimization tooling? (Or both)
Adam Mainz @MainzOnX
Thinking about writing blog posts / articles here again. Any topics people want? ML inference, kernel perf, cool projects from Meta, etc.?
shikhar retweeted
Albert Gu @_albertgu
The newest model in the Mamba series is finally here 🐍 Hybrid models have become increasingly popular, raising the importance of designing the next generation of linear models. We've introduced several SSM-centric ideas to significantly increase Mamba-2's modeling capabilities without compromising on speed. The resulting Mamba-3 model has noticeable performance gains over the most popular previous linear models (such as Mamba-2 and Gated DeltaNet) at all sizes. This is the first Mamba that was student-led: all credit to @aakash_lahoti @kevinyli_ @_berlinchen @caitWW9, and of course @tri_dao!
shikhar retweeted
nor @norxornor
everyone's openreview profile seems to have been switched out with some random guy's? (logged in a few minutes ago to see my profile basically wiped and replaced) @openreviewnet
shikhar @encapsulated007
Quoting Ali Behrouz @behrouz_ali:

This paper is the same as the DeepCrossAttention (DCA) method from more than a year ago: arxiv.org/abs/2502.06785. As far as I understand, there is no innovation here to be excited about, and yet surprisingly there is no citation of or discussion about DCA! The level of redundancy in LLM research, and then the hype on X, is getting worse and worse! DeepCrossAttention is built on the intuition that depth-wise cross-attention allows for richer interactions between layers at different depths. DCA further provides both empirical and theoretical results to support this approach.

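For context, a minimal sketch of the depth-wise intuition DCA is built on: a layer's input is formed as a learned, token-dependent mixture over the outputs of all earlier layers, instead of the plain residual sum. This is an illustrative approximation, not the paper's exact method; the module name, shapes, and scoring scheme here are assumptions.

```python
import torch
import torch.nn as nn

class DepthwiseCombine(nn.Module):
    """Illustrative sketch of the depth-wise-interaction idea: mix the
    outputs of layers 0..L-1 with token-dependent weights. Not DCA's
    exact parameterization; all names and shapes are assumptions."""

    def __init__(self, d_model: int, max_depth: int):
        super().__init__()
        # one score per possible history slot (hypothetical choice)
        self.score = nn.Linear(d_model, max_depth)

    def forward(self, history: list[torch.Tensor]) -> torch.Tensor:
        # history: outputs of layers 0..L-1, each (batch, seq, d_model)
        stacked = torch.stack(history, dim=-2)        # (B, S, L, D)
        summary = stacked.mean(dim=-2)                # (B, S, D)
        w = self.score(summary)[..., : len(history)]  # (B, S, L)
        w = torch.softmax(w, dim=-1).unsqueeze(-1)    # (B, S, L, 1)
        return (w * stacked).sum(dim=-2)              # (B, S, D)
```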
shikhar retweeted
Lei Zhang @LeiLMx
I published a new post in my Triton series about Gluon — a new Python frontend that exposes more compiler internals so developers can have explicit control over performance. I also share some thoughts in the context of rapidly evolving agentic software development: portability vs performance, general vs domain-specific compilers, and why DSLs may become an important companion. 🔗 lei.chat/posts/gluon-ex…
Matej Sirovatka @m_sirovatka
What’s the best model you can train in a day if someone hands you a pile of Blackwell GPUs? You can try it out yourself: on April 9 in Paris, @GPU_MODE + @verdacloud + @sestercegroup are hosting a GPU hackathon with a bunch of GPUs to run on, and even more of them for the winners.
shikhar @encapsulated007
@xidulu nemotron-3 inspired latent router?!
Xidulu @xidulu
Has anyone tried using random projection as the MoE router...?
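One way to read the question: freeze the router's projection at init, so routing becomes a fixed random hash of the token's hidden state with no router parameters to train. A minimal sketch of that idea (all names and shapes here are assumptions, not from any published model):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RandomProjectionRouter(nn.Module):
    """Hypothetical MoE router: a random, frozen projection produces the
    routing logits. Names and shapes are assumptions for illustration."""

    def __init__(self, d_model: int, num_experts: int, top_k: int = 2):
        super().__init__()
        # drawn once at init; register_buffer => no gradient, never updated
        self.register_buffer(
            "proj", torch.randn(d_model, num_experts) / d_model**0.5
        )
        self.top_k = top_k

    def forward(self, x: torch.Tensor):
        # x: (batch, seq, d_model) -> logits: (batch, seq, num_experts)
        logits = x @ self.proj
        vals, experts = logits.topk(self.top_k, dim=-1)
        # combine weights over selected experts, plus the expert indices
        return F.softmax(vals, dim=-1), experts
```

Everything downstream (dispatch, capacity limits, load balancing) would stay as in a standard learned-router MoE; only the logit computation changes.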
shikhar @encapsulated007
@Laz4rz i just go to sleep every time i build vllm from source (nvcc_threads=1, max_jobs=2)
PolyMage Labs @polymagelabs
Finally, here's the paper on PolyBlocks describing how fully code-generating compilers for AI chips can be built! This is the culmination of multiple years of R&D and engineering. There is now enough reusable infrastructure in our toolkit to quickly build high-performing PyTorch/JAX compilers for new chips, no matter how weird or unique their capabilities are, and without relying on any "kernel" libraries or manual model optimization or porting. The paper isn't exhaustive, but it provides details on the key parts, the design choices, and why they are powerful. arxiv.org/abs/2603.06731
shikhar @encapsulated007
@polymagelabs this is really cool. didn't know you guys were still out there building.
Zach Mueller @TheZachMueller
Computers are fun again
shikhar @encapsulated007
what even in the good lord Hopper's hack was this!?
Quoting Mark Saroufim @marksaroufim:

@m_sirovatka There's one smart human, Erik Schultheis; he's the vanguard of humans against the AI slop, and he's been working on a benchmark function that would be resistant to adversarial attacks. If you're an AI researcher, come at us! github.com/gpu-mode/pygpu…
