Vladimir Vlejd Macko

100 posts

@vlejd

I like taking things from 0 to 1. From nothing to something. Sprinkled with ML when necessary.

Joined March 2011
106 Following, 67 Followers
Pinned Tweet
Vladimir Vlejd Macko (@vlejd)
Unstructured weight #sparsity made practical. 50% unstructured weight sparsity was considered too low for a real GPU speedup without dedicated hardware support (like @cerebras). With @bozavlado we built MACKO-SpMV, a new matrix format plus SpMV kernel, to change that. 🧵
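Why 50% was considered too low can be seen from a back-of-the-envelope storage count, since SpMV is usually memory-bandwidth bound: fewer bytes moved means less time. The sketch below compares bytes per matrix element for dense fp16, CSR with 32-bit column indices, and a hypothetical bitmap-plus-packed-values layout. The bitmap layout and all function names are illustrative assumptions; this is not the MACKO format, whose details are not given here.

```python
def bytes_per_element(fmt, density, value_bytes=2, index_bytes=4):
    """Approximate bytes stored per *dense* matrix element.

    Row pointers and other per-row metadata are ignored, which is a fair
    approximation for wide matrices such as LLM weight matrices.
    """
    if fmt == "dense":
        return value_bytes
    if fmt == "csr":
        # Every nonzero stores its value plus an explicit column index.
        return density * (value_bytes + index_bytes)
    if fmt == "bitmap":
        # Hypothetical layout: one presence bit per element + packed nonzeros.
        return 1 / 8 + density * value_bytes
    raise ValueError(f"unknown format: {fmt}")


def bitmap_spmv(rows, x):
    """Reference (slow, CPU) y = A @ x for the hypothetical bitmap layout.

    rows: list of (mask, values); mask has one 0/1 entry per column and
    values holds that row's nonzeros in column order.
    """
    y = []
    for mask, values in rows:
        acc = 0.0
        k = 0  # index into the packed nonzero values of this row
        for j, bit in enumerate(mask):
            if bit:
                acc += values[k] * x[j]
                k += 1
        y.append(acc)
    return y
```

At 50% density with fp16 values, dense needs 2 bytes per element, CSR with 32-bit indices needs 3.0, and the bitmap layout needs 1.125. So plain CSR moves more bytes than the dense matrix it replaces, and any format hoping to beat dense at this sparsity has to get well under 2 bytes per element.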
Maaz (@mmaaz_98)
I built a GPU-accelerated linear programming solver in PyTorch that scales to 100k+ variables and constraints and is competitive with state-of-the-art solvers. The entire implementation is only ~350 lines (excluding docs and logging) and is meant to be as simple as possible.
Mariya I. Vasileva (@mariyaivasileva)
“Tell me you’re an ML veteran without telling me you’re an ML veteran.” “My first paper was published at NIPS, not NeurIPS.”
Vladimir Vlejd Macko retweeted
James Bradbury (@jekbradbury)
opus 4.5 is really good at GPU programming, but somehow it’s even better at GPU programming jokes (h/t @Si_Boehm)
Vladimir Vlejd Macko (@vlejd)
🛠️ Next step: server GPUs. If you know how to implement a minimal CUDA matvec on H100 that reaches ≥95% of cuBLAS performance, 👉 my DMs are open.
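A matvec reads every matrix element once and performs only two flops (one multiply, one add) per element, so on a GPU it is bound by memory bandwidth rather than compute. That suggests two simple yardsticks for a kernel like the one described: its time against the bytes-over-bandwidth floor, and its throughput relative to a reference library such as cuBLAS. The helper names below are hypothetical, and the bandwidth is left as an input rather than a hard-coded H100 figure, since it differs between H100 variants.

```python
def matvec_bytes(rows, cols, elem_bytes=2):
    """Minimum bytes a dense y = A @ x must move: all of A, plus x and y once."""
    return elem_bytes * (rows * cols + cols + rows)


def bandwidth_bound_time(rows, cols, bandwidth_bytes_per_s, elem_bytes=2):
    """Lower bound on runtime (seconds) if the kernel is purely memory bound."""
    return matvec_bytes(rows, cols, elem_bytes) / bandwidth_bytes_per_s


def fraction_of_reference(kernel_time, reference_time):
    """Throughput relative to a reference (e.g. cuBLAS); 1.0 means equal speed."""
    return reference_time / kernel_time
```

For example, an 8192×8192 fp16 matrix with fp16 vectors moves 134,250,496 bytes, and a kernel taking 100 µs against a 96 µs reference sits at 0.96 of reference throughput, which clears a 95% target.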
Vladimir Vlejd Macko (@vlejd)
And yes, it translates to real LLM inference speedups.
Vladimir Vlejd Macko (@vlejd)
It is funny how little correct information there is about how to properly benchmark a CUDA kernel. Most papers get it wrong, eval libraries are hard to inspect, and even this approach could have a problem, because the measurement may include the kernel launch depending on the clear_cache implementation.
tender (@tenderizzation)

btw, I think BackendBench just uses triton's do_bench function, which uses a very similar timing mechanism to the one exploited here and wouldn't be robust to the same side-stream shenanigans

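The pitfalls above come down to what sits inside the timed region. Below is a minimal host-side sketch, assuming you supply `sync` (in PyTorch, `torch.cuda.synchronize()` waits for all kernels in all streams on the device) and `clear_cache` (for example, overwriting a buffer larger than L2). The helper `bench` and its parameters are hypothetical; this is not Triton's `do_bench`.

```python
import statistics
import time


def bench(fn, clear_cache, sync, warmup=3, repeats=25):
    """Median wall-clock seconds per call of fn (hypothetical helper).

    fn          -- launches the work being measured
    clear_cache -- evicts caches between runs; runs *outside* the timed region
    sync        -- must wait for ALL outstanding device work, on every stream
    """
    for _ in range(warmup):  # absorb compilation, autotuning, lazy allocation
        fn()
    sync()
    samples = []
    for _ in range(repeats):
        clear_cache()
        sync()  # make sure the flush has finished before the clock starts
        start = time.perf_counter()
        fn()
        sync()  # device-wide wait, so work on side streams is counted
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)
```

This trades precision for robustness: launch overhead is included in every sample instead of being hidden behind the cache flush only some of the time, so very short kernels will look slower than with event-based timing. Whichever choice you make, state it alongside the numbers.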
miru (@miru_why)
@niklassheth @ronusedh @IntologyAI their 'superhuman' ai cleverly assigned all the work to non-default streams, which means the correctness test (which waits on all streams) passes, while the profiling timer (which only waits on the default stream) is tricked into reporting a huge speedup
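The side-stream trick can be reproduced in miniature without a GPU. The sketch below is a CPU-only analogy, assuming threads as stand-ins for CUDA streams: a "kernel" that hands its work to a background thread returns immediately, so a timer that only waits on the "default stream" sees almost nothing, while a wait on all outstanding work sees the full duration. All names are hypothetical.

```python
import threading
import time

# Stand-in for work queued on non-default streams.
pending = []


def side_stream_kernel(duration=0.2):
    """Returns immediately; the 'work' runs on a background thread,
    playing the role of a kernel launched on a side stream."""
    worker = threading.Thread(target=time.sleep, args=(duration,))
    worker.start()
    pending.append(worker)


def wait_default_stream():
    """Analogue of waiting only on the default stream: nothing was
    queued there, so it returns at once."""
    return None


def wait_all_streams():
    """Analogue of a device-wide sync such as torch.cuda.synchronize()."""
    while pending:
        pending.pop().join()


def timed(kernel, wait):
    """Wall-clock time of one kernel call followed by the given wait."""
    start = time.perf_counter()
    kernel()
    wait()
    return time.perf_counter() - start
```

Timing `side_stream_kernel` with `wait_default_stream` reports close to zero, while timing it with `wait_all_streams` reports roughly the full 0.2 s, mirroring a correctness check that passes while the profiler reports a huge speedup.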
Intology (@IntologyAI)
Introducing Locus: the first AI system to outperform human experts at AI R&D. Locus conducts research autonomously over multiple days and achieves superhuman results on RE-Bench given the same resources as humans, as well as SOTA performance on GPU kernel & ML engineering tasks. RE-Bench is a collection of several frontier AI research tasks that typically take human experts (e.g., top ML PhDs and frontier lab researchers) several days. By scaling experimentation to far longer time horizons than previous systems, Locus represents a step change in AI scientist capabilities. 🧵
Vladimir Vlejd Macko retweeted
Julian (@julianboolean_)
holy shit they found a power series solution to ALL polynomial equations!! (bypassing Galois which says you can’t solve them in radicals)
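The simplest instance of a power series solving a polynomial equation is the quadratic case: the Catalan numbers' generating function $C(t) = \sum_n C_n t^n$ satisfies $C = 1 + tC^2$, so it is a root of $t x^2 - x + 1 = 0$, converging for $|t| < 1/4$. The sketch below only checks this degree-two case numerically against the closed form; it does not reproduce the general construction the retweeted post refers to, and the function names are illustrative.

```python
from math import comb, sqrt


def catalan(n):
    """n-th Catalan number, C_n = binom(2n, n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)


def series_root(t, terms=60):
    """Partial sum of sum_n C_n t^n, a power-series root of t*x**2 - x + 1 = 0."""
    return sum(catalan(n) * t**n for n in range(terms))


def closed_form_root(t):
    """The root that stays finite as t -> 0: (1 - sqrt(1 - 4t)) / (2t)."""
    return (1 - sqrt(1 - 4 * t)) / (2 * t)
```

For t = 0.1 the partial sum agrees with the closed form to near machine precision, since the terms shrink roughly like 0.4^n.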
Vladimir Vlejd Macko (@vlejd)
I do model compression and optimization. It is essential to have access to different GPUs, and that would be impossible without @vast_ai. Happy to finally meet you guys at #ICML2025. And thanks a lot for the Nintendo Switch!
Vladimir Vlejd Macko (@vlejd)
Just visited with Vast.ai at ICML 2025. Without them, my research would be almost impossible. They make GPU kernel development much more accessible. @vast_ai