Seva Brekelov #stopwar

3.4K posts

@brekelov

👨‍💻 Software engineer @mongodb 📖 https://t.co/nRqu5LwjeG ♂️ He/him

Amsterdam, The Netherlands Joined November 2011
821 Following 835 Followers
Seva Brekelov #stopwar retweeted
Anton Arhipov @antonarhipov
Help me to choose an SDD toolkit for agents. What's your preference? spec-kit vs openspec vs agent-os vs BMAD (or should I roll my own? 😆 - reply in comments)
English
2
3
8
4.6K
Seva Brekelov #stopwar @brekelov
Incredible work
Aleksa Gordić (水平问题) @gordic_aleksa

New in-depth blog post time: "Inside NVIDIA GPUs: Anatomy of high performance matmul kernels". If you want to deeply understand how one writes state of the art matmul kernels in CUDA read along. (Remember matmul is the single most important operation that transformers execute both during training and inference. Most of NVIDIA compute is spent on it. Gaining 1% in efficiency translates to massive savings in the order of many nuclear reactors :P)

I, yet again, realized i underestimated the effort. 😅 Here is one more booklet (lol). 47 figures!

I covered:
* The fundamentals of the GPU architecture with an emphasis on the memory hierarchy, building mental models for GMEM, SMEM, and L1/L2, and then connecting them to the CUDA programming model. Along the way we also looked at the "speed of light," how it's bounded by power, with hardware reality leaking into our model.
* PTX/SASS, and how to steer the compiler into generating what we actually want (is that loop being unrolled, are we using vectorized loads like LDG.128, etc.). I've annotated one PTX/SASS example for a simple matmul kernel in excruciating detail. Even if you're new to compilers you should find this useful. (i actually found various inefficiencies in both compilers - fun!)
* Many core concepts such as tile/wave quantization, occupancy, ILP (instruction-level parallelism), roofline model, etc. Also building intuition around fundamental equivalences: dot product as a sum of partial outer products, why square tiles are the right shape for high arithmetic intensity, etc.
* The warp tiling method - which is near SOTA assuming you can't use tensor cores, TMA, async mem instructions, and bf16. Just maximizing GPU's performance using nothing but CUDA cores, registers and shared memory.
* Finally, we step into Hopper (H100): TMA, swizzling, tensor cores and the wgmma instruction, async load/store pipelines, scheduling policies like Hilbert curves, clusters with TMA multicast, faster PTX barriers, and more.

As always lots of examples, lots of visuals. This is the first time i could see warp tiling kernel and be like "oh i get it completely". I just needed my mental image transformed into an actual image.

A few years ago I was really inspired by @Si_Boehm's excellent blog post on how matmul works, but I also found it had several errors, some unclear explanations, and it was quite outdated. Building on @pranjalssh amazing work (who did a great job building sota kernels for H100) and my own research, this is the final result.

---

Again a huge thank you to @Hyperstackcloud (GPU cloud) for giving me an H100 (PCIe) node to run some of the experiments and analysis that i needed to write this up. Also a big thank you to my friends Aroun (who did a very thorough review of the post; Aroun's doing cool GPU/AI stuff at Magic and was previously GPU architect at Apple and Imagine, he's one of the best GPU people i know and we worked together on llm.c w/ @karpathy) and the amazing @marksaroufim! (PyTorch) for taking the time during weekend when they didn't have to. :)

English
0
0
0
91
Vladimir Ivanov @vvsevolodovich
Where to go with a kid in Amsterdam?
English
1
0
0
580
Josh Long @starbuxman
Hello, Amsterdam!
Dutch
2
0
16
2.6K
Seva Brekelov #stopwar @brekelov
@_bravit @tagir_valeev Utrecht is great. If you have time next time, we could meet up; I live here and it's lovely.
Russian
0
0
1
42
Виталий Брагилевский
AAAAAAAAAAAA, cyclists just gave way to me! Am I dreaming?
Russian
4
0
40
2K
Vladimir Ivanov @vvsevolodovich
Finally I can say that my time at Bolt is coming to an end. I am transitioning to being CTO and co-founder at getsupplied.ai. I am going to share our journey in bringing compliant and performant supply to digital platforms ;)
English
4
0
25
1.3K
Kerrigan ☆ 케리건 @Kerry_Shark
Every time I open Miro I remember how, a couple of years ago, I simply tweeted that Miro didn't work very well for us together with Google Meet, and @brekelov showed up, asked some questions, and everything got fixed quickly 🥰 It was awesome!
Cash Cleaner Simulator @CashCleanerSim

This design process spanned countries, months, and time zones. We sketched, argued, tested, iterated - all inside one big messy Miro board! @MiroHQ, massive thanks! Your platform made remote teamwork actually feel fun 😍 [6/7]

Russian
1
3
18
1.7K
Tagir Valeev @tagir_valeev
@aarexer I don't have any loyalty cards. They're nothing but a hassle.
Russian
1
0
3
1.6K
Александр Кучук
I just keep my store loyalty cards in my photo gallery. So I'm standing at the checkout and they ask for my loyalty card. I say "yes, yes, one moment", open the gallery, and show it. She looks at the screen and says, "We don't give bonus points for that." I'm like, well, OK. Then I look at what I showed her:
Александр Кучук tweet media
Russian
15
36
2.8K
73.8K
Seva Brekelov #stopwar @brekelov
@brunoborges I couldn't have believed results like these 3 months ago. And here we are. Btw, I love vibe coding: I was able to get some things done that had been in my life backlog for years :)
English
0
0
1
31
Bruno Borges @brunoborges
@brekelov The prompt is good enough to get started. The LLM was able to give me recommendations to enhance the JVM of the VS Code Lang Server. The MCP code needs some safeguards, but again, this was the outcome of vibe coding for a few minutes 😂
English
1
0
2
52
Bruno Borges @brunoborges
@brekelov It is a start, but a long way from reality, indeed.
English
1
0
1
25
Alexander Granin @graninas
I wasted 10 years on C++. Then I wasted 10 years on Haskell. What should I waste my time on next?
English
864
45
2.3K
211.4K