Сижу дома, а хотелось бы с Майки
3.3K posts

@OnMainP
This site is garbage and I'm walking away now
Мир чевапчичи · Joined November 2017
430 Following · 59 Followers

When you run a @PyTorch model on a GPU, the actual work is executed through kernels: low-level, hardware-specific functions written for GPUs (or other accelerators).
If you profile a model, you'll see a sequence of kernel launches. Between launches, the GPU can sit idle, waiting for the next operation. A key optimization goal is therefore to minimize these gaps and keep the GPU fully utilized.
One common approach is `torch.compile`, which fuses multiple operations into fewer kernels, reducing overhead and improving utilization.
Another approach is to write custom kernels tailored to specific workloads (e.g., optimized attention or fused ops). However, this comes with significant challenges:
> requires deep expertise in kernel writing
> installation hell
> non-trivial integration with the model
To address this, @huggingface introduces the `kernels` library.
With it you can:
> build custom kernels (with the help of a template)
> upload them to the Hub (like models or datasets)
> integrate them into models with ease
Let's take a look at how the transformers team uses the `kernels` library to integrate custom kernels into existing models. (more in the thread)

@HaochengXiUCB gotta love these "GPU breakthroughs" that only work in controlled Unix environments.
Reimagined k-means for modern hardware, but somehow Windows support was a step too far.
like 30x over cuML, 200x over FAISS and 0x on Windows

K-means is simple. Making it fast on GPUs isn't.
That’s why we built Flash-KMeans — an IO-aware implementation of exact k-means that rethinks the algorithm around modern GPU bottlenecks.
By attacking the memory bottlenecks directly, Flash-KMeans achieves 30x speedup over cuML and 200x speedup over FAISS — with the same exact algorithm, just engineered for today’s hardware. At the million-scale, Flash-KMeans can complete a k-means iteration in milliseconds.
A classic algorithm — redesigned for modern GPUs.
Paper: arxiv.org/abs/2603.09229
Code: github.com/svg-project/fl…
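As an illustration of the algebra such GPU implementations lean on (a plain-Python sketch, not the Flash-KMeans code; the function name and data are invented for the example):

```python
def assign_step(points, centroids):
    """One exact k-means assignment step (illustrative sketch).

    Uses the expansion ||x - c||^2 = ||x||^2 - 2*x.c + ||c||^2.
    The ||x||^2 term is constant per point, so the argmin only needs
    ||c||^2 - 2*x.c. On a GPU the x.c part becomes one big matrix
    multiply, which is the memory-bound piece an IO-aware
    implementation restructures around.
    """
    # Precompute ||c||^2 once per centroid instead of per pair.
    c_norms = [sum(v * v for v in c) for c in centroids]
    labels = []
    for x in points:
        best_j, best_d = 0, float("inf")
        for j, c in enumerate(centroids):
            d = c_norms[j] - 2 * sum(a * b for a, b in zip(x, c))
            if d < best_d:
                best_j, best_d = j, d
        labels.append(best_j)
    return labels

# Points near (0,0) go to centroid 0; the point at (5,5) goes to centroid 1.
print(assign_step([(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)],
                  [(0.0, 0.0), (5.0, 5.0)]))  # → [0, 0, 1]
```

The assignments are identical to computing full Euclidean distances; only the arithmetic is rearranged.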
Сижу дома, а хотелось бы с Майки reposted

"Censored" and it's just a decrease in women's sexualization by 1%
DomTheBomb@DomTheBomb
Nintendo censored the cover art for Dispatch 😭💀

@ProtonDrive @vivaldibrowser @zen_browser @brave Brave is literally sponsored by a guy who paid companies to discredit LGBTQ+ movements... You could at least vet the reputation of those you advertise

@pixel_updates Deprecation of Play Integrity API and adoption of Hardware Attestation API instead.
Simultaneous release to AOSP without bugs or delays.
Quick Share interoperability integrated into AOSP.

@menofatlus Does the link to the 2025 archive work? Every post throws me back to page 400...


@geppei5959 Is there a version without dialogs on Patreon?

Unless you're writing .NET, why would you choose a Thinkpad over a Mac in 2025
fidexCode@fidexcode
Let's end this debate Thinkpad or Macbook

@goonerismEXP Why is it always so short? Six seconds feels like nothing 😩

@Fair_Universe_ It fucking is my business, you know.
But whatever, it's all peace and quiet in Ulyanovka, so I'm chilling now

@OnMainP @serpentsmanager @SobolLubov It didn't slam into you or your house. And where it was headed is none of your damn business

@bukbrid Why the fuck would I do that? Was it me who wiped them out? Or maybe I started the war???
Care to strain your brain a little?

@OnMainP @serpentsmanager @SobolLubov How many houses in Russia have been destroyed over 3.5 years of full-scale war?
How many Russian cities have literally been wiped off the face of the earth?
Care to compare the scale?
