Dawid Rutkowski

1K posts

@calcarinus

MD and PhD student in artificial intelligence applied to medicine

Joined November 2022

271 Following · 36 Followers

Dawid Rutkowski retweeted

Dispropaganda@Dispropoganda·2h

Relax, no one is telling you which "heroes" you can or can't honor. You can honor UPA OUN Melnyk Bandera Shukhevych Klyachkivsky and any other genocidal mass murdering Nazi collaborators you worship. Just don't be surprised when normal people treat you as they treat other countries which honor genocidal mass murdering Nazi collaborators, like Russia.

Kyrylo Budanov@Kyrylo_Budanov

No one will ever again dictate to Ukrainians which heroes to honor, which holidays to celebrate, or which history to study. Our ancestors fought for centuries for this right to freedom of choice and national independence, and it is for this very right that our warriors are shedding their blood today. Their memory must live on forever. We, our children, and all future generations must clearly know and honor those who dedicated their lives to the struggle for Ukraine. This is especially important today, as we are fighting a war for Ukraine’s freedom. President Volodymyr Zelenskyy has submitted to the Verkhovna Rada the draft law “On the Ukrainian National Pantheon.” This is a decisive and much-needed step toward restoring historical justice, consolidating our society, and laying the foundation of memory for future generations. This document is a testament to our maturity as a state. I am convinced that the Ukrainian Parliament understands the importance of this initiative and will consider and adopt this draft law as soon as possible. Glory to Ukraine!

English

349

7.1K

Dawid Rutkowski retweeted

☔@Whotfismick·16h

when you finally do your laundry and all your top-tier clothes are back:

English

130

18.3K

154.9K

1.3M

Dawid Rutkowski retweeted

Conejo Rojo@elconejorojo·19h

At the next World Cup there will be 64 national teams, and Argentina will play against the Vatican, Disneyland and Narnia.

Español

808

9.8K

79.9K

816K

Dawid Rutkowski retweeted

Avi Chawla@_avichawla·1d

A tricky LLM interview question: You're serving a reasoning model on vLLM, and it keeps running out of GPU memory on long traces. So you add KV cache compression and evict 90% of the cached tokens. VRAM usage stays as is and GPU still runs out of memory. Why? (answer below)
Evicting 90% of the KV cache can free almost none of the memory it was using. This sounds counterintuitive, but it follows directly from how production servers store the cache today.
The KV cache grows with every token a model generates. Each token appends its key and value vectors across every layer, and nothing is freed while generation continues. This is the dominant memory cost for reasoning models. If a 32K-token CoT caches ~32K tokens of KV vectors, a Qwen3-32B with 4-bit weights will run out-of-memory around 24K tokens on a 24GB GPU.
One obvious solution is to keep the important tokens and drop the rest, since attention is sparse enough to allow it. But this does not solve the memory problem yet.
The reason is paged attention, which is the memory manager behind vLLM and most production servers. Under the hood, it splits GPU memory into fixed physical blocks, each one holds the KV for about 16 tokens. This block returns to the allocator only when every slot inside it is empty.
Since the eviction logic selects tokens by importance, and such tokens are scattered across blocks, almost every block is left with at least some survivor tokens despite eviction. For instance, if the logic evicts 14k of 16k tokens across 1,000 blocks, most likely every block will still have a token. This means the allocator frees almost nothing.
Placing the new tokens into those freed slots is not ideal because it breaks the cache's layout. Say token 16,001 arrives, and it's placed in the slot the 40th token used to hold. The cache now reads position 38, then 16,001, then 41, so the cache is no longer in token order. Attention can still compute the right answer from that, but only if every slot now carries a separate note recording which position it actually holds. This introduces another bookkeeping cost that an in-order layout inherently avoids.
So the cache is logically 90% smaller and still physically the same size. Many compression results miss this because they measure on pre-allocated contiguous tensors rather than a paged server.
There's another problem. Eviction methods pick which tokens to keep by looking at the attention scores themselves (as expected). But fast attention kernels used in production, like FlashAttention, never save those scores. They compute attention in small pieces and throw the full score grid away as they go, which is also why they're fast. So the exact signal eviction methods need isn't available in memory. The workaround is to fall back to eager attention and build the full matrix, which gives up the speed FlashAttention was there to provide.
NVIDIA published a method called TriAttention to solve both these problems. It never needs attention scores. Instead, it scores tokens from the geometry of the model's key and query vectors before RoPE is applied, where those vectors sit in stable clusters. For the memory problem, it runs a compaction pass every 128 decoded tokens. The surviving tokens slide forward to close the holes eviction creates, so whole blocks empty out and return to the allocator while the cache stays in token order. On long reasoning traces, the approach matches full-attention accuracy while decoding 2.5x faster and using 10.7x less KV memory.
KV cache compression is a big infrastructure problem. The number that decides whether it works is the count of freed blocks, not the count of evicted tokens.
You can find the NVIDIA write-up here: research.nvidia.com/labs/eai/blogs…
I wrote a first-principles breakdown of how the KV cache works. It walks through why the model stores keys and values at all, why the cache grows with every token, and a comparison of LLM generation speed with and without KV caching. Read it below.
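The block-freeing argument in the post can be checked with a toy simulation. The sketch below is not vLLM code and not the TriAttention implementation; the function names are hypothetical, and it assumes evicted positions are picked uniformly at random as a stand-in for importance-based selection. Under that assumption a block of 16 slots empties out only when all 16 of its tokens are evicted, which at an 87.5% eviction rate happens with probability of roughly 0.875^16, or about 12%, so far fewer blocks are returned than the evicted fraction suggests. Compaction, by contrast, releases every block the survivors no longer need.

```python
import random

BLOCK_SIZE = 16  # tokens per physical KV block (the post's "about 16")


def num_blocks(num_tokens: int, block_size: int = BLOCK_SIZE) -> int:
    """Blocks needed to hold num_tokens (ceiling division)."""
    return -(-num_tokens // block_size)


def freed_without_compaction(num_tokens, num_evicted, block_size=BLOCK_SIZE, seed=0):
    """Count blocks that become completely empty when holes are left in place.

    Assumption: evicted positions are drawn uniformly at random, a toy
    stand-in for importance-based selection.
    """
    rng = random.Random(seed)
    evicted = set(rng.sample(range(num_tokens), num_evicted))
    freed = 0
    for start in range(0, num_tokens, block_size):
        slots = range(start, min(start + block_size, num_tokens))
        # A block goes back to the allocator only if every slot is empty.
        if all(pos in evicted for pos in slots):
            freed += 1
    return freed


def freed_with_compaction(num_tokens, num_evicted, block_size=BLOCK_SIZE):
    """Count blocks released if survivors slide forward, in order, to close holes."""
    survivors = num_tokens - num_evicted
    return num_blocks(num_tokens, block_size) - num_blocks(survivors, block_size)


if __name__ == "__main__":
    total, evicted = 16_000, 14_000  # the post's example
    print("blocks allocated:", num_blocks(total))
    print("freed, holes left in place:", freed_without_compaction(total, evicted))
    print("freed, after compaction:", freed_with_compaction(total, evicted))
```

Real eviction policies are not uniform, so the exact count of freed blocks will differ; the point of the sketch is only the gap between the two numbers.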

Avi Chawla@_avichawla

x.com/i/article/2034…

English

263

1.9K

226.9K

Dawid Rutkowski@calcarinus·5h

@OliverMolander Agreed!

English

241

Oliver Molander@OliverMolander·13h

Idea: Let's create a world-leading Nordic AI research lab. There's an insane amount of top AI research talent from the Nordics. I have many friends who work or have worked in top positions at e.g. Deepmind, Anthropic, OpenAI. No region can match the Nordics in trust.

English

403

20K

Dawid Rutkowski@calcarinus·21h

@lauriewired Why didn’t you benchmark against a proper setup? (e.g. Nvidia GPU and a CRT monitor? Or RTX GPU with 720hz monitor)

English

878

LaurieWired@lauriewired·1d

you’ll get mad at me for saying this…but cloud gaming is so obviously more economically efficient than physical hardware I think it’s going to be the default soon. your home console / pc is idle 90%+ of the day. meanwhile, data centers target what, 5%, maybe at worst 10% idle. every second a cloud gamer isn’t gaming, that hardware is being used for someone else, training, etc. I think there should be a new measurement, something like cost-per effective FLOP hour that takes into account the TCO + effective utilization. If a gamer spends $500 on a GPU, uses it for 3 years, but it’s only fully active ~5% of that period…the cost-per relative FLOP hour is crazy high! Meanwhile, a $50,000 datacenter GPU might have a *LOWER* cost-per FLOP hour just because the effective utilization is 90+%.
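The proposed "cost per effective FLOP hour" can be worked through with the post's own numbers. This is a rough sketch under stated assumptions: both GPUs are amortized over the same 3 years, only the purchase price is counted (no power, cooling, or staffing, so it is not a full TCO), and the relative throughput of the datacenter GPU is left as a hypothetical input because the post gives no figure for it. All function names are made up for illustration.

```python
HOURS_PER_YEAR = 365 * 24


def cost_per_active_hour(price_usd: float, years: float, utilization: float) -> float:
    """Purchase price spread over the hours the GPU is actually busy."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return price_usd / (years * HOURS_PER_YEAR * utilization)


def cost_per_effective_flop_hour(price_usd, years, utilization, relative_throughput):
    """Cost per active hour, normalized by throughput relative to a baseline GPU."""
    return cost_per_active_hour(price_usd, years, utilization) / relative_throughput


def break_even_throughput(consumer: dict, datacenter: dict) -> float:
    """How many times faster the datacenter GPU must be to match the consumer GPU."""
    return cost_per_active_hour(**datacenter) / cost_per_active_hour(**consumer)


if __name__ == "__main__":
    consumer = {"price_usd": 500, "years": 3, "utilization": 0.05}
    datacenter = {"price_usd": 50_000, "years": 3, "utilization": 0.90}
    print(f"consumer:   ${cost_per_active_hour(**consumer):.2f} per active hour")
    print(f"datacenter: ${cost_per_active_hour(**datacenter):.2f} per active hour")
    print(f"break-even speedup: {break_even_throughput(consumer, datacenter):.2f}x")
```

With these inputs the datacenter GPU costs more per active hour, so it only wins on cost per effective FLOP hour if its throughput exceeds the break-even multiple; whether it does depends on the hardware being compared.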

English

2.8K

146

3.7K

2.4M

Dawid Rutkowski@calcarinus·1d

Sweden has insane innovative capabilities, but the lead this beautiful country had after WW2 (attributed largely to a combination of unharmed industry and great talent) has diminished greatly. The reason Poland stands out is that it received a lot of support from the European Union (still not on the same level as the Marshall Plan) and made sure not to squander it. It essentially decided to do the same as South Korea and Singapore, i.e. drop communism altogether.

English

gabriel@gabriel1·1d

when i grew up sweden and the US had same gdp per capita, and now american gdp has doubled while sweden stayed flat and no one is aware and most rest of europe is doing even worse. except poland they're doing great somehow

English

273

11.7K

gabriel@gabriel1·1d

america banning ai models internationally making everyone else 40% less economically productive while the EU is still debating if DALLE-2 is ISO 335 compliant at this point im not surprised if USA would 10x gdp without EU noticing

English

142

4.2K

147.7K

Dawid Rutkowski retweeted

Naithan Jones@NaithanJones·1d

“THeY cAnT bAn OpEn sOuRcE bRO hOw cAn tHeY bAn dOwnLoAdINg”
Well let me explain how this goes
- IP block the websites of the OS model providers
- If the provider is an American citizen, do an FBI site seizure
- Tell all common repository services to pull down OS model repos (a DMCA style system) and report then ban the builder
- Subpoena the logs for any IP addresses that downloaded the models and prosecute in a heavy handed way to make a public example (Google Aaron Swartz)
- Work with NVIDIA et al and any GPU manufacturer to implement a KYC registry for any compute purchases over a certain threshold
You are either in denial or have limited understanding of historical context if you think this isn’t where we are headed within 3 years

English

253

105

970

105.5K

Dawid Rutkowski@calcarinus·1d

@TheAhmadOsman This text is incredibly underrated and ahead of its time.

English

Ahmad@TheAhmadOsman·Jun 12

x.com/i/article/2065…

ZXX

117

372

1.9K

1.7M

Dawid Rutkowski retweeted

the tiny corp@__tinygrad__·1d

The US AI pay-to-play scam is so much more tolerable after switching to a locally hosted GLM-5.2. From the front page of HN, open weights will be the frontier this December. Sorry about your IPOs.

English

100

1.4K

57.4K

Dawid Rutkowski retweeted

Sławomir Dębski@SlawomirDebski·1d

This is unfortunately a highly skewed interpretation... Reducing the current Polish-Ukrainian crisis to MAGA, Trumpism, the far right or electoral politics misses the central issue. The real turning point was President Zelensky’s decision to name a Ukrainian military unit after the “Heroes of the UPA”. That single decision inflicted more political damage on support for Ukraine in Poland than anything Vladimir Putin had managed to achieve in four years of war. Nor is this only a Polish sensitivity. The European Parliament reached the same conclusion as early as 2010, when it deeply deplored President Viktor Yushchenko’s decision to award Stepan Bandera the title of “Hero of Ukraine”, explicitly recalling the OUN’s collaboration with Nazi Germany and expressing the hope that Ukraine would remain committed to European values. For Poles, the issue goes even further. The same political tradition is associated not only with collaboration with the Third Reich but also with the mass murder of tens of thousands of Polish civilians by the OUN-UPA. It is hardly surprising that many Poles saw Zelensky’s decision as incompatible with the spirit of the extraordinary solidarity Poland has shown Ukraine since 2022. What makes this particularly striking is that it was an entirely unforced error. For nearly three weeks, the Polish side sought a quiet solution. Multiple channels were used, including former President Aleksander Kwaśniewski - arguably the person who did more than anyone else to build modern Polish-Ukrainian reconciliation. Imagine Donald Trump asking Barack Obama to help defuse a diplomatic crisis. That is how unusual this effort was. Yet Kyiv chose confrontation over correction. Instead of quietly reversing an unforced error, it allowed it to escalate into the deepest political crisis in Polish-Ukrainian relations since Russia’s full-scale invasion. That is the story worth analysing - not MAGA.

The Bulwark@BulwarkOnline

"Poland is the conduit for a vast majority of Western weapons, trainees, and supplies that reach the Ukrainian front. Ukraine, meanwhile, is the army standing between Poland and the Russian border. Neither side can afford this quarrel." Dalibor Rohac's latest on the Polish President's fight with Zelensky and what that would mean for the future of this war. lnk.thebulwark.com/3QDvu2s

English

169

38.4K

Dawid Rutkowski retweeted

Alexander Doria@Dorialexander·1d

So, two minutes of explanation: 1. Open-weight models are protected by the safetensors standard: by definition they do not embed code. I have never seen a case of a backdoor. 2. Alignment can be reprogrammed with a bit of RL/SFT. But then, I have been encouraging the civil service for years to develop its model-training capabilities, to no avail.

Alex Xplore@AlexXplore

🇫🇷🇨🇳 The French Directorate General of the Treasury tested Alibaba's Chinese AI model Qwen in its internal tool HéphAIstos. ⚠️ The experiment was halted as early as June 23, 2026 because of slanted or biased answers on sensitive topics related to China... (What a surprise! 🤦) 🇫🇷 The Chinese model was immediately replaced by a model from the French start-up Mistral AI. 🤯 Why use a foreign AI in such a sensitive place? lemonde.fr/pixels/article…

Français

548

75.5K

Dawid Rutkowski retweeted

Max Zanoga@zanoga·2d

x.com/i/article/2023…

ZXX

246

47.3K

Dawid Rutkowski@calcarinus·4d

So true..

English

Dawid Rutkowski@calcarinus·4d

@zanoga Genius!

Lietuvių

445

Max Zanoga@zanoga·4d

Finally finished building my AI datacenter! 🚀 32x3090s across 4 servers (8 GPUs each), all connected over InfiniBand. The whole setup is solar-powered with a massive battery bank and generator backup. More technical details and benchmarks coming soon.

English

588

410

6.4K

796.4K

Dawid Rutkowski@calcarinus·4d

@TheAhmadOsman What’s your experience combining GPUs of the same architecture but different VRAM sizes?

English

Ahmad@TheAhmadOsman·Jun 21

Why do I focus on Inference Engines/Software Stacks for your hardware?
- 2x RTX 3090s: ~14.5 tok/s → ~64 tok/s moving to vLLM w/ TP=2
- RTX PRO 6000: ~32 tok/s → ~110 tok/s moving to Sglang
So:
- CUDA/2+ GPUs: ExLlamaV3/vLLM/Sglang > llama.cpp
- Edge: llama.cpp > Ollama

Ahmad@TheAhmadOsman

x.com/i/article/2057…

English

289

25.1K

Dawid Rutkowski@calcarinus·4d

@Linahuaa Great display of tenacity. Nothing short of inspiring tbh.

English

206

LinaHua@Linahuaa·4d

How my parents went from dishwashers to millionaires in Germany
>Came to Berlin in early 2000s
>Did part-time jobs in Viet restaurants while studying
>Learned how to cook all the Viet/Thai stuff, but even better (high IQ/mom has hustle mindset)
>Became valued assets in the community because their cooking could carry random unnamed Viet-owned restaurants
>Decided that grinding 80h in Viet restaurants as star players is better than German job or returning to China
>Saved ultra aggressively.
>Parked me in China with the grandparents for first 4 years
>Told me to go out playing to save electricity
>Wearing thick clothes indoors to save heating
>Told me to come to their workplaces to eat (two different restaurants so decent variety) and do homework
>Basically we were just at home to sleep and shower
>Invested savings in real estate in China and Germany very aggressively (3-5x multiplicator on their savings)
>Mom got obsessed with evaluating real estate and flipping
>Pivoted into premium/luxury real estate agents selling houses to rich Viet restaurant owners
>Makes $100K++ per sale
>Started living more flashy lives to not lose face to rich Viets
>Now spend about $200K per year just on groceries, golf, business class travel, clothes.
>But they still make less money than me, thus consider me the head of the family.

English

191

21.8K

Dawid Rutkowski retweeted

IT Unprofessional@it_unprofession·5d

HR forced me to hire a junior systems administrator last week. He's 23 years old and showed up on day 1 carrying a physical notebook. He spent his first morning looking at our backend and realized my automation scripts were written in 2008. He asked me why we're running deprecated code that relies on an unpatched version of Windows 7. I told him we employ a strategy of chronological obfuscation. I explained that modern malware is designed to attack modern architecture. By keeping our infrastructure trapped in the Bush administration, we're immune to zero-day exploits. You can't hack what you can't interface with. He looked at me like I was insane and asked about data compliance. I leaned back in my chair and whispered the phrase "asynchronous legacy tunneling". He immediately closed his notebook and apologized for questioning my vision. I spent the rest of the afternoon watching a 4-hour documentary about the Roman Empire at my desk. Next week I'm going to make him untangle category 5 cables for character development.

English

437

1.6K

30.6K

1.6M

Dawid Rutkowski retweeted

John Carmack@ID_AA_Carmack·5d

If you are asking “Why push back against anti-datacenter efforts?” I consider it a tragedy that anti-nuclear efforts largely strangled nuclear power in the US based on vibes, and I don’t want to see that happen to AI. Public opinion matters, and it shouldn’t be ceded unchallenged. If you are asking “Why should I support AI efforts at all?” I believe we are in the midst of a transition more vibrant than the industrial revolution. Opinions formed a couple of years ago about the uselessness of AI are no longer valid. Millions of people and organizations are getting great returns from using it, and the demand for data centers is the market responding to the value signal. That is how progress is made!

English

419

1.4K

7.4K

859K

Dawid Rutkowski retweeted

Julia Turc@juliarturc·6d

This is what happens when you plug LLMs into voice assistants, instead of a decade of handwritten rules. This video dissects Voxtral (a family of OSS speech models) and the foundational work behind it (audio tokenization, semantic/acoustic disaggregation, etc). Thank you @MistralAI for your collaboration and for your detailed technical reports in an increasingly opaque industry!
00:00 Intro
01:03 Modular vs end-to-end speech models
03:30 Speech-to-Text
06:07 Delayed Streams Modeling (DSM)
09:41 Whisper Streaming
10:33 Voxtral Realtime
13:07 Voxtral Text-to-Speech
14:28 Throwback: WaveNet
15:24 Audio tokenization
20:39 The Voxtral Codec
21:49 Back to Voxtral TTS
25:30 Outro

English

1.2K

294.8K
