sankalp

46K posts

@dejavucoder

ai and side quests. i post well, just follow. i consult on ai engineering stuff | seeking post-training, auto-research adjacent, evals related work atm

bangalore, india · Joined October 2021

682 Following · 25.9K Followers

Pinned Tweet

sankalp@dejavucoder·6d

my latest blog post "auto-research with codex: how I achieved a 212x faster kernel over baseline with codex in GPU Mode's qr_v2 problem" is up now. in this post, i talk about my approach towards auto-kerneling on the QR decomposition problem. sankalp.bearblog.dev/autoresearch/

407

46.8K

sankalp@dejavucoder·8h

tibo, where is my reset?

Tibo@thsottiaux

Tomorrow might be 8M active user celebration day. Just saying

818

sankalp@dejavucoder·9h

new banger blog drop on gpu and tpu communication. exhaustive resources are scarce for this subject

Aleksa Gordić (水平问题)@gordic_aleksa

New in-depth blog post time: "Inside TPU and GPU Clusters: The Anatomy of Collective Communication".

If you want to deeply understand the core primitives behind scaling the training / inference for MoEs and dense transformers, going a level below FSDP, expert parallelism, data parallelism, model/tensor parallelism this might be a fun read.

I cover:
* TPU cluster topology: (super)pods, slices, DCN, PCIe, ICI
* All-Gather: 1D/2D rings, and path algo (lots of visuals so should be crystal clear how these work even if you're not a perf engineer)
* Reduce-Scatter (which is the dual of AG) and All-Reduce
* All-to-All (used to dispatch tokens to target experts in MoEs)
* NVIDIA GPU cluster topology (reference DGX architecture): nodes, scalable units, fat tree
* GPU collectives within the node: rings, trees (log2 steps), and SHARP (in network compute unit)
* GPU collectives across nodes, hierarchical algorithms over InfiniBand etc.

I was heavily inspired to do this deep dive after reading the excellent Scaling book by an excellent group of people @jacobaustin132 @_sholtodouglas @reinerpope and others!

What originally started as "let me maybe just make four figures covering All-Gather, Reduce-Scatter, All-Reduce, and All-to-All so I can understand them better, it shouldn't take more than a day, right, right?" somehow turned into this 40 figures later.

Along the way, I realized that the collective algorithms only really make sense once you understand the underlying hardware topology. TPUs were a bit easier to reason about, but I couldn't skip GPUs, I love them too much. Rings are cool, but I also wanted to understand tree algorithms. But also SHARP, and fat trees, and hierarchical collectives. :') So the scope slowly expanded, and little by little, this blog post came to fruition.

Just a side-quest. Hope you like it! :)

---

Also a big thank you to my friends for reviewing the blog and providing feedback:
* @ArunDemeure (prev GPU/AI stuff at Magic, GPU architect at Apple and Imagine, my llm.c buddy!)
* @axel_s_feldmann (making GPUs go brrr at Jane Street, we met for the first time at @marksaroufim's excellent GPU mode event)
* @pranjalssh (ex xAI GPU wizard, one of two people who inspired my original matmul blog!)

1.7K

sankalp@dejavucoder·10h

it's easy to hate the token rich. do you have the courage to hate the token poor domain expert?

tender@tenderizzation

can someone tell me what fraction of the gpu mode leaderboard is just people doing this with their agents

873

sankalp@dejavucoder·11h

i was thinking of asking maja a few days back if she has written a post about her writing process (she mogs everyone not only in thoughts and ideas but also in prose and simplicity) and here the post is now lol

maja 🔭🍒@majamediaco

new essay: the more you write, the more you begin to see i started writing personal essays last year because i was worried i had lost my ability to think deeply after years in a consuming corporate job. what i did not expect was that writing consistently would change my whole life, what i noticed, how ideas connected, and how i made sense of my experience this essay is about writing as a net for catching thoughts before they disappear, the abundance that appears when you follow an idea for long enough, and the tension between paying closer attention to your life and turning it all into material read here: open.substack.com/pub/velvetnois…

1.7K

sankalp@dejavucoder·11h

@ashebytes see this

sankalp@dejavucoder·11h

@ashebytes yes not joking😂

114

ashe@ashebytes·11h

@dejavucoder really?

302

sankalp@dejavucoder·11h

i wrote "just follow" in my bio and more people started following me. how do i keep forgetting that you can just ask for things?

1.8K

sankalp@dejavucoder·11h

@marksaroufim @tenderizzation @gaunernst what are your feelings about this though

190

Mark Saroufim@marksaroufim·11h

@tenderizzation At least for the QR competition almost the entire leaderboard was basically this BUT the best submission was the one by @gaunernst (who I believed used AI for the first time in one of our competitions) because it was really fast and didn't NaN in a real run

2.2K

tender@tenderizzation·12h

can someone tell me what fraction of the gpu mode leaderboard is just people doing this with their agents

sankalp@dejavucoder

trying to make it to top 10. i think 16342 microseconds would be a safe spot.

112

9.1K

sankalp retweeted

Hamel Husain@HamelHusain·12h

New Blog Post: Do Automated Evals Work?

There has been a rise of tools that look through your traces with AI and identify issues. We tested these tools with real production data to see how good they are.

Where they shine
- They often spot issues humans miss
- Integrate into your workflow: viewing traces, creating LLM judges etc.

Where they fall short
- They miss problems that require domain expertise and taste
- Don't have great mechanisms to learn from human feedback
- You can get similar results from using your coding agent

So should you use them? Yes, BUT do so iteratively with you in the loop. We describe how in the post: parlance-labs.com/blog/posts/aut…

It's also a good idea to try using your coding agent with you in the loop, which we discuss in the post.

This was written with @doesdatmaksense, who led the research and collated the results.

6.2K

sankalp@dejavucoder·11h

@tenderizzation pretty sure though if you have domain expertise, you can progress way faster. then there are people like gau nernst who probably hand write stuff

108

sankalp@dejavucoder·11h

@tenderizzation it will be like 99% (you can see and determine from the submission code later) but a lot of fucking around and finding out (or domain expertise) is required to steer beyond even the 40k microsecond point in this one. inference time compute just works so well on multiple edge cases

304

sankalp retweeted

Zach Mueller@TheZachMueller·1d

For the first time since Claude Code came out, I moved one of my actual work pipelines to @pidotdev & open-weight models. And after a weekend of fighting with it, I prefer the report it made. 36 pages vs 21 from Claude, more information-dense, prose I liked more, and pennies compared to the Claude API.

It does not save on speed. 30-40 minutes instead of ~20. But it's running locally-adjacent/off resources I have so that's just fine.

Setup was one 8xB200 node split 4/4 between:
- GLM 5.2 NVFP4 (main agent/driver)
- Kimi K2.7 Code NVFP4 (retriever).

The dumbest fix throughout it: I summarized sources into briefs to save on context, then notes, losing information each time. Ended up saving all articles directly to disk so there were multiple layers of information retrieval I could work with.

161

13K

sankalp@dejavucoder·12h

@sama one more reset please your usage will go 4x

312

Sam Altman@sama·12h

2.5x increase in usage of our agentic products (codex and chatgpt work) in the last week! welcome.

797

365

12.3K

555.3K

sankalp@dejavucoder·12h

@majamediaco lets go

290

maja 🔭🍒@majamediaco·1d

this essay is also the deepest i’ve ever gone into the behind-the-scenes process of how i actually write, including my one-sided text conversations with myself, the random thoughts and fragments i collect throughout the day, and the notion pages where they become essays