Kunal

653 posts

@therealkmans

infra @Tesla_AI. building @LeetGPU. I've installed CUDA one thousand times so you never have to.

Joined March 2023
316 Following, 225 Followers

Kunal retweeted
LeetGPU @LeetGPU
We just released a new challenge today ❗ GPT-2 (120M) Transformer Block: compose multiple kernels into a full transformer block. It's the first of many upcoming challenges focused on real-world inference optimization. Write your solution in CUDA, Triton, PyTorch, JAX, Mojo, or CuTe DSL and benchmark it on state-of-the-art GPUs like the H100, H200, B200, and more.
[image]

DM @Sudden_Sudo
@vikhyatk @sasuke___420 @JohnThilen @willccbb LeetGPU is disappointing. Tensara is actually quite good but could use far more problems. How they score leaderboards is a bit iffy, though.

will brown @willccbb
fake software engineering jobs are dying rapidly. real software engineering jobs, however,

vik @vikhyatk
@sasuke___420 @JohnThilen @willccbb There's no good one for GPUs; someone has to make one. I did LeetGPU for a bit. GPU MODE also hosts contests. But really, just go and make vLLM a little faster.

Kunal @therealkmans
Corporate America hates people who eat lunch after 1:30

Kunal @therealkmans
We now have Claude Code on a daily cron making PRs for new challenges on @LeetGPU 👀

Kunal @therealkmans
Wise words from Claude
[image]

Kunal @therealkmans
@valigo I've never had this be a problem

Valentin Ignatev @valigo
Sorry, I just can't take any programming language seriously that treats an unused variable, or an unused import, as a hard error. Such a nightmare to prototype in. Feel free to use them if you like the fake feeling of being productive when you comply with the linter.
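Go is one language that behaves this way: it rejects an unused local variable or an unused import at compile time. A minimal sketch of the usual prototyping escape hatch, assigning to the blank identifier `_`, follows; `prototype` and `draft` are hypothetical names used only for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// Package-level blank reference: keeps the "strings" import compiling while
// nothing else in the file uses it yet.
var _ = strings.ToUpper

// prototype is a hypothetical work-in-progress function.
func prototype() int {
	draft := []int{1, 2, 3} // not consumed yet
	_ = draft               // blank assignment: the compiler now counts draft as used
	return 42
}

func main() {
	fmt.Println(prototype()) // 42
}
```

Once the real code uses `draft` and `strings`, the blank lines can simply be deleted.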

Kunal @therealkmans
@goyalayus Deployed a fix. Can you try again?

Kunal retweeted
LeetGPU @LeetGPU
We just shipped JAX support on all challenges 🚀 Try it out!

Kunal @therealkmans
Subway can be so good, but it's way overpriced

Kunal @therealkmans
@0xlelouch_ Huge fan of lexical confinement
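Lexical confinement, as the term is usually used (for example in Katherine Cox-Buday's book Concurrency in Go), means restricting who can touch a channel through scope: the function that creates a channel is the only code that sends on it or closes it, and callers get a receive-only view. A minimal sketch, with `squares` as a hypothetical example function:

```go
package main

import "fmt"

// squares owns its output channel: it is created, written, and closed inside
// this function's scope, and callers only ever see a receive-only view.
func squares(n int) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out) // only the owner closes the channel
		for i := 1; i <= n; i++ {
			out <- i * i
		}
	}()
	return out
}

func main() {
	sum := 0
	for v := range squares(4) { // the caller can receive, but not send or close
		sum += v
	}
	fmt.Println(sum) // 1 + 4 + 9 + 16 = 30
}
```

Because the return type is `<-chan int`, a caller that tries to send on or close the channel fails to compile, so the ownership rule is enforced by the type system rather than by convention.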

Abhishek Singh @0xlelouch_
I wired up a tiny actor runtime in Go, pumped 10M messages through it, and got ~30M ops/s. It’s not production-stable yet, but the pattern feels right.

In Go, the actor model is simply: “one goroutine owns some state, and you only touch that state by sending it messages over a channel.” No locks, no shared mutable state, no random goroutines poking into your structs.

---
Core idea
- Each actor = 1 goroutine + 1 mailbox (chan Msg)
- Actor has private state inside the goroutine
- Other parts of the program never access that state directly; they just Send(msg)
- The actor loops forever: for msg := range inbox { handle(msg) }
[image]
Quoting Sunny Bains @TiDB @sunbains:

I'm sold on the actor model. After some struggling, finally cracked it. This is the max, It's not stable but stable enough and it gives a good feeling :-). [test_throughput_actor_model] total_messages=10000000, iterations=1, elapsed=0.334s, throughput=913.88 MiB/s, ops=29.95 M ops/s (min=29.95, max=29.95)
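The core idea in the post can be sketched as a single actor that owns a counter. This is an illustration under assumptions, not the runtime that was benchmarked: `msg`, `counter`, `newCounter`, and `Get` are hypothetical names, and `Send` mirrors the post's `Send(msg)`. One detail worth noting is that `for m := range inbox` runs until the inbox is closed, not literally forever.

```go
package main

import "fmt"

// msg is a hypothetical message type: add delta, or ask for the current value.
type msg struct {
	delta int
	reply chan int // non-nil means "send me the current value"
}

// counter is an actor: one goroutine owns count, and the only way to reach it
// is through the inbox channel.
type counter struct {
	inbox chan msg
}

func newCounter() *counter {
	c := &counter{inbox: make(chan msg, 64)} // buffered mailbox
	go func() {
		count := 0 // private state, never shared with other goroutines
		for m := range c.inbox { // exits when the inbox is closed
			if m.reply != nil {
				m.reply <- count
				continue
			}
			count += m.delta
		}
	}()
	return c
}

// Send enqueues a message; it blocks only if the mailbox is full.
func (c *counter) Send(m msg) { c.inbox <- m }

// Get asks the actor for its state and waits for the answer.
func (c *counter) Get() int {
	r := make(chan int)
	c.Send(msg{reply: r})
	return <-r
}

func main() {
	c := newCounter()
	for i := 0; i < 1000; i++ {
		c.Send(msg{delta: 1})
	}
	// Messages from one sender arrive in order, so all 1000 increments are
	// handled before the Get request.
	fmt.Println(c.Get()) // 1000
	close(c.inbox)       // stop the actor goroutine
}
```

Reads go through the mailbox too (the `reply` channel), which is what keeps `count` free of locks.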

Gary Clarke @garyclarketech
When querying databases in Go, use & with rows.Scan() to pass addresses, not values: Scan() needs to write directly into your struct fields. From my Build Your First Go App course. Follow for updates.
[image]
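The real method in `database/sql` is `func (rs *Rows) Scan(dest ...any) error`, and it rejects non-pointer destinations with an error. To show why the `&` matters without needing a database driver, the sketch below uses a hypothetical stand-in, `scan`, that copies one row into whatever the pointers refer to; `User` and the sample row are also made up for illustration.

```go
package main

import "fmt"

// User mirrors a hypothetical two-column table (id, name).
type User struct {
	ID   int
	Name string
}

// scan is a hypothetical stand-in for (*sql.Rows).Scan: it copies one row's
// column values into the variables the destination pointers refer to.
func scan(row []any, dest ...any) error {
	if len(row) != len(dest) {
		return fmt.Errorf("expected %d destinations, got %d", len(row), len(dest))
	}
	for i, d := range dest {
		switch p := d.(type) {
		case *int:
			v, ok := row[i].(int)
			if !ok {
				return fmt.Errorf("column %d: cannot store %T in *int", i, row[i])
			}
			*p = v // write through the pointer into the caller's field
		case *string:
			v, ok := row[i].(string)
			if !ok {
				return fmt.Errorf("column %d: cannot store %T in *string", i, row[i])
			}
			*p = v
		default:
			// Anything else, including a plain int or string copy, is rejected:
			// a copy gives scan nowhere to write the value.
			return fmt.Errorf("column %d: unsupported destination type %T", i, d)
		}
	}
	return nil
}

func main() {
	row := []any{7, "ada"}
	var u User

	fmt.Println(scan(row, u.ID, u.Name)) // column 0: unsupported destination type int

	if err := scan(row, &u.ID, &u.Name); err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", u) // {ID:7 Name:ada}
}
```

With a real `*sql.Rows`, the equivalent call is `rows.Scan(&u.ID, &u.Name)` inside a `for rows.Next()` loop.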

Kunal @therealkmans
@0xlelouch_ Use a WaitGroup instead of a done channel
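A sketch of this suggestion, under assumptions because the original program exists only as an image: instead of each worker signalling on a done channel and main counting the signals, a `sync.WaitGroup` tracks how many workers are still running. `run` and `process` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// process is a hypothetical unit of work.
func process(n int) int { return n * 2 }

func run(jobs []int, workers int) int {
	in := make(chan int)
	partial := make([]int, workers) // one slot per worker, so no shared writes
	var wg sync.WaitGroup

	for w := 0; w < workers; w++ {
		wg.Add(1) // register before the goroutine starts
		go func(id int) {
			defer wg.Done()
			for n := range in {
				partial[id] += process(n)
			}
		}(w)
	}

	for _, j := range jobs {
		in <- j
	}
	close(in) // each worker's range loop exits once the channel is drained
	wg.Wait() // returns after every worker has called Done

	total := 0
	for _, p := range partial {
		total += p
	}
	return total
}

func main() {
	fmt.Println(run([]int{1, 2, 3, 4, 5}, 3)) // 30
}
```

The WaitGroup replaces the "receive N signals" loop with a single `Wait()`, and `wg.Wait()` also guarantees the workers' writes to `partial` are visible when it returns.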

Abhishek Singh @0xlelouch_
Nice. Let me share a small Go function to understand language semantics: backpressure and coordination using channels.

Channels are not just for passing data
The jobs channel also slows down the producer because it has a buffer of only 2. If workers fall behind, the producer blocks. That’s backpressure for free!

Goroutines scale downward just as cleanly as upward
You launch a few workers and the language handles all the scheduling for you. No mutexes, no condition variables.

Completion coordination is trivial
The done channel lets main wait for all work to complete without any complex thread management.
[image]
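The post's code is only in the image, so here is a hedged reconstruction of the pattern it describes: a producer feeding a jobs channel with a buffer of 2, a few workers, and a done channel that main waits on. `pipeline`, the squaring work, and the 1 ms sleep are assumptions for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// pipeline: a producer feeds a jobs channel with a buffer of 2, workers drain
// it, and a done channel reports when each worker has finished.
func pipeline(jobs []int, workers int) []int {
	jobsCh := make(chan int, 2)          // at most 2 queued jobs
	results := make(chan int, len(jobs)) // sized so workers never block on results
	done := make(chan struct{})

	for w := 0; w < workers; w++ {
		go func() {
			for j := range jobsCh {
				time.Sleep(time.Millisecond) // simulate slow work
				results <- j * j
			}
			done <- struct{}{} // this worker saw jobsCh closed and drained
		}()
	}

	for _, j := range jobs {
		jobsCh <- j // blocks while 2 jobs are already queued: backpressure
	}
	close(jobsCh)

	for w := 0; w < workers; w++ {
		<-done // one signal per worker
	}
	close(results) // safe: every worker has stopped sending

	out := make([]int, 0, len(jobs))
	for r := range results {
		out = append(out, r)
	}
	sort.Ints(out) // workers finish in any order
	return out
}

func main() {
	fmt.Println(pipeline([]int{1, 2, 3, 4, 5}, 2)) // [1 4 9 16 25]
}
```

The producer never runs more than two jobs ahead of the workers, because a send on a full buffered channel waits until a worker receives.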

Kunal @therealkmans
How you code is how you think

Kunal @therealkmans
@TWangpo_ What if we had an MNIST challenge 👀

dodhon @TWangpo_
Where should I go after LeetGPU? I just did my first 3 problems, and I'm thinking another implementation of MNIST might be a good next step.
[image]