DK

244 posts

DK

@donghaxkim

math @uwaterloo

Joined October 2025
119 Following · 208 Followers
Pinned Tweet
DK @donghaxkim
I built figma for your localhost
36 replies · 34 reposts · 765 likes · 70.2K views
DK @donghaxkim
There's so many damn blogs I wanna read but I don't have the time to read all of them
0 replies · 0 reposts · 3 likes · 25 views
DK retweeted
Lucas Jin @lucashjin
existing video to ascii components suck. so i built one that doesn't.
79 replies · 151 reposts · 2.6K likes · 155.4K views
DK retweeted
Catherine Yeo @catherinehyeo
Introducing Altara: the scientific intelligence platform for the physical world.

Today @evatuecke and I are excited to announce our $7M seed led by @GreylockVC, joined by @Neo, @BoxGroup, @Liquid2V, and angel investors including @JeffDean and leadership from OpenAI & AMD.

We're already working with early customers in semiconductors, batteries, and advanced materials. More below.
133 replies · 128 reposts · 826 likes · 155K views
DK retweeted
Aadi Kulshrestha @MankyDankyBanky
I compressed the KV cache of my custom LLM by 6x using multi-head latent attention (MLA). Feats:
- 6x smaller cache than fp32 MHA (768 KB vs 4.5 MB at 256 tokens)
- One 128-dimensional latent per token instead of full K, V vectors
- Weight absorption at load so attention runs entirely in latent space, eliminating up-projection during inference

Compare MHA vs MLA in your browser via our demo: mni-ml.github.io/demos/mla/
[image]
6 replies · 6 reposts · 68 likes · 3.3K views
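For context on how MLA shrinks the cache: a minimal single-head TypeScript sketch of the latent-cache idea, with toy dimensions (the tweet's latents are 128-dimensional). Every name and shape here (Wq, Wdkv, Wuk, Wuv) is an illustrative assumption, not the mni-ml implementation.

```ts
// Single-head sketch of multi-head latent attention (MLA) caching.
// Toy sizes; matrix names are illustrative assumptions.

type Vec = number[];
type Mat = number[][]; // row-major

const dot = (a: Vec, b: Vec) => a.reduce((s, x, i) => s + x * b[i], 0);
const matVec = (m: Mat, v: Vec): Vec => m.map(row => dot(row, v));
const transpose = (m: Mat): Mat => m[0].map((_, j) => m.map(row => row[j]));
const matMul = (a: Mat, b: Mat): Mat =>
  a.map(row => transpose(b).map(col => dot(row, col)));
const rand = (r: number, c: number): Mat =>
  Array.from({ length: r }, () => Array.from({ length: c }, () => Math.random() - 0.5));

const dModel = 8, dLatent = 4; // the tweet uses 128-dim latents

const Wq = rand(dModel, dModel);    // query projection
const Wdkv = rand(dLatent, dModel); // down-projection: hidden -> shared latent
const Wuk = rand(dModel, dLatent);  // up-projection: latent -> key
const Wuv = rand(dModel, dLatent);  // up-projection: latent -> value

// Weight absorption at load: fold Wuk into the query projection, so the
// query lands directly in latent space and keys are never materialized.
const WqAbs = matMul(transpose(Wuk), Wq); // (dLatent x dModel)

// The cache stores ONE dLatent vector per token instead of full K and V.
const latentCache: Vec[] = [];

function decodeStep(h: Vec): Vec {
  latentCache.push(matVec(Wdkv, h)); // c_t = Wdkv h_t

  // q^T k_i = q^T (Wuk c_i) = (Wuk^T Wq h)^T c_i: score against latents.
  const qAbs = matVec(WqAbs, h);
  const scores = latentCache.map(c => dot(qAbs, c) / Math.sqrt(dModel));

  // Softmax over cached positions.
  const m = Math.max(...scores);
  const w = scores.map(s => Math.exp(s - m));
  const z = w.reduce((a, b) => a + b, 0);

  // Mix latents first, then up-project once: out = Wuv (sum_i a_i c_i).
  const mixed: Vec = new Array(dLatent).fill(0);
  latentCache.forEach((c, i) => c.forEach((x, j) => (mixed[j] += (w[i] / z) * x)));
  return matVec(Wuv, mixed);
}

// Each step caches dLatent numbers instead of 2 * dModel (a K row + a V row).
const h0 = Array.from({ length: dModel }, () => Math.random() - 0.5);
console.log(decodeStep(h0).length); // dModel-sized attention output
```

The point of the absorption step is that neither K nor V ever exists at inference: scoring happens against the cached latents directly, and the value up-projection runs once per step on the already-mixed latent.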
Arman @ArmaniArghavani
UWaterloo 2A co-op stats. The robotics job market is so good rn 🙏
[image]
10 replies · 0 reposts · 76 likes · 5.5K views
Arman @ArmaniArghavani
I guess we own MIT now
[image]
10 replies · 1 repost · 163 likes · 29.4K views
DK @donghaxkim
claude is genuinely so ass
0 replies · 0 reposts · 5 likes · 297 views
DK retweeted
Jordan Khatri @jordankhatri23
ros is really cool
0 replies · 4 reposts · 8 likes · 768 views
austin jian @austinjian_
I have some life updates to share! I'm finally done with both recruiting and my first year at waterloo. I'll be moving to New York City this summer to work at @phoebe_work_ as a software engineering intern!

This past semester has been pretty grindy balancing school, recruiting, and a part-time job, but it's been very rewarding.

The first co-op hunt has also been pretty difficult. Went 0/55 on waterlooworks and failed a lot of interviews (especially at the beginning of the cycle). However, a lot of people helped me out with warm intros, mock interviews, and much more, making this process a whole lot easier. I have a lot of ppl to thank. For Phoebe specifically, I'm very grateful to @adiprasadd @shayaan_azeem for the intro, @jrwoodbridge for the quick and smooth hiring process, and fellow intern @casperdongg for helping me navigate visas and more 🙏

It's been a good year and I'm excited for my next chapter. hmu if you're in nyc this summer!
[image]
59 replies · 1 repost · 191 likes · 10.3K views
DK retweeted
Reese Chong @_reesechong
I added KV caching and INT8 KV quantization to our transformer inference, improving throughput by 35x. All of this was done from scratch in Rust + CUDA, on top of a homemade ML framework.

On a 4-token prompt with 252 generated tokens:
- Original: 0.76 tok/s
- KV cache fp32: 27.21 tok/s
- KV cache int8 (quantized): 27.29 tok/s

Try it out yourself here: mni-ml.github.io/demos/kv-cache/

In practice:
- KV caching gave us about a 35x end-to-end speedup
- INT8 KV cache kept roughly the same speed as fp32 but cut KV cache memory by 3.78x (4.5 MB fp32 vs 1.19 MB int8 in this run)

This simple change to inference created a huge impact on performance. To learn more about the KV cache and other optimizations like this, check out the blog at mni.ml!
21 replies · 23 reposts · 499 likes · 50.9K views
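For context: a minimal TypeScript sketch of per-token INT8 KV quantization, assuming simple absmax scaling per cached vector and a hypothetical 64-dim head; the actual mni-ml scheme may differ.

```ts
// Sketch of an append-only KV cache with per-vector absmax INT8 quantization.
// Illustrative assumptions throughout; not the mni-ml implementation.

interface QuantVec { q: Int8Array; scale: number }

// Quantize one K or V vector to int8 with an absmax scale.
function quantize(v: Float32Array): QuantVec {
  let amax = 1e-8; // avoid divide-by-zero on all-zero vectors
  for (const x of v) amax = Math.max(amax, Math.abs(x));
  const scale = amax / 127;
  const q = new Int8Array(v.length);
  for (let i = 0; i < v.length; i++) q[i] = Math.round(v[i] / scale);
  return { q, scale };
}

function dequantize({ q, scale }: QuantVec): Float32Array {
  const v = new Float32Array(q.length);
  for (let i = 0; i < q.length; i++) v[i] = q[i] * scale;
  return v;
}

// The cache: one quantized K and V per generated token, appended once and
// reused by every later decode step (that reuse is the 35x: no recompute).
const kCache: QuantVec[] = [];
const vCache: QuantVec[] = [];

function appendToken(k: Float32Array, v: Float32Array) {
  kCache.push(quantize(k));
  vCache.push(quantize(v));
}

appendToken(new Float32Array(64).fill(0.5), new Float32Array(64).fill(-0.25));
console.log(dequantize(kCache[0])[0]); // 0.5 survives the round-trip

// Memory check mirroring the tweet's ratio: fp32 is 4 bytes/elem, int8 is
// 1 byte/elem plus a 4-byte scale per vector.
const headDim = 64, tokens = 256;
const fp32Bytes = 2 * tokens * headDim * 4;
const int8Bytes = 2 * tokens * (headDim * 1 + 4);
console.log(fp32Bytes / int8Bytes); // ≈ 3.76, close to the tweet's 3.78x
```

The reason int8 barely changes throughput in the tweet's numbers is that quantize/dequantize is cheap per token; the win is purely memory footprint and bandwidth.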
Jordan Khatri @jordankhatri23
- finished 2A
- stopped GPAmaxxing and started lifemaxxing

time to go to work
[GIF]
3 replies · 0 reposts · 12 likes · 294 views
Lucas Jin @lucashjin
yk ur tapped in when ur playing league, hear darius say “make no mistakes” and think of claude
5 replies · 0 reposts · 35 likes · 1.5K views
DK @donghaxkim
loglogloglogpn
2 replies · 0 reposts · 6 likes · 221 views
DK retweeted
Aadi Kulshrestha @MankyDankyBanky
I integrated speculative decoding into the LLM I trained from scratch, leading to a 3x improvement in token throughput (demo + blog below). This is just one of many inference optimizations we discuss in our latest blog post. Check it out if you want to learn about:
- The KV Cache
- Mixture of Experts
- Paged Attention
- Quantization
- and much more

Using an ML framework @_reesechong and I wrote in Rust + CUDA, we trained a smaller 2M parameter LLM that proposes draft tokens. These tokens can then be accepted together with just one forward pass of the larger model, decreasing inter-token latency.

You can now visualize speculative decoding in the link below, all running in your browser. Try it out and build your own ML projects with: npm i @mni-ml/framework

Already sitting at 1600+ downloads!
[image]
10 replies · 13 reposts · 145 likes · 8.7K views
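For context: the propose-then-verify loop, sketched in TypeScript for greedy decoding with toy stand-in models. Real speculative decoding verifies all k draft positions in one batched target forward pass and uses probabilistic acceptance when sampling; both are simplified away here, and all names are hypothetical.

```ts
// Greedy-decoding sketch of one speculative decoding step. Illustrative only.

type Token = number;
// Stand-in "model": given a prefix, return the next-token argmax.
type Model = (prefix: Token[]) => Token;

function speculativeStep(
  target: Model, // large, slow model
  draft: Model,  // small, cheap model (the tweet's 2M-param LLM)
  prefix: Token[],
  k: number      // number of draft tokens proposed per step
): Token[] {
  // 1. Draft model proposes k tokens autoregressively (cheap).
  const proposed: Token[] = [];
  for (let i = 0; i < k; i++) {
    proposed.push(draft([...prefix, ...proposed]));
  }

  // 2. Target model checks each position. In a real system all k positions
  // are scored in ONE batched forward pass; that batching is the speedup.
  const accepted: Token[] = [];
  for (let i = 0; i < k; i++) {
    const t = target([...prefix, ...accepted]);
    accepted.push(t); // agreement keeps the draft token "for free"
    if (t !== proposed[i]) break; // first disagreement: keep target's token, stop
  }
  return accepted; // between 1 and k tokens per target pass
}

// Toy models that both predict "previous token + 1", so every draft is accepted.
const next: Model = p => (p[p.length - 1] + 1) % 10;
console.log(speculativeStep(next, next, [1, 2, 3], 4)); // [4, 5, 6, 7]
```

Output is identical to running the target model alone; the draft model only changes how many target passes are needed, which is where the quoted 3x comes from.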
DK retweeted
Reese Chong @_reesechong
Behind the scenes of mni-ml:

January 4th 2026 - my roommate @MankyDankyBanky and I wanted to do a big project together. "maybe we should try to build pytorch from scratch". We found @srush_nlp's minitorch curriculum and committed to grinding through it Jan to April.

February - autodiff and tensor internals done. Lots of late night PR reviews, stacked diffs, Kinton ramen runs to Toronto when I'd visit Aadi at Shopify. We started posting on X to keep ourselves accountable.

March - the month of parallelization: Aadi shipped tiled matmul using the same algo @nvidia teaches in their CUDA guide. Wrapped by end of month: pooling, conv1d/2d forward+backward, softmax, dropout.

March 22-23 - @socraticainfo symposium, & we see the tinytpu team on the stage, which filled us with determination 🫡 cc: @evanliin @XanderChin @suryasure05 @kennykgguo

March 24 - chose the mni-ml brand and started the educational blog.

March 30 - minitorch is DONE ahead of schedule. Now we build on top of the framework.

April 5-6 - cuBLAS matmul via koffi FFI. Buffer pooling, strided batched GEMM, kernel optimizations. The CUDA backend takes shape.

April 7 - huge day. Cross-platform CI pipeline, prebuilt npm binaries, v0.3.0 with CUDA live on @npmjs. Flattened the monorepo, added @WebGPU + Windows CUDA build targets by eod.

April 12 - flash attention CUDA kernel ships. We caught a bug where head dim > 32 was truncating.

April 14 - during exam season, we recorded the demo in the @Shopify recording studio during Aadi's lunch break. Everything over the last 4mo finally came together. cc: @fnthawar @tobi @alspee

April 17 - launch post, and bought the domain mni.ml. We're just getting started; we have so much in store for this summer, stay tuned 🫡 cc: @sundeep @GavinSherry
[image]
Quoting Aadi Kulshrestha @MankyDankyBanky: "I trained a 12M parameter LLM on my own ML framework using a Rust backend and CUDA kernels for flash attention, AdamW, and more." (full tweet retweeted below)
15 replies · 12 reposts · 240 likes · 41.6K views
DK retweeted
Aadi Kulshrestha @MankyDankyBanky
I trained a 12M parameter LLM on my own ML framework using a Rust backend and CUDA kernels for flash attention, AdamW, and more. Wrote the full transformer architecture and BPE tokenizer from scratch.

The framework features:
- Custom CUDA kernels (Flash Attention, fused LayerNorm, fused GELU) for 3x increased throughput
- Automatic WebGPU fallback for non-NVIDIA devices
- TypeScript API with Rust compute backend
- One npm install to get started, prebuilt binaries for every platform

Try out the model for yourself: mni-ml.github.io/demos/transfor…

Built with @_reesechong. Check out the repos and blog if you want to learn more. Shoutout to @modal for the compute credits allowing me to train on 2 A100 GPUs without going broke. cc @sundeep @GavinSherry
131 replies · 259 reposts · 3.5K likes · 786.7K views
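For context on the "BPE tokenizer from scratch" part: a toy TypeScript training loop that repeatedly merges the most frequent adjacent token pair. Illustrative only, not the mni-ml tokenizer.

```ts
// Toy BPE training over a tiny corpus: start from characters, repeatedly
// merge the most frequent adjacent pair, record the merge rules in order.

function trainBPE(text: string, numMerges: number): [string, string][] {
  let tokens = [...text]; // start from characters
  const merges: [string, string][] = [];

  for (let m = 0; m < numMerges; m++) {
    // Count adjacent pairs ("\u0000" is just a separator for the map key).
    const counts = new Map<string, number>();
    for (let i = 0; i + 1 < tokens.length; i++) {
      const key = tokens[i] + "\u0000" + tokens[i + 1];
      counts.set(key, (counts.get(key) ?? 0) + 1);
    }
    if (counts.size === 0) break;

    // Pick the most frequent pair and record the merge rule.
    const [bestKey] = [...counts.entries()].sort((a, b) => b[1] - a[1])[0];
    const [a, b] = bestKey.split("\u0000");
    merges.push([a, b]);

    // Apply the merge left to right, skipping past each merged pair.
    const merged: string[] = [];
    for (let i = 0; i < tokens.length; i++) {
      if (i + 1 < tokens.length && tokens[i] === a && tokens[i + 1] === b) {
        merged.push(a + b);
        i++;
      } else {
        merged.push(tokens[i]);
      }
    }
    tokens = merged;
  }
  return merges;
}

console.log(trainBPE("low lower lowest", 3)); // merges like [["l","o"],["lo","w"],...]
```

At inference the learned merge list is replayed in the same order on new text, which is why the rules (not the vocabulary alone) are what a from-scratch tokenizer has to store.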