Henry Zhu

22 posts

@makneee

Joined January 2017

116 Following · 300 Followers

Henry Zhu@makneee·30 Mar

gpuasm lets you investigate your GPU kernels right from a browser. Upload a CUBIN, view the SASS, edit instructions, and inspect source. Share a link so that another person can view your kernels. Let the agent use the MCP to optimize SASS. gpuasm.com

Henry Zhu@makneee·30 Mar

nvcc/ptxas leaves performance on the table. We pointed an AI agent at the actual GPU assembly (SASS) and let it optimize a GEMM kernel autonomously. +5% FLOPs. No source code touched - assembly edits on a compiled binary. Full writeup: gpuasm.com/blog/autoresea…

Henry Zhu@makneee·30 Jan

Wrote up how NVIDIA's TileIR compiler works. Compiled an MoE kernel through every stage (CuTile -> nv_tileaa -> nv_tileas -> NVVM -> SASS) and documented its passes: maknee.github.io/blog/2026/NVID…

Henry Zhu@makneee·16 Dec

Can see how the optimization interleaves loads/ffma

Henry Zhu@makneee·16 Dec

Ran many benchmarks turning on/off this optimization (cutlass, flash attention 2/3, llamacpp, triton, pytorch, liger, vllm + more)

Henry Zhu@makneee·16 Dec

Wrote a small blog post investigating how putting "cutlass" in your cuda/triton kernel names can increase/decrease performance: maknee.github.io/blog/2025/Mayb…

Henry Zhu@makneee·16 Dec

TL;DR: try the optimization out; maybe it helps, maybe it doesn't. Always run benchmarks. Treat this optimization as a black box.

Henry Zhu reposted

SkyPilot@skypilot_org·14 Oct

How to train an AI agent -- not just prompt one? 🤖 We just dropped a deep dive on building agents: 🌋Train with RL using @verl_project 🖥️Monitor runs with @wandb 🚀Run & scale training on any AI compute (k8s or clouds) with SkyPilot. blog.skypilot.co/verl-rl-traini…

Henry Zhu@makneee·16 Sep

Lastly, it compares these local benchmarks against 3FS, asking questions like: how much overhead is there, and how far can it scale?

Henry Zhu@makneee·16 Sep

Then it goes into benchmarking local SSD/NVMe (throughput + latency) for random/sequential reads and writes

Henry Zhu@makneee·16 Sep

Continuing my DeepSeek distributed filesystem series... This time I evaluate the system with microbenchmarks: maknee.github.io/blog/2025/3FS-…

Henry Zhu@makneee·11 Sep

Wrote a post that looks into LLM training performance with different storage and network configurations: maknee.github.io/blog/2025/Netw…

Henry Zhu@makneee·18 Jun

Wrote another post looking deeper into DeepSeek's distributed file system and its benchmarks (with some background on how to analyze these types of systems): maknee.github.io/blog/2025/3FS-…

Henry Zhu@makneee·16 Apr

I wrote a post looking into DeepSeek's distributed filesystem (with some background on these types of systems): maknee.github.io/blog/2025/3FS-…

Henry Zhu@makneee·21 Feb

Tell me NVCC ain't just a bash script

Henry Zhu reposted

Liam Dugan@LiamDugan_·12 Sep

✨New Paper✨: We release Kani 🦀 a highly-hackable open-source library for building LM apps with tool usage (e.g. plugins) Kani lets you easily write LM-callable functions in pure Python w/ robust type checks + model retry arxiv.org/abs/2309.05542 github.com/zhudotexe/kani

Henry Zhu@makneee·22 Jul

Happy to release a little side project of mine: github.com/Maknee/minigpt… special thanks to @ggerganov's great work on ggml

Henry Zhu@makneee·14 Oct

I'm excited to release Raysterizer, a framework that turns #RTXon for any game. Here's the trailer: #raytracing #RTX youtu.be/iuHRDvmhX9Y

Henry Zhu reposted

Zhengyi “Zen” Luo@zhengyiluo·12 Nov

Our work, 3D Human Motion Estimation via Motion Compression and Refinement, has been accepted to ACCV 2020 (Oral)! We focus on extracting stable and natural-looking human motion: Check out our demo: youtu.be/YBb9NDz3ngM (1/2)
