Henry Zhu

22 posts

@makneee

Joined January 2017
116 Following · 300 Followers
Henry Zhu@makneee·
gpuasm lets you investigate your GPU kernels right from a browser. Upload a CUBIN, view the SASS, edit instructions, and inspect source. Share a link so that another person can view your kernels. Let the agent use the MCP to optimize SASS. gpuasm.com
Henry Zhu@makneee·
nvcc/ptxas leave performance on the table. We pointed an AI agent at the actual GPU assembly (SASS) and let it optimize a GEMM kernel autonomously. +5% FLOPs. No source code touched, only assembly edits on a compiled binary. Full writeup: gpuasm.com/blog/autoresea…
Henry Zhu@makneee·
Wrote up how NVIDIA's TileIR compiler works. Compiled an MoE kernel through every stage (CuTile -> nv_tileaa -> nv_tileas -> NVVM -> SASS) and documented its passes. maknee.github.io/blog/2026/NVID…
Henry Zhu@makneee·
You can see how the optimization interleaves loads and FFMA instructions.
Henry Zhu@makneee·
Ran many benchmarks with this optimization turned on and off (CUTLASS, FlashAttention 2/3, llama.cpp, Triton, PyTorch, Liger, vLLM, and more).
Henry Zhu@makneee·
TL;DR: try the optimization out; maybe it helps, maybe it doesn't. Always run benchmarks. Treat this optimization as a black box.
Henry Zhu retweeted
SkyPilot@skypilot_org·
How to train an AI agent -- not just prompt one? 🤖 We just dropped a deep dive on building agents: 🌋Train with RL using @verl_project 🖥️Monitor runs with @wandb 🚀Run & scale training on any AI compute (k8s or clouds) with SkyPilot. blog.skypilot.co/verl-rl-traini…
Henry Zhu@makneee·
Lastly, the post compares these local benchmarks against 3FS, asking how much overhead there is and how far it can scale.
Henry Zhu@makneee·
Then the post goes into benchmarking local SSD/NVMe (throughput and latency) for random/sequential reads and writes.
Henry Zhu@makneee·
Wrote another post looking deeper into DeepSeek's distributed file system and its benchmarks (with some background on how to analyze these types of systems) maknee.github.io/blog/2025/3FS-…
Henry Zhu@makneee·
Tell me NVCC ain't just a bash script
Henry Zhu retweeted
Liam Dugan@LiamDugan_·
✨New Paper✨: We release Kani 🦀, a highly hackable open-source library for building LM apps with tool usage (e.g. plugins). Kani lets you easily write LM-callable functions in pure Python w/ robust type checks + model retry. arxiv.org/abs/2309.05542 github.com/zhudotexe/kani
Henry Zhu retweeted
Zhengyi “Zen” Luo@zhengyiluo·
Our work, 3D Human Motion Estimation via Motion Compression and Refinement, has been accepted to ACCV 2020 (Oral)! We focus on extracting stable and natural-looking human motion. Check out our demo: youtu.be/YBb9NDz3ngM (1/2)