Genghan Zhang

39 posts

@zhang677

Joined September 2023
123 Following · 122 Followers
Pinned Tweet
Genghan Zhang @zhang677
🚀 AccelOpt, a self-improving LLM agentic system for AI accelerator kernel optimization. 📈 Boosts utilization from 49% → 61% on Trainium1 and 45% → 59% on Trainium2 using open-source models, matching Claude Sonnet 4 while being 26× cheaper. Paper: arxiv.org/pdf/2511.15915
4 replies · 4 reposts · 16 likes · 7K views
Genghan Zhang @zhang677
Thrilled to share three big updates on AccelOpt 🚀 1) New best results on AWS Trainium (NKI, Trn1) 2) Extended to NVIDIA H100 (Triton) 3) Our AccelOpt paper was accepted to MLSys 2026 🎉 See you in Seattle! All the related links are in the comments below!
5 replies · 2 reposts · 10 likes · 394 views
Genghan Zhang @zhang677
AccelOpt achieves 1.5× speedup over FlashInfer on a GQA paged decode kernel, and 1.38× on a GQA paged prefill kernel. Kernels here: github.com/zhang677/Accel…
0 replies · 0 reposts · 1 like · 73 views
Genghan Zhang reposted
Allen Nie (🇺🇦☮️) @allenainie
We constructed NKIBench with manually curated kernels commonly used in modern networks (RoPE, Mamba block, Group Query Attention, etc.). The code and benchmark release will come soon. Follow @zhang677, the project lead, for updates! Tweet: x.com/zhang677/statu… 📈🧵6/6
Quoting Genghan Zhang @zhang677 (the pinned AccelOpt announcement above)
1 reply · 1 repost · 2 likes · 154 views
Genghan Zhang @zhang677
This is the follow-up to our ICML 2025 paper: Adaptive Self-improvement LLM Agentic System for ML Library Development (arxiv.org/pdf/2502.02534). Major upgrades: ⚡️Performance, not just correctness 💰Cost efficiency 🧠Reusable memory of optimization insights
0 replies · 0 reposts · 1 like · 147 views
Genghan Zhang @zhang677
🤖 Educational Impact: Stanford CS 149 (Parallel Computing) has adopted one resulting kernel and optimization insights generated by AccelOpt to improve the course materials.
0 replies · 0 reposts · 1 like · 135 views
Genghan Zhang @zhang677
⚡️ AccelOpt LLM agents learn from their own exploration: no hand-crafted heuristics, no hardware-specific recipes. 💡 Search + Memory: AccelOpt combines beam search + optimization memory, uncovering both peephole rewrites and multi-step global transformations.
0 replies · 0 reposts · 0 likes · 187 views
Jon Saad-Falcon @JonSaadFalcon
Data centers dominate AI, but they're hitting physical limits. What if the future of AI isn't just bigger data centers, but local intelligence in our hands?

The viability of local AI depends on intelligence efficiency. To measure this, we propose intelligence per watt (IPW): intelligence delivered (capabilities) per unit of power consumed (efficiency).

Today's local LMs already handle 88.7% of single-turn chat and reasoning queries, with local IPW improving 5.3× in 2 years, driven by better models (3.2×) and better accelerators (1.7×).

As local IPW improves, a meaningful fraction of workloads can shift from centralized infrastructure to local compute, with IPW serving as the critical metric for tracking this transition. (1/N)
55 replies · 143 reposts · 455 likes · 226.6K views
Anne Ouyang @anneouyang
Excited to share what friends and I have been working on at @Standard_Kernel. We've raised from General Catalyst (@generalcatalyst), Felicis (@felicis), and a group of exceptional angels. We have some great H100 BF16 kernels in pure CUDA+PTX, featuring:
- Matmul: 102%–105% the performance of cuBLAS in 100 lines of code
- Attention: 104% the performance of FlashAttention3 in 500 lines
- Fused Llama3 FFN: 120% the performance of PyTorch (gpt-fast)
Reach out if you want to work on AI kernel gen with us!
52 replies · 91 reposts · 1K likes · 207.8K views
Genghan Zhang reposted
Simon Guo @simonguozirui
LLMs for GPU kernel🌽generation have been getting Pop🍿ular since our preview last Dec; excited to announce 📢 our full paper 📃 for KernelBench! Turns out KernelBench is quite challenging 🧠 — frontier models outperform the PyTorch Eager baseline <20% of the time. More 🧵👇
9 replies · 68 reposts · 302 likes · 113.9K views
Genghan Zhang @zhang677
@SakanaAILabs Excited to see new automation technologies for ML library development using Architecture Specific Programming Language (ASPL). Thanks for citing our work! We have also recently explored an adaptive self-improvement method for this task arxiv.org/abs/2502.02534
0 replies · 0 reposts · 0 likes · 76 views
Sakana AI @SakanaAILabs
Introducing The AI Scientist: the world's first AI system for automating scientific research and open-ended discovery! sakana.ai/ai-scientist/

From ideation, writing code, running experiments and summarizing results, to writing entire papers and conducting peer review, The AI Scientist opens a new era of AI-driven scientific research and accelerated discovery. Here are 4 example machine learning research papers generated by The AI Scientist.

We published our report, The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery, and open-sourced our project! Paper: arxiv.org/abs/2408.06292 GitHub: github.com/SakanaAI/AI-Sc…

Our system leverages LLMs to propose and implement new research directions. Here, we first apply The AI Scientist to conduct machine learning research. Crucially, our system is capable of executing the entire ML research lifecycle: from inventing research ideas and experiments, writing code, to executing experiments on GPUs and gathering results. It can also write an entire scientific paper, explaining, visualizing, and contextualizing the results.

Furthermore, while an LLM author writes entire research papers, another LLM reviewer critiques the resulting manuscripts to provide feedback to improve the work, and also to select the most promising ideas to develop further in the next iteration cycle, leading to continual, open-ended discoveries, thus emulating the human scientific community.

As a proof of concept, our system produced papers with novel contributions in ML research domains such as language modeling, diffusion, and grokking.

We (@_chris_lu_, @RobertTLange, @hardmaru) proudly collaborated with the @UniOfOxford (@j_foerst, @FLAIR_Ox) and @UBC (@cong_ml, @jeffclune) on this exciting project.
288 replies · 1.5K reposts · 6.1K likes · 3.4M views
Genghan Zhang @zhang677
As stated in their paper: "A growing subset of work in this field has also begun to target kernel writing (KernelBench(arxiv.org/abs/2502.10517); Adaptive Self-improvement LLM Agentic System for ML Library Development(arxiv.org/abs/2502.02534)) in Architecture Specific Programming Languages such as CUDA, since efficiency improvements from these kernels can be incredibly valuable and skilled human kernel engineers are in high demand."
0 replies · 0 reposts · 2 likes · 111 views
Genghan Zhang @zhang677
Excited to see new automation technologies for ML library development using Architecture Specific Programming Language (ASPL). We have recently explored an adaptive self-improvement method for this task arxiv.org/abs/2502.02534
Quoting Sakana AI @SakanaAILabs (The AI Scientist announcement above)
1 reply · 1 repost · 2 likes · 203 views