Genghan Zhang

39 posts

@zhang677

Joined September 2023
123 Following · 122 Followers
Pinned Tweet
Genghan Zhang @zhang677
🚀 AccelOpt, a self-improving LLM agentic system for AI accelerator kernel optimization. 📈 Boosts utilization from 49% → 61% on Trainium1 and 45% → 59% on Trainium2 using open-source models, matching Claude Sonnet 4 while being 26× cheaper. Paper: arxiv.org/pdf/2511.15915
4 replies · 4 reposts · 16 likes · 7K views
Genghan Zhang @zhang677
Thrilled to share three big updates on AccelOpt 🚀 1) New best results on AWS Trainium (NKI, Trn1) 2) Extended to NVIDIA H100 (Triton) 3) Our AccelOpt paper was accepted to MLSys 2026 🎉 See you in Seattle! All the related links are in the comments below!
5 replies · 2 reposts · 10 likes · 394 views
Genghan Zhang @zhang677
AccelOpt achieves 1.5× speedup over FlashInfer on a GQA paged decode kernel, and 1.38× on a GQA paged prefill kernel. Kernels here: github.com/zhang677/Accel…
0 replies · 0 reposts · 1 like · 73 views
Genghan Zhang reposted
Allen Nie (🇺🇦☮️) @allenainie
We constructed NKIBench with manually curated kernels commonly used in modern networks (RoPE, Mamba block, Group Query Attention, etc.). The code and benchmark release will come soon. Follow @zhang677, the project lead, for updates! Tweet: x.com/zhang677/statu… 📈🧵6/6
Quoting Genghan Zhang @zhang677 (the pinned AccelOpt announcement above)
1 reply · 1 repost · 2 likes · 154 views
Genghan Zhang @zhang677
This is the follow-up to our ICML 2025 paper: Adaptive Self-improvement LLM Agentic System for ML Library Development (arxiv.org/pdf/2502.02534). Major upgrades: ⚡️Performance, not just correctness 💰Cost efficiency 🧠Reusable memory of optimization insights
0 replies · 0 reposts · 1 like · 147 views
Genghan Zhang @zhang677
🤖 Educational Impact: Stanford CS 149 (Parallel Computing) has adopted one resulting kernel and optimization insights generated by AccelOpt to improve the course materials.
0 replies · 0 reposts · 1 like · 135 views
Genghan Zhang @zhang677
⚡️ AccelOpt LLM agents learn from their own exploration: no hand-crafted heuristics, no hardware-specific recipes. 💡 Search + Memory: AccelOpt combines beam search + optimization memory, uncovering both peephole rewrites and multi-step global transformations.
0 replies · 0 reposts · 0 likes · 187 views
Jon Saad-Falcon @JonSaadFalcon
Data centers dominate AI, but they're hitting physical limits. What if the future of AI isn't just bigger data centers, but local intelligence in our hands?

The viability of local AI depends on intelligence efficiency. To measure this, we propose intelligence per watt (IPW): intelligence delivered (capabilities) per unit of power consumed (efficiency).

Today's local LMs already handle 88.7% of single-turn chat and reasoning queries, with local IPW improving 5.3× in 2 years, driven by better models (3.2×) and better accelerators (1.7×).

As local IPW improves, a meaningful fraction of workloads can shift from centralized infrastructure to local compute, with IPW serving as the critical metric for tracking this transition. (1/N)
55 replies · 143 reposts · 455 likes · 226.6K views
Anne Ouyang @anneouyang
Excited to share what friends and I have been working on at @Standard_Kernel. We've raised from General Catalyst (@generalcatalyst), Felicis (@felicis), and a group of exceptional angels. We have some great H100 BF16 kernels in pure CUDA+PTX, featuring:
- Matmul: 102%–105% the performance of cuBLAS in 100 lines of code
- Attention: 104% the performance of FlashAttention3 in 500 lines
- Fused Llama3 FFN: 120% the performance of PyTorch (gpt-fast)
Reach out if you want to work on AI kernel gen with us!
52 replies · 91 reposts · 1K likes · 207.8K views
Genghan Zhang reposted
Simon Guo @simonguozirui
LLMs for GPU kernel🌽generation have been getting Pop🍿ular since our preview last Dec; excited to announce 📢 our full paper 📃 for KernelBench! Turns out KernelBench is quite challenging 🧠 — frontier models outperform the PyTorch Eager baseline <20% of the time. More 🧵👇
9 replies · 68 reposts · 302 likes · 113.9K views
Genghan Zhang @zhang677
@SakanaAILabs Excited to see new automation technologies for ML library development using Architecture Specific Programming Language (ASPL). Thanks for citing our work! We have also recently explored an adaptive self-improvement method for this task arxiv.org/abs/2502.02534
0 replies · 0 reposts · 0 likes · 76 views
Sakana AI @SakanaAILabs
Introducing The AI Scientist: the world's first AI system for automating scientific research and open-ended discovery! sakana.ai/ai-scientist/

From ideation, writing code, running experiments and summarizing results, to writing entire papers and conducting peer review, The AI Scientist opens a new era of AI-driven scientific research and accelerated discovery. Here are 4 example machine learning research papers generated by The AI Scientist.

We published our report, The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery, and open-sourced our project! Paper: arxiv.org/abs/2408.06292 GitHub: github.com/SakanaAI/AI-Sc…

Our system leverages LLMs to propose and implement new research directions. Here, we first apply The AI Scientist to conduct machine learning research. Crucially, our system is capable of executing the entire ML research lifecycle: from inventing research ideas and experiments, writing code, to executing experiments on GPUs and gathering results. It can also write an entire scientific paper, explaining, visualizing, and contextualizing the results.

Furthermore, while an LLM author writes entire research papers, another LLM reviewer critiques the resulting manuscripts to provide feedback to improve the work, and also to select the most promising ideas to develop further in the next iteration cycle, leading to continual, open-ended discoveries, thus emulating the human scientific community.

As a proof of concept, our system produced papers with novel contributions in ML research domains such as language modeling, diffusion, and grokking.

We (@_chris_lu_, @RobertTLange, @hardmaru) proudly collaborated with the @UniOfOxford (@j_foerst, @FLAIR_Ox) and @UBC (@cong_ml, @jeffclune) on this exciting project.
288 replies · 1.5K reposts · 6.1K likes · 3.4M views
Genghan Zhang @zhang677
As stated in their paper: "A growing subset of work in this field has also begun to target kernel writing (KernelBench(arxiv.org/abs/2502.10517); Adaptive Self-improvement LLM Agentic System for ML Library Development(arxiv.org/abs/2502.02534)) in Architecture Specific Programming Languages such as CUDA, since efficiency improvements from these kernels can be incredibly valuable and skilled human kernel engineers are in high demand."
0 replies · 0 reposts · 2 likes · 111 views
Genghan Zhang @zhang677
Excited to see new automation technologies for ML library development using Architecture Specific Programming Language (ASPL). We have recently explored an adaptive self-improvement method for this task arxiv.org/abs/2502.02534
Quoting Sakana AI @SakanaAILabs (The AI Scientist announcement above)
1 reply · 1 repost · 2 likes · 203 views