Apache TVM

685 posts

Apache TVM

@ApacheTVM

Open deep learning compiler stack for CPUs, GPUs and specialized accelerators. Join us for the TVM and Deep Learning Compilation Conference https://t.co/i6MTbWYt87

Joined January 2018
931 Following · 3.8K Followers
Pinned Tweet
Apache TVM
Apache TVM@ApacheTVM·
ICYMI, all of the sessions from #tvmcon are available for streaming! Catch up on the latest advances, case studies, and tutorials in #ML acceleration from the @ApacheTVM community. tvmcon.org
Apache TVM retweeted
Tianqi Chen
Tianqi Chen@tqchenml·
I’ll be giving a talk on TVM-FFI at @GPU_MODE this week! We will discuss how open ABI and FFI facilitate a fast, robust, and seamless framework interop experience across DSLs and kernel libraries.
GPU MODE@GPU_MODE

This Saturday Jan 31 at Noon PST we have one of the founders of the whole field of ML Systems @tqchenml who will be giving a talk on tvm-ffi - an open ABI and FFI for ML systems which has grown tremendously in relevance with the explosion of Kernel DSLs youtube.com/watch?v=xMzcs6…

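The framework interop the talk describes rests on open conventions such as DLPack, the in-memory tensor standard that lets frameworks hand tensors across boundaries without copying. A minimal sketch of that idea, using NumPy (which implements the same protocol) as a stand-in for a cross-framework exchange:

```python
import numpy as np

# Producer side: any DLPack-capable framework (torch, jax, cupy, ...).
producer = np.arange(6, dtype=np.float32).reshape(2, 3)

# Consumer side: import via the DLPack protocol. No bytes are copied;
# both names alias the same buffer.
consumer = np.from_dlpack(producer)
assert np.shares_memory(producer, consumer)

# Writes through one alias are visible through the other.
producer[0, 0] = 42.0
assert consumer[0, 0] == 42.0
```

Zero-copy exchange like this is what makes it cheap to pass the same tensor through a DSL kernel, a C++ library, and a Python framework in one call chain.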
Apache TVM retweeted
Tianqi Chen
Tianqi Chen@tqchenml·
📢 #MLSys2026 this year features contest tracks. Check out the announcement on optimizing FlashInfer-Bench LLM inference kernels for NVIDIA Blackwell GPUs 👉
Zihao Ye@ye_combinator

Apache TVM retweeted
Zihao Ye
Zihao Ye@ye_combinator·
🚀 MLSys 2026 Contest - @nvidia Track is LIVE! Registration is now open for the FlashInfer-Bench Challenge! Submit high-performance GPU kernels for cutting-edge LLM architectures on NVIDIA Blackwell GPUs.

Three Tracks:
* MoE (Mixture of Experts)
* DSA (Deepseek Sparse Attention)
* GDN (Gated Delta Net)

Human experts AND AI agents welcome — evaluated separately. Let's see who builds the best kernels! 🤖

🎁 Prizes: Winners take home NVIDIA GPUs and are invited for presentation at MLSys 2026.

⚡ First 50 teams to register get free GPU credits from @modal - huge thanks for the sponsorship @charles_irl !

Whether you're a kernel wizard or building autonomous coding agents, we want to see what you've got.

🔗 Contest details: mlsys26.flashinfer.ai

See you at MLSys 2026! 🔥
Apache TVM retweeted
Bing Xu
Bing Xu@bingxu_·
Just open-sourced VibeTensor — the first deep learning system fully generated by an AI agent, with 0 lines of human-written code: github.com/NVlabs/vibeten…

It’s a working DL system with RCU style dispatcher, a cache allocator and reverse-mode autograd. The agent also invented a Fabric Tensor system — something that doesn’t exist in any current framework. The Vibe Kernel includes 13 kinds and 47k LOC of generated Triton and CuteDSL kernels with strong performance.

VibeTensor was generated by our 4th-generation agent. It shows a “Frankenstein Effect”: the system is correct, but some critical paths are designed in inefficient ways. As a result, performance isn’t comparable to PyTorch.

I haven’t written a single line of code since summer 2025. I started this effort after @karpathy 's podcast — I didn’t agree with his arguments, so Terry Chen and I began using it as a stress test for our agents. The “Frankenstein Effect” ended up exposing some of our agent’s limitations — but the direction is clear.
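Reverse-mode autograd, one of the components listed above, is the kind of classic DL-systems machinery that fits in a few dozen lines: record each op's inputs and local derivatives, then walk the graph in reverse topological order applying the chain rule. A generic illustrative sketch (not VibeTensor's actual code):

```python
class Var:
    """A scalar value that records how it was computed."""

    def __init__(self, value, parents=()):
        self.value = value
        self.grad = 0.0
        # parents: tuples of (parent Var, local derivative d(self)/d(parent))
        self.parents = parents

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    def backward(self):
        # Topologically sort the graph, then propagate gradients in reverse.
        order, seen = [], set()

        def visit(v):
            if id(v) not in seen:
                seen.add(id(v))
                for parent, _ in v.parents:
                    visit(parent)
                order.append(v)

        visit(self)
        self.grad = 1.0  # seed the output gradient
        for v in reversed(order):
            for parent, local in v.parents:
                parent.grad += v.grad * local

# z = x*y + x, so dz/dx = y + 1 and dz/dy = x.
x, y = Var(3.0), Var(4.0)
z = x * y + x
z.backward()
assert x.grad == 5.0 and y.grad == 3.0
```

A real framework adds tensors, kernels, and memory management on top, but the tape-and-chain-rule core is exactly this shape.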
Apache TVM retweeted
Tianqi Chen
Tianqi Chen@tqchenml·
Check out VibeTensor. It is interesting to see how agents are starting to build something as complex as a deep learning framework. The generated code can still use some further refinement, but the ability to do it at all is quite interesting.
Bing Xu@bingxu_

Apache TVM retweeted
Abdussamet
Abdussamet@aturker01·
CuTe DSL host overhead kills performance, especially for small kernels. I tried tvm-ffi and it looks super useful: it shaves ~28 µs, which makes the kernel ~2× faster at 1024×1024.
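The numbers above are internally consistent: per-call wall time is fixed host overhead plus device kernel time, so shaving ~28 µs can only double throughput if the kernel body itself also takes about 28 µs at that size. A quick model (the 28 µs kernel time is an inference from the tweet, not a measurement):

```python
def total_us(kernel_us, overhead_us):
    # Per-call wall time = fixed host overhead + device kernel time.
    return kernel_us + overhead_us

kernel_us = 28.0        # inferred: only this value makes a 28 µs saving a 2x win
shaved_overhead = 28.0  # host overhead removed via tvm-ffi, per the tweet

before = total_us(kernel_us, shaved_overhead)  # 56 µs per call
after = total_us(kernel_us, 0.0)               # 28 µs per call
print(f"speedup: {before / after:.1f}x")
```

This is also why the gain is described as mattering "especially for small kernels": as the kernel grows, the fixed overhead becomes a vanishing fraction of each call.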
Apache TVM retweeted
Ying Sheng
Ying Sheng@ying11231·
We've been running @radixark for a few months, started by many core developers in SGLang @lmsysorg and its extended ecosystem (slime @slime_framework, AReaL @jxwuyi).

I left @xai in August — a place where I built deep emotions and countless beautiful memories. It was the best place I’ve ever worked, the place I watched grow from a few dozen people to hundreds, and it truly felt like home. What pushed me to make such a hard decision is the momentum of building SGLang open source and the mission of creating an ambitious future, within an open spirit that I learnt from my first job at @databricks after my PhD.

We started SGLang in the summer of 2023 and made it public in January 2024. Over the past 2 years, hundreds of people have made great efforts to get to where they are today. We experienced several waves of growth after its first release. I still remember the many dark nights in the summer of 2024, I spent with @lm_zheng, @lsyincs, and @zhyncs42 debugging, while @ispobaoke single-handedly took on DeepSeek inference optimizations, seeing @GenAI_is_real and the community strike team tag-teaming on-call shifts non-stop. There are so many more who have joined that I'm out of space to call out, but they're recorded on the GitHub contributor list forever.

The demands grow exponentially, and we have been pushed to make it a dedicated effort supported by RadixArk. It’s the step-by-step journey of a thousand miles that has carried us here today, and the same relentless Long March that will lead us into the tens of thousands of miles yet to come. The story never stops growing.

Over the past year, we’ve seen something very clear: The world is full of people eager to build AI, but the infrastructure that makes it possible is not shared. The most advanced inference and training stacks live inside a few companies. Everyone else is forced to rebuild the same schedulers, compilers, serving engines, and training pipelines again and again — often under enormous pressure, with lots of duplicated effort and wasted insight.

RadixArk was born to change that. Today, we’re building an infrastructure-first, deep-tech company with a simple and ambitious mission: "Make frontier-level AI infrastructure open and accessible to everyone."

If the two values below resonate with you, come talk to us:
(1) Engineering as an art. Infrastructure is a first-class citizen in RadixArk. We care about elegant design and code that lasts. Beneath every line of code lies the soul of the engineer who wrote it.
(2) A belief in openness. We share what we build. We bet on long-term compounding through community, contribution, and giving more than we take. A product is defined by its users, yet it truly comes alive the moment functionality transcends mere utility and begins to embody aesthetics.

Thanks to all the miles (the name of our first released RL framework; see below). radixark.ai
Apache TVM retweeted
Lianmin Zheng
Lianmin Zheng@lm_zheng·
I’ve heard many positive experiences with tvm-ffi. One contributor recently integrated it into SGLang to enable JIT compilation, and it is much faster than the default PyTorch interface due to its lightweight design. We plan to gradually move more kernels to the JIT style to reduce binary size.
Tianqi Chen@tqchenml

Apache TVM retweeted
Tianqi Chen
Tianqi Chen@tqchenml·
CuteDSL 4.3.1 is here 🚀 Major host overhead optimization (10–40 µs down to ~2 µs in hot loops), streamlined PyTorch interop (pass torch.Tensors directly, no more conversions needed), and export and use in more languages and envs. All powered by the Apache tvm-ffi ABI
Apache TVM retweeted
Yixin Dong
Yixin Dong@yi_xin_dong·
Amazing results! 🚀 Glad to see TVM-FFI helping TileLang deliver 2–3× faster compilation. TVM-FFI's universal design also opens the door for bringing TileLang's fast kernels to many more ML frameworks.
Lei Wang@Lei_Wang_1999

Apache TVM retweeted
Lei Wang
Lei Wang@Lei_Wang_1999·
🚀 tilelang now fully embraces tvm-ffi! 💡 Not only is the compiler deeply powered by tvm_ffi, we've also replaced old pybind parts with tvm_ffi too. ⚙️ With host codegen moving attribute checks from Python → C++, CPU overhead dropped 2.1×–3.8×, compile speed boosted 2.1×–3.3×!
Apache TVM retweeted
Tianqi Chen
Tianqi Chen@tqchenml·
GPU kernel DSLs are fun, but it is hard to make them low host overhead, robust (check constraints and give you proper Python errors), and interoperable (pass in torch Tensors and ship to C++). #TVMFFI serves as an open ABI convention for TileLang on these fronts 🚀
Lei Wang@Lei_Wang_1999

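The "robust" point above (validate arguments on the host and raise ordinary Python errors instead of crashing in native code) can be sketched framework-agnostically. The function below is illustrative, not tvm-ffi's API; it relies only on the shape/dtype/ndim attributes that DLPack-style tensors share:

```python
import numpy as np

def check_matmul_args(a, b):
    """Validate shapes/dtypes for C = A @ B, failing with clear errors."""
    if a.ndim != 2 or b.ndim != 2:
        raise ValueError(f"expected 2-D operands, got {a.ndim}-D and {b.ndim}-D")
    if a.shape[1] != b.shape[0]:
        raise ValueError(f"inner dimensions differ: {a.shape} @ {b.shape}")
    if a.dtype != b.dtype:
        raise TypeError(f"dtype mismatch: {a.dtype} vs {b.dtype}")

# Well-formed arguments pass silently ...
check_matmul_args(np.zeros((2, 3), np.float32), np.zeros((3, 4), np.float32))

# ... and a bad shape surfaces as a normal Python exception,
# not a segfault inside the kernel.
try:
    check_matmul_args(np.zeros((2, 3), np.float32), np.zeros((5, 4), np.float32))
except ValueError as e:
    print("caught:", e)
```

Doing these checks in generated host code (as the quoted tilelang work moves them from Python to C++) keeps the same error behavior while removing the per-call interpreter cost.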
Apache TVM retweeted
Zhihao Jia
Zhihao Jia@JiaZhihao·
#MLSys2026 is inviting self-nominations for the External Review Committee (ERC)! If you want to contribute to the review process for the MLSys conference, nominate yourself and help shape this year's program. We especially welcome PhD students and early-career researchers! forms.gle/YdAih8VLuwSF1E…
Apache TVM retweeted
Tianqi Chen
Tianqi Chen@tqchenml·
🧵 Reflecting a bit after the @PyTorch conference. ML compilers are becoming "toolkits" rather than monolithic pieces. Their targets are also sub-modules that must interoperate with other pieces. This is THE biggest mindset difference from traditional compilers.
Apache TVM retweeted
Ruihang Lai
Ruihang Lai@ruihanglai·
TVM FFI captures the core and foundational insights we’ve gained from years of ML systems research. Can't wait to see such an open ABI enable new possibilities across systems and platforms 🎉
Tianqi Chen@tqchenml

📢Excited to introduce Apache TVM FFI, an open ABI and FFI for ML systems, enabling compilers, libraries, DSLs, and frameworks to naturally interop with each other. Ship one library across pytorch, jax, cupy etc and runnable across python, c++, rust tvm.apache.org/2025/10/21/tvm…
