Apache TVM

685 posts

Apache TVM

@ApacheTVM

Open deep learning compiler stack for CPUs, GPUs and specialized accelerators. Join us for the TVM and Deep Learning Compilation Conference https://t.co/i6MTbWYt87

Joined January 2018
931 Following · 3.8K Followers
Pinned Tweet
Apache TVM
Apache TVM@ApacheTVM·
ICYMI, all of the sessions from #tvmcon are available for streaming! Catch up on the latest advances, case studies, and tutorials in #ML acceleration from the @ApacheTVM community. tvmcon.org
Apache TVM retweeted
Tianqi Chen
Tianqi Chen@tqchenml·
I’ll be giving a talk on TVM-FFI at @GPU_MODE this week! We will discuss how open ABI and FFI facilitate a fast, robust, and seamless framework interop experience across DSLs and kernel libraries.
GPU MODE@GPU_MODE

This Saturday Jan 31 at Noon PST we have one of the founders of the whole field of ML Systems @tqchenml who will be giving a talk on tvm-ffi - an open ABI and FFI for ML systems which has grown tremendously in relevance with the explosion of Kernel DSLs youtube.com/watch?v=xMzcs6…

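The framework interop the talk describes rests on open conventions such as DLPack, the in-memory tensor standard that lets frameworks hand tensors across boundaries without copying. A minimal sketch of that idea, using NumPy (which implements the same protocol) as a stand-in for a cross-framework exchange:

```python
import numpy as np

# Producer side: any DLPack-capable framework (torch, jax, cupy, ...).
producer = np.arange(6, dtype=np.float32).reshape(2, 3)

# Consumer side: import via the DLPack protocol. No bytes are copied;
# both names alias the same buffer.
consumer = np.from_dlpack(producer)
assert np.shares_memory(producer, consumer)

# Writes through one alias are visible through the other.
producer[0, 0] = 42.0
assert consumer[0, 0] == 42.0
```

Zero-copy exchange like this is what makes it cheap to pass the same tensor through a DSL kernel, a C++ library, and a Python framework in one call chain.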
Apache TVM retweeted
Tianqi Chen
Tianqi Chen@tqchenml·
📢 #MLSys2026 this year features contest tracks. Check out the announcement on optimizing FlashInfer-Bench LLM inference kernels for NVIDIA Blackwell GPUs 👉
Zihao Ye@ye_combinator

Apache TVM retweeted
Zihao Ye
Zihao Ye@ye_combinator·
🚀 MLSys 2026 Contest - @nvidia Track is LIVE! Registration is now open for the FlashInfer-Bench Challenge! Submit high-performance GPU kernels for cutting-edge LLM architectures on NVIDIA Blackwell GPUs.

Three Tracks:
* MoE (Mixture of Experts)
* DSA (Deepseek Sparse Attention)
* GDN (Gated Delta Net)

Human experts AND AI agents welcome — evaluated separately. Let's see who builds the best kernels! 🤖

🎁 Prizes: Winners take home NVIDIA GPUs and are invited for presentation at MLSys 2026.

⚡ First 50 teams to register get free GPU credits from @modal - huge thanks for the sponsorship @charles_irl !

Whether you're a kernel wizard or building autonomous coding agents, we want to see what you've got.

🔗 Contest details: mlsys26.flashinfer.ai

See you at MLSys 2026! 🔥
Apache TVM retweeted
Bing Xu
Bing Xu@bingxu_·
Just open-sourced VibeTensor — the first deep learning system fully generated by an AI agent, with 0 lines of human-written code: github.com/NVlabs/vibeten…

It’s a working DL system with RCU style dispatcher, a cache allocator and reverse-mode autograd. The agent also invented a Fabric Tensor system — something that doesn’t exist in any current framework. The Vibe Kernel includes 13 kinds and 47k LOC of generated Triton and CuteDSL kernels with strong performance.

VibeTensor was generated by our 4th-generation agent. It shows a “Frankenstein Effect”: the system is correct, but some critical paths are designed in inefficient ways. As a result, performance isn’t comparable to PyTorch.

I haven’t written a single line of code since summer 2025. I started this effort after @karpathy 's podcast — I didn’t agree with his arguments, so Terry Chen and I began using it as a stress test for our agents. The “Frankenstein Effect” ended up exposing some of our agent’s limitations — but the direction is clear.
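Reverse-mode autograd, one of the components listed above, is the kind of classic DL-systems machinery that fits in a few dozen lines: record each op's inputs and local derivatives, then walk the graph in reverse topological order applying the chain rule. A generic illustrative sketch (not VibeTensor's actual code):

```python
class Var:
    """A scalar value that records how it was computed."""

    def __init__(self, value, parents=()):
        self.value = value
        self.grad = 0.0
        # parents: tuples of (parent Var, local derivative d(self)/d(parent))
        self.parents = parents

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    def backward(self):
        # Topologically sort the graph, then propagate gradients in reverse.
        order, seen = [], set()

        def visit(v):
            if id(v) not in seen:
                seen.add(id(v))
                for parent, _ in v.parents:
                    visit(parent)
                order.append(v)

        visit(self)
        self.grad = 1.0  # seed the output gradient
        for v in reversed(order):
            for parent, local in v.parents:
                parent.grad += v.grad * local

# z = x*y + x, so dz/dx = y + 1 and dz/dy = x.
x, y = Var(3.0), Var(4.0)
z = x * y + x
z.backward()
assert x.grad == 5.0 and y.grad == 3.0
```

A real framework adds tensors, kernels, and memory management on top, but the tape-and-chain-rule core is exactly this shape.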
Apache TVM retweeted
Tianqi Chen
Tianqi Chen@tqchenml·
Check out VibeTensor. It is interesting to see how agents are starting to build something as complex as a deep learning framework. The generated code can still use some further refinement, but the ability to do it at all is quite interesting.
Bing Xu@bingxu_

Apache TVM retweeted
Abdussamet
Abdussamet@aturker01·
CuTe DSL host overhead kills performance, especially for small kernels. I tried tvm-ffi and it looks super useful: it shaves ~28 µs, which makes the kernel ~2× faster at 1024×1024.
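The numbers above are internally consistent: per-call wall time is fixed host overhead plus device kernel time, so shaving ~28 µs can only double throughput if the kernel body itself also takes about 28 µs at that size. A quick model (the 28 µs kernel time is an inference from the tweet, not a measurement):

```python
def total_us(kernel_us, overhead_us):
    # Per-call wall time = fixed host overhead + device kernel time.
    return kernel_us + overhead_us

kernel_us = 28.0        # inferred: only this value makes a 28 µs saving a 2x win
shaved_overhead = 28.0  # host overhead removed via tvm-ffi, per the tweet

before = total_us(kernel_us, shaved_overhead)  # 56 µs per call
after = total_us(kernel_us, 0.0)               # 28 µs per call
print(f"speedup: {before / after:.1f}x")
```

This is also why the gain is described as mattering "especially for small kernels": as the kernel grows, the fixed overhead becomes a vanishing fraction of each call.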
Apache TVM retweeted
Ying Sheng
Ying Sheng@ying11231·
We've been running @radixark for a few months, started by many core developers in SGLang @lmsysorg and its extended ecosystem (slime @slime_framework, AReaL @jxwuyi).

I left @xai in August — a place where I built deep emotions and countless beautiful memories. It was the best place I’ve ever worked, the place I watched grow from a few dozen people to hundreds, and it truly felt like home. What pushed me to make such a hard decision is the momentum of building SGLang open source and the mission of creating an ambitious future, within an open spirit that I learnt from my first job at @databricks after my PhD.

We started SGLang in the summer of 2023 and made it public in January 2024. Over the past 2 years, hundreds of people have made great efforts to get to where they are today. We experienced several waves of growth after its first release. I still remember the many dark nights in the summer of 2024, I spent with @lm_zheng, @lsyincs, and @zhyncs42 debugging, while @ispobaoke single-handedly took on DeepSeek inference optimizations, seeing @GenAI_is_real and the community strike team tag-teaming on-call shifts non-stop. There are so many more who have joined that I'm out of space to call out, but they're recorded on the GitHub contributor list forever.

The demands grow exponentially, and we have been pushed to make it a dedicated effort supported by RadixArk. It’s the step-by-step journey of a thousand miles that has carried us here today, and the same relentless Long March that will lead us into the tens of thousands of miles yet to come. The story never stops growing.

Over the past year, we’ve seen something very clear: The world is full of people eager to build AI, but the infrastructure that makes it possible is not shared. The most advanced inference and training stacks live inside a few companies. Everyone else is forced to rebuild the same schedulers, compilers, serving engines, and training pipelines again and again — often under enormous pressure, with lots of duplicated effort and wasted insight.

RadixArk was born to change that. Today, we’re building an infrastructure-first, deep-tech company with a simple and ambitious mission: "Make frontier-level AI infrastructure open and accessible to everyone."

If the two values below resonate with you, come talk to us:
(1) Engineering as an art. Infrastructure is a first-class citizen in RadixArk. We care about elegant design and code that lasts. Beneath every line of code lies the soul of the engineer who wrote it.
(2) A belief in openness. We share what we build. We bet on long-term compounding through community, contribution, and giving more than we take. A product is defined by its users, yet it truly comes alive the moment functionality transcends mere utility and begins to embody aesthetics.

Thanks to all the miles (the name of our first released RL framework; see below). radixark.ai
Apache TVM retweeted
Lianmin Zheng
Lianmin Zheng@lm_zheng·
I’ve heard many positive experiences with tvm-ffi. One contributor recently integrated it into SGLang to enable JIT compilation, and it is much faster than the default PyTorch interface due to its lightweight design. We plan to gradually move more kernels to the JIT style to reduce binary size.
Tianqi Chen@tqchenml

Apache TVM retweeted
Tianqi Chen
Tianqi Chen@tqchenml·
CuteDSL 4.3.1 is here 🚀 Major host overhead optimization (10–40 µs down to ~2 µs in hot loops), streamlined PyTorch interop (pass torch.Tensors directly, no more conversions needed), and export and use in more languages and envs. All powered by the Apache tvm-ffi ABI
Apache TVM retweeted
Yixin Dong
Yixin Dong@yi_xin_dong·
Amazing results! 🚀 Glad to see TVM-FFI helping TileLang deliver 2–3× faster compilation. TVM-FFI's universal design also opens the door for bringing TileLang's fast kernels to many more ML frameworks.
Lei Wang@Lei_Wang_1999

Apache TVM retweeted
Lei Wang
Lei Wang@Lei_Wang_1999·
🚀 tilelang now fully embraces tvm-ffi! 💡 Not only is the compiler deeply powered by tvm_ffi, we've also replaced old pybind parts with tvm_ffi too. ⚙️ With host codegen moving attribute checks from Python → C++, CPU overhead dropped 2.1×–3.8×, compile speed boosted 2.1×–3.3×!
Apache TVM retweeted
Tianqi Chen
Tianqi Chen@tqchenml·
GPU kernel DSLs are fun, but it is hard to make them low host overhead, robust (check constraints and give you proper Python errors), and interoperable (pass in torch Tensors and ship to C++). #TVMFFI serves as an open ABI convention for TileLang on these fronts 🚀
Lei Wang@Lei_Wang_1999

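The "robust" point above (validate arguments on the host and raise ordinary Python errors instead of crashing in native code) can be sketched framework-agnostically. The function below is illustrative, not tvm-ffi's API; it relies only on the shape/dtype/ndim attributes that DLPack-style tensors share:

```python
import numpy as np

def check_matmul_args(a, b):
    """Validate shapes/dtypes for C = A @ B, failing with clear errors."""
    if a.ndim != 2 or b.ndim != 2:
        raise ValueError(f"expected 2-D operands, got {a.ndim}-D and {b.ndim}-D")
    if a.shape[1] != b.shape[0]:
        raise ValueError(f"inner dimensions differ: {a.shape} @ {b.shape}")
    if a.dtype != b.dtype:
        raise TypeError(f"dtype mismatch: {a.dtype} vs {b.dtype}")

# Well-formed arguments pass silently ...
check_matmul_args(np.zeros((2, 3), np.float32), np.zeros((3, 4), np.float32))

# ... and a bad shape surfaces as a normal Python exception,
# not a segfault inside the kernel.
try:
    check_matmul_args(np.zeros((2, 3), np.float32), np.zeros((5, 4), np.float32))
except ValueError as e:
    print("caught:", e)
```

Doing these checks in generated host code (as the quoted tilelang work moves them from Python to C++) keeps the same error behavior while removing the per-call interpreter cost.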
Apache TVM retweeted
Zhihao Jia
Zhihao Jia@JiaZhihao·
#MLSys2026 is inviting self-nominations for the External Review Committee (ERC)! If you want to contribute to the review process for the MLSys conference, nominate yourself and help shape this year's program. We especially welcome PhD students and early-career researchers! forms.gle/YdAih8VLuwSF1E…
Apache TVM retweeted
Tianqi Chen
Tianqi Chen@tqchenml·
🧵 Reflecting a bit after the @PyTorch conference. ML compilers are becoming "toolkits" rather than monolithic pieces. Their targets are also sub-modules that must interoperate with other pieces. This is THE biggest mindset difference from traditional compilers.
Apache TVM retweeted
Ruihang Lai
Ruihang Lai@ruihanglai·
TVM FFI captures the core and foundational insights we’ve gained from years of ML systems research. Can't wait to see such an open ABI enable new possibilities across systems and platforms 🎉
Tianqi Chen@tqchenml

📢Excited to introduce Apache TVM FFI, an open ABI and FFI for ML systems, enabling compilers, libraries, DSLs, and frameworks to naturally interop with each other. Ship one library across pytorch, jax, cupy etc and runnable across python, c++, rust tvm.apache.org/2025/10/21/tvm…
