Luis Ceze

1.4K posts

@luisceze

computer architect. marveled by biology. professor @uwcse. ceo @OctoAICloud. venture partner @madronaventures.

Joined May 2010
2.1K Following · 3.6K Followers
Luis Ceze retweeted
Tianqi Chen
Tianqi Chen@tqchenml·
📢 #MLSys2026 this year features contest tracks. Check out the announcement on optimizing FlashInfer-Bench LLM inference kernels for NVIDIA Blackwell GPUs 👉
Zihao Ye@ye_combinator

🚀 MLSys 2026 Contest - @nvidia Track is LIVE! Registration is now open for the FlashInfer-Bench Challenge! Submit high-performance GPU kernels for cutting-edge LLM architectures on NVIDIA Blackwell GPUs.
Three tracks:
* MoE (Mixture of Experts)
* DSA (DeepSeek Sparse Attention)
* GDN (Gated Delta Net)
Human experts AND AI agents welcome — evaluated separately. Let's see who builds the best kernels! 🤖
🎁 Prizes: Winners take home NVIDIA GPUs and are invited to present at MLSys 2026.
⚡ First 50 teams to register get free GPU credits from @modal - huge thanks for the sponsorship @charles_irl!
Whether you're a kernel wizard or building autonomous coding agents, we want to see what you've got.
🔗 Contest details: mlsys26.flashinfer.ai
See you at MLSys 2026! 🔥

0 replies · 11 reposts · 62 likes · 9.3K views
Luis Ceze
Luis Ceze@luisceze·
Calling for humans, AI and AI+humans to participate in this contest! Should be super fun.
Zihao Ye@ye_combinator


0 replies · 2 reposts · 8 likes · 2K views
Luis Ceze retweeted
Tianqi Chen
Tianqi Chen@tqchenml·
CuteDSL 4.3.1 is here 🚀 Major host-overhead optimization (10-40µs down to ~2µs in hot loops), streamlined PyTorch interop (pass torch.Tensors directly, no more conversions needed), and export for use in more languages and environments. All powered by the Apache TVM-FFI ABI.
9 replies · 61 reposts · 334 likes · 53K views
Luis Ceze retweeted
Tony Mongkolsmai
Tony Mongkolsmai@tonymongkolsmai·
Today we are releasing our first public beta of Nsight Python! The goal is to simplify the life of a Python developer by providing a Pythonic way to analyze your kernel code! Check it out and provide feedback! Nsight Python — nsight-python docs.nvidia.com/nsight-python/
10 replies · 48 reposts · 341 likes · 29.4K views
Luis Ceze
Luis Ceze@luisceze·
FlashInfer Bench’s evaluation of kernels against real-world setups will accelerate kernel development by both humans and agents - so cool! Can’t wait to see the advances that will come out of it.
Tianqi Chen@tqchenml

🚀 Excited to launch FlashInfer Bench. We believe AI has the potential to help build LLM systems. To accelerate the path, we need an open schema for critical workloads and an AI-driven virtuous circle. First-class integration with FlashInfer, SGLang and vLLM support 👉

0 replies · 4 reposts · 18 likes · 4.9K views
Luis Ceze retweeted
Shanli Xing
Shanli Xing@shanli_xing·
🤔 Can AI optimize the systems it runs on?
🚀 Introducing FlashInfer-Bench, a workflow that makes AI systems self-improving with agents:
- Standardized signature for LLM serving kernels
- Implement kernels with your preferred language
- Benchmark them against real-world serving workloads
- Fastest kernels get day-0 integrated into production
First-class integration with FlashInfer, SGLang (@lmsysorg), and vLLM (@vllm_project) at launch 🙌
Blog post: flashinfer.ai/2025/10/21/fla…
Leaderboard: bench.flashinfer.ai
3 replies · 44 reposts · 148 likes · 59.2K views
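To make the "standardized signature" idea above concrete, here is a minimal sketch. The signature and names here are hypothetical illustrations, not FlashInfer-Bench's actual schema: the point is that when every competing kernel implements one agreed-upon contract, a harness can benchmark any of them interchangeably against the same workload.

```python
import time
import numpy as np

# Hypothetical standardized signature: every competing kernel takes
# (q, k, v) and returns the attention output. The harness below depends
# only on this contract, never on a kernel's internals.
def naive_attention(q, k, v):
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def benchmark(kernel, q, k, v, iters=10):
    """Time any kernel that satisfies the shared signature."""
    kernel(q, k, v)  # warm-up call
    t0 = time.perf_counter()
    for _ in range(iters):
        out = kernel(q, k, v)
    return out, (time.perf_counter() - t0) / iters

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((64, 32)) for _ in range(3))
out, secs = benchmark(naive_attention, q, k, v)
```

Any faster implementation (Triton, CUDA, etc.) that honors the same signature could be dropped into `benchmark` unchanged, which is what makes leaderboard-style comparison and day-0 integration possible.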
Luis Ceze retweeted
Zhihao Jia
Zhihao Jia@JiaZhihao·
The #MLSys2026 submission deadline is only 2 weeks away (Oct 30)! Submit your best work on ML systems — spanning hardware, compilers, software, models, agents, and eval. This year features both Research and Industry Tracks! Join us in Seattle next spring! mlsys.org
0 replies · 14 reposts · 21 likes · 4.4K views
Luis Ceze retweeted
The AI Investor
The AI Investor@The_AI_Investor·
AMD Instinct MI355X was supposed to compete with NVIDIA Blackwell right? So much for AMD having an advantage in inference.
59 replies · 70 reposts · 447 likes · 84.5K views
Science girl
Science girl@sciencegirl·
Camouflage
41 replies · 72 reposts · 478 likes · 43.2K views
Luis Ceze retweeted
Zihao Ye
Zihao Ye@ye_combinator·
We’re thrilled that FlashInfer won a Best Paper Award at MLSys 2025! 🎉 This wouldn’t have been possible without the community — huge thanks to @lmsysorg’s SGLang for deep co-design (which is critical for inference kernel evolution) and stress-testing over the years, and to @vllm_project for integration support. With continued help from @NVIDIAAIDev, FlashInfer is becoming more stable and faster. Let’s keep building together!
NVIDIA AI Developer@NVIDIAAIDev

🎉 Congratulations to the FlashInfer team – their technical paper, "FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving," just won best paper at #MLSys2025. 🏆
🙌 We are excited to share that we are now backing FlashInfer as a supporter and contributor to the project. We’ve chosen FlashInfer to release our top LLM inference kernels, including those from TensorRT-LLM, making them easy to integrate into @vllm_project, SGLang (@lmsysorg), and custom inference engines.
It started as a collaborative research project at @uwcse, @CarnegieMellon, and OctoAI (acquired by NVIDIA) with the goal of creating a flexible LLM inference kernel library that is engine-agnostic, highly optimized, and easy to extend for new techniques such as algorithms for KV cache reuse. It is now a thriving open-source project with production deployments and contributions from research and development teams across the AI systems community.
Check out FlashInfer today to get started and see our first Blackwell kernels for DeepSeek MLA, available now: nvda.ws/4djKdq7
Congratulations again to Zihao Ye and all authors of the MLSys paper -- Lequn Chen, Wuwei Lin, Yineng Zhang, Stephanie Wang, Baris Kasikci, Arvind Krishnamurthy, Vinod Grover, Tianqi Chen, Ruihang Lai. And thank you to all community contributors; we look forward to continuing to grow this project.
FlashInfer paper: nvda.ws/4kj2Htc
Blackwell MLA kernel: nvda.ws/4jWjLW2

15 replies · 37 reposts · 233 likes · 39.1K views
Luis Ceze retweeted
Ying Sheng
Ying Sheng@ying11231·
Congrats to @ye_combinator @tqchenml @luisceze! FlashInfer has been the real power behind various inference frameworks! Hope to see more people join the community and build their own inference engines on top of it!
Zihao Ye@ye_combinator


1 reply · 4 reposts · 54 likes · 12.4K views
Luis Ceze
Luis Ceze@luisceze·
🚀🎉
NVIDIA AI Developer@NVIDIAAIDev


1 reply · 3 reposts · 10 likes · 1.4K views
Luis Ceze retweeted
Zihao Ye
Zihao Ye@ye_combinator·
LLMs are not all about tensor cores. Categorical sampling under filters (top-p/top-k/min-p) is a critical operator in LLMs as vocabulary sizes grow; FlashInfer uses a sorting-free rejection sampling algorithm for efficient sampling. Check out this great blog post written by @0xsling0 and see how traditional parallel algorithms (e.g. reduction/scan) still shine in the era of LLMs.
Shanli Xing@shanli_xing

🚀Meet flashinfer.sampling—our sorting-free GPU kernels for lightning-fast #LLM sampling. Our implementation achieves over 50% reduction in sampling time. Blog post: flashinfer.ai/2025/03/10/sam…

0 replies · 9 reposts · 39 likes · 4.8K views
Luis Ceze retweeted
Shanli Xing
Shanli Xing@shanli_xing·
🚀Meet flashinfer.sampling—our sorting-free GPU kernels for lightning-fast #LLM sampling. Our implementation achieves over 50% reduction in sampling time. Blog post: flashinfer.ai/2025/03/10/sam…
1 reply · 32 reposts · 181 likes · 31.3K views
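The sorting-free idea in the sampling posts above can be sketched on the CPU. This is a minimal NumPy illustration of rejection-based top-p sampling, not FlashInfer's actual fused CUDA kernel: propose a token from the full categorical distribution and accept it only if the probability mass strictly above it is below p, an O(V) membership test that needs no sort.

```python
import numpy as np

def top_p_rejection_sample(probs, p, rng, max_rounds=64):
    """Draw one token from the top-p (nucleus) distribution without sorting.

    Propose from the full distribution, then accept the proposal only if it
    lies inside the nucleus; membership is checked with an O(V) mass
    comparison instead of a sorted prefix scan. Accepted samples follow the
    renormalized top-p distribution exactly.
    """
    for _ in range(max_rounds):
        tok = int(rng.choice(len(probs), p=probs))    # propose ~ probs
        mass_above = probs[probs > probs[tok]].sum()  # mass of strictly more likely tokens
        if mass_above < p:                            # tok is inside the nucleus
            return tok
    return int(np.argmax(probs))  # fallback; astronomically unlikely to trigger

rng = np.random.default_rng(0)
probs = np.array([0.5, 0.3, 0.1, 0.06, 0.04])
# With p = 0.8 the nucleus is tokens {0, 1}; all other proposals are rejected.
samples = [top_p_rejection_sample(probs, p=0.8, rng=rng) for _ in range(2000)]
```

On a GPU the same accept/reject test becomes a parallel reduction over the vocabulary, which is why classic reduction/scan primitives remain central to fast LLM sampling.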
Luis Ceze retweeted
Tianqi Chen
Tianqi Chen@tqchenml·
Learn more about the latest advances in AI and systems, including LLM serving, efficient attentions, structured outputs, scaling up training, and more topics. Check out #MLSys2025. Accepted papers at mlsys.org/virtual/2025/p… and register today at mlsys.org/Register
4 replies · 24 reposts · 103 likes · 16.7K views
Luis Ceze retweeted
Zihao Ye
Zihao Ye@ye_combinator·
Check out the intra-kernel profiler in FlashInfer to visualize the timeline of each SM/warpgroup over the lifecycle of a CUDA persistent kernel: github.com/flashinfer-ai/… You can clearly see how tensor/CUDA core overlapping, variable-length load balancing, and fusion work.
2 replies · 32 reposts · 148 likes · 8.7K views
Luis Ceze
Luis Ceze@luisceze·
Great to see @OctoAICloud second only to @GroqInc -- given our service runs on off-the-shelf cloud @nvidia hardware. It is all about carefully balancing speed, quality, and cost from a whole-system, cross-stack perspective.
Alex Volkov@altryne

Wanna know whether different LLM providers serve the same LLama 3.1 70B? I sure did! So I ran a quick eval to get some surprising results + open sourced my code 👇 Check out my comparison between @GroqInc @FireworksAI_HQ @OctoAICloud @DeepInfra and @togethercompute

1 reply · 2 reposts · 11 likes · 7.1K views