Bryce Adelstein Lelbach

12K posts

@blelbach

Principal Engineer at @NVIDIA working on programming languages. @adspthepodcast co-host. C++ Library Evolution chair emeritus. Frequent flyer. Horology fan.

Manhattan, NY · Joined March 2011

2.7K Following · 17.4K Followers

Pinned Tweet

Bryce Adelstein Lelbach@blelbach·10 May

The latest revision of @INCITS/@isostandards COBOL comes out this year.
The goals of COBOL sound normal today:
- Portable
- Freely available
- Designed by the community
In 1959 it was radical & unprecedented.
It was also conceived of & led by women.
This is the story of COBOL.

Bryce Adelstein Lelbach retweeted

Chris Lattner@clattner_llvm·18h

Amazing to catch up with @WenmeiHwu, a hero to many of us in the GPU programming space, and who I was lucky to have as an advisor on my PhD committee years ago. Congratulations on the new edition of “Programming Massively Parallel Processors”. Now also in Mojo!

Bryce Adelstein Lelbach@blelbach·18h

I would have accepted your talk, speaking as the former CppCon and C++Now chair. I hope you'll give it another shot.

Dmitrii Kovanikov@ChShersh

My talk proposal was rejected.

Bryce Adelstein Lelbach retweeted

tae kim@firstadopter·3d

Seems like Nvidia culture is pretty unique. Someone should write a book about this

tae kim@firstadopter

My latest: Nvidia GTC Day One - Groq Will Be a Grand Slam Homerun taekim.substack.com/p/nvidia-gtc-d…

Bryce Adelstein Lelbach retweeted

Jump Trading@jumptrading·3d

For 15+ years, Jump Trading has partnered with @nvidia to advance accelerated computing in financial research. Today, we’re deploying NVIDIA’s Vera Rubin NVL72 to support large-scale AI infrastructure. We build for research velocity. Learn more: jumptrading.com/signals/jump-t…

Bryce Adelstein Lelbach retweeted

Harley Finkelstein@harleyf·3d

Montreal has the best food scene in the world right now. And it's not even close. Here's @nytimes on Rôtisserie La Lune. Rotisserie chicken. Obsessive craft. Packed every night. One of my favorite restaurants on the planet. And absolutely in Montreal. So proud of @vanyafilipovic, Marco and the whole team. @Montreal is on fire 🔥 nytimes.com/2026/03/17/din…

Bryce Adelstein Lelbach retweeted

Nader Khalil🍊@NaderLikeLadder·3d

It was actually super easy to do. Took one email, everyone was super supportive. It’s actually insane how quick this company moves.

Wes Bos@wesbos

Anyone else impressed nvidia got the install script on the main domain, root path? Imagine the meeting of lawyers, security and infra

Bryce Adelstein Lelbach retweeted

Charles 🎉 Frye@charles_irl·3d

love to see one of my heroes standing in front of our logo -- especially for a good reason, supporting development of cutting-edge blackwell kernels!

Vikram@msharmavikram

@marksaroufim @GPU_MODE @NVIDIAGTC Award ceremony for the nvfp4 kernels. Come hang out at GuildHouse!

Bryce Adelstein Lelbach retweeted

Vikram@msharmavikram·3d

@marksaroufim @GPU_MODE @NVIDIAGTC Award ceremony for the nvfp4 kernels. Come hang out at GuildHouse!

Bryce Adelstein Lelbach retweeted

Dirhousssi Amine@DirhousssiAmine·4d

GTC2026 Jensen mentions tiles 👀👀 CuTile will be a way bigger deal than we realise. Future hardware will need better software abstraction and tiling is a step in this direction

Bryce Adelstein Lelbach retweeted

Charles 🎉 Frye@charles_irl·4d

tomorrow at 9am! come thru and let's talk about why computers are slow and what is to be done about it nvidia.com/gtc/session-ca…

Charles 🎉 Frye@charles_irl

getting pretty hyped for my talk at GTC! come thru tuesday at 9am to hear how we speed up inference server starts from half an hour to half a minute -- featuring cloud capacity buffers, custom filesystems, and cuda-checkpoint nvidia.com/gtc/session-ca…

Bryce Adelstein Lelbach@blelbach·3d

The CUDA Tile roadmap:
- SIMT/Tile interop.
- Comms.
- New frontend languages.
Come to my talk at GTC in 30 minutes to learn more. nvidia.com/gtc/session-ca…

Bryce Adelstein Lelbach@blelbach·4d

@steeve This is AOT

Steeve Morin@steeve·5d

mfs will do anything to avoid aot and c++

Matt@matt_dz

cuTile Rust: a safe, tile-based kernel programming DSL for the Rust programming language github.com/NVlabs/cutile-… features a safe host-side API for passing tensors to asynchronously executed kernel functions

Bryce Adelstein Lelbach@blelbach·4d

@tyler_fong_ The back side is even better!

Tyler Fong@tyler_fong_·4d

@blelbach Sick invite

Bryce Adelstein Lelbach@blelbach·4d

Go to a talk about CUDA or speak to a CUDA developer at GTC 2026, and you might get one of these...

Bryce Adelstein Lelbach@blelbach·5d

@SuchirKavi @dss_gabriel @NVIDIAGTC It's a research project. One of the questions we want feedback on is whether the macro DSL is "good enough" or if we need a full compiler.

suchir@SuchirKavi·5d

@blelbach @dss_gabriel @NVIDIAGTC ah, they went macro DSL. and the AST builder looks like it’s not intended for direct use. close enough lol.

Bryce Adelstein Lelbach@blelbach·14 Mar

What three languages are joining cuTile Python? Find out Monday at 4PM at @NVIDIAGTC 2026. nvidia.com/gtc/session-ca…

Bryce Adelstein Lelbach@blelbach·5d

@ashverm4 Are you at UC Berkeley? I'll probably be at LBNL sometime this year. Shoot me an email or DM.

Ashvin@ashverm4·5d

@blelbach Wish I could be there! We're trying to target it with our in-house compiler, and it would've been nice to interact with the folks who made it

Bryce Adelstein Lelbach@blelbach·5d

Learn about the latest developments on CUDA Tile, starting at 3PM on Monday. nvidia.com/gtc/session-ca… nvidia.com/gtc/session-ca…

Matt@matt_dz

Bryce Adelstein Lelbach@blelbach·5d

nvidia.com/gtc/session-ca…

Bryce Adelstein Lelbach@blelbach·6d

@dss_gabriel @NVIDIAGTC What if you had a GPU kernel programming model that was memory safe by construction? Perhaps some sort of array-oriented model where you don't explicitly program threads or do inter-thread communication.

Gabriel@dss_gabriel·6d

@blelbach @NVIDIAGTC That’s the catch :( It’s been a while since I’ve last looked at GPGPU in Rust (2023) but back then, Rust just couldn’t allow writing "safe" kernels since every thread would alias as soon as you indexed in a buffer. Idk how far the folks at VectorWare have come w/ Rust-CUDA tho

Bryce Adelstein Lelbach@blelbach·14 Mar

@ptxpapi x.com/i/status/20325…

Bryce Adelstein Lelbach@blelbach

What three languages are joining cuTile Python? Find out Monday at 4PM at @NVIDIAGTC 2026. nvidia.com/gtc/session-ca…

δ@ptxpapi·11 Mar

@blelbach oh?

δ@ptxpapi

@blelbach I saw Vikram Mailthody mention online that there was a cuTile Rust in development… is this still true? Is there any sort of timeline on a release for this?

Bryce Adelstein Lelbach@blelbach·11 Mar

Stay tuned!

AlexZ 🦀@blackanger

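I just discovered that the VectorWare team announced, in a technical blog post published in February 2026, that they successfully ran Rust's async/await on the GPU. vectorware.com/blog/async-awa…
This addresses the concurrency dilemma on traditional GPUs. Traditional GPU programming is data parallel: all threads perform the same operation on different data. But as GPU programs grow more complex, developers have started using warp specialization, where different warps run different tasks (for example, one warp loads data while another computes). This is essentially a shift from data parallelism to task parallelism. The problem is that this concurrency and synchronization is managed entirely by hand, with no language- or runtime-level support, which is as error-prone and hard to reason about as hand-writing thread synchronization on the CPU.
The blog post surveys three existing high-level abstractions:
- JAX models GPU programs as computation graphs; the compiler analyzes the dependencies in the graph to decide execution order and parallelization strategy.
- Triton uses "blocks" as independent units of computation and manages concurrency through a multi-level MLIR compilation pipeline.
- CUDA Tile introduces "tiles" as first-class data units, making data dependencies explicit.
But these approaches share drawbacks: they all require developers to organize code in an entirely new way, with new programming paradigms and ecosystems, which is a significant barrier to adoption. Code reuse is also difficult: existing CPU libraries and GPU libraries cannot be composed directly with these frameworks.
The post's core argument is that Rust's Future trait happens to satisfy all the properties they want:
1. Futures are lazy, composable values. As with JAX's computation graphs, you first build a "description" of the program and then execute it. The compiler can analyze dependencies before execution.
2. Futures naturally express independent units of concurrency. As with Triton's blocks, multiple futures can run serially (chains of .await) or in parallel (join!, combinators).
3. Rust's ownership system makes data dependencies explicit. As with CUDA Tile's explicit tiles, futures encode data flow by capturing data, while trait bounds such as Send/Sync/Pin constrain how data is shared and passed between units of concurrency.
4. Most importantly: warp specialization is essentially a hand-written task state machine, and Rust futures happen to compile into state machines that the compiler generates and manages automatically.
Since a future is just a state machine, there is no reason it cannot run on a GPU. They ported Embassy, a no_std async executor designed for embedded systems. A GPU has no operating system and does not support the Rust standard library, which closely resembles an embedded environment, so Embassy is a natural choice. Adapting Embassy to the GPU required very few changes, and this ability to reuse existing open-source libraries is far better than in other, non-Rust GPU ecosystems.
On the surface the post is about async/await, but what it is really saying is something bigger: using Rust's type system and zero-cost abstractions to unify the CPU and GPU programming models. A future does not care where it runs: threads, cores, blocks, and warps all work. The same async code can run on both the CPU and the GPU without changing a line. This is fundamentally different from the JAX/Triton approach of "writing a whole new thing for the GPU"; it is a bottom-up unification at the language level.
VectorWare previously published another post about enabling Rust std on the GPU. Together with this async/await work, their goal is clear: make GPU programming "ordinary Rust programming" rather than a special domain that requires an entirely new mental model.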
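
The state-machine point in the post above can be illustrated with a small CPU-only Rust sketch. It is not VectorWare's code and does not use Embassy; every name in it (`LoadThenCompute`, `YieldOnce`, `load_then_compute`, `block_on`) is hypothetical and chosen only for illustration. It shows a hand-written two-state "load, then compute" future next to an `async fn` that does the same work, both driven by a minimal busy-polling executor built on the standard library only.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hand-written state machine (hypothetical): "load" on the first poll,
/// "compute" on the second. This is the kind of bookkeeping that
/// warp-specialized code tends to manage manually.
enum LoadThenCompute {
    Loading { input: u32 },
    Computing { loaded: u32 },
    Done,
}

impl Future for LoadThenCompute {
    type Output = u32;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        let this = self.get_mut(); // fine: the enum holds only plain data (Unpin)
        match *this {
            LoadThenCompute::Loading { input } => {
                // Pretend the load completes by the next poll.
                *this = LoadThenCompute::Computing { loaded: input + 1 };
                cx.waker().wake_by_ref();
                Poll::Pending
            }
            LoadThenCompute::Computing { loaded } => {
                *this = LoadThenCompute::Done;
                Poll::Ready(loaded * 2)
            }
            LoadThenCompute::Done => panic!("polled after completion"),
        }
    }
}

/// Helper future (hypothetical): returns Pending once, then Ready.
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

/// The same logic as an async fn; the compiler generates the state machine.
async fn load_then_compute(input: u32) -> u32 {
    let loaded = input + 1;
    YieldOnce(false).await; // suspension point between "load" and "compute"
    loaded * 2
}

/// Waker that does nothing; sufficient for a busy-polling executor.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Minimal executor: polls until Ready and reports how many polls it took.
fn block_on<F: Future>(fut: F) -> (F::Output, u32) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return (out, polls);
        }
    }
}

fn main() {
    let manual = block_on(LoadThenCompute::Loading { input: 20 });
    let generated = block_on(load_then_compute(20));
    // Both versions suspend once and produce the same value.
    assert_eq!(manual, generated);
    println!("manual = {:?}, generated = {:?}", manual, generated);
}
```

Neither future does any work until `block_on` polls it, which is the "lazy value" property from point 1. The executor here allocates with `Box::pin` for brevity; a no_std setting of the kind the post describes would need a different pinning strategy.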
