δ

32 posts

@ptxpapi

bare metal alchemist.

cuTile cliffs · Joined January 2026
23 Following · 2 Followers
Pinned Tweet
δ @ptxpapi
the only reason i took up residence on this godforsaken platform was to learn what quality looks & feels like from the best in our industry. you still have to learn difficult, time-consuming, effort-draining things without resorting to llm-assisted laziness. i’ll show you.
δ reposted
Thorsten Ball @thorstenball
Lately, whenever I open this app and see the latest tricks, and hacks, and notes, and workflows, and spec here and skill there, I can't help but think: All of this will be washed away by the models. Every Markdown file that's precious to you right now will be gone.
δ @ptxpapi
@jamonholmgren i work at one of the largest companies in america / the world by market cap. ~3 years ago, we had a C-suite-level project to use the unused resources on our corporate devices for aggregate compute. it was a disaster of a project. hardware is cheap; orchestration is expensive.
Jamon @jamonholmgren
Tech companies pay a lot of money for CI servers and then have a bunch of super powerful Mac hardware sitting idle for 16 hours a night. Hm.
δ reposted
evan conrad @evanjconrad
for basically anything that is not a web thing, the only language that makes sense anymore is rust
δ @ptxpapi
welcome to tech twitter, @melibol! super excited to contribute to open source & help develop cutile-rs :)
δ reposted
Ryo Lu @ryolu_
keep struggling

when things come too easy, you exercise neither the brain nor the emotions. ease can feel like progress, but it often skips the reps that actually change you.

growth is usually a loop, not a straight line: you take passes. you try, you fail, you reframe. you come back with a slightly better model, a slightly calmer nervous system, a slightly wider range of what you can handle.

hardship isn’t the goal. but friction is gold. it shows you where your understanding is thin, where your habits are brittle, where your ego is doing the steering. the struggle is the curriculum.

agents are making things easier, and that’s good. but don’t confuse speed with depth. use AI to remove busywork, then spend the saved energy on the parts that still hurt a little: the unclear problem, the uncomfortable conversation, the hard tradeoffs, the things you can’t yet explain in words.

instead of putting all your wishes into the black box, keep thinking, and see things fully. keep the difficulty where it matters. outsource the tedious, keep the meaningful resistance.

that’s how we keep learning, and how we stay human while our tools get superhuman.
δ @ptxpapi
apologies in advance, but as someone who was dreading using python-based DSLs or having to learn C++, i am about to become incredibly insufferable talking about and using cuTile’s new DSL. github.com/NVlabs/cutile-…
δ @ptxpapi
@blelbach I KNEW IT! on an unrelated note, i love the way the team is approaching this; hats off. some of my HPC teammates are flying in for GTC… i’ll be sure to let them know :)
Bryce Adelstein Lelbach @blelbach

@dss_gabriel @NVIDIAGTC What if you had a GPU kernel programming model that was memory safe by construction? Perhaps some sort of array-oriented model where you don't explicitly program threads or do inter-thread communication.

Bryce Adelstein Lelbach @blelbach
Stay tuned!
AlexZ 🦀 @blackanger

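Just found out: in a technical blog post published in February 2026, the VectorWare team announced that they have successfully run Rust's async/await on the GPU. vectorware.com/blog/async-awa… This addresses the concurrency dilemma of traditional GPU programming.

Traditional GPU programming is data-parallel: every thread performs the same operation on different data. But as GPU programs grow more complex, developers have started using warp specialization, letting different warps run different tasks (for example, one warp loads data while another computes). This is essentially a shift from data parallelism to task parallelism.

The problem: this concurrency and synchronization are managed entirely by hand, with no support at the language or runtime level. Like hand-written thread synchronization on the CPU, it is error-prone and hard to reason about.

The post surveys three existing high-level abstractions:
- JAX models a GPU program as a computation graph; the compiler analyzes the dependencies in the graph to decide execution order and parallelization strategy.
- Triton uses the "block" as an independent unit of computation and manages concurrency through a multi-level MLIR compilation pipeline.
- CUDA Tile introduces the "tile" as a first-class data unit, making data dependencies explicit.

But these approaches share the same drawbacks. They all require developers to structure code in an entirely new way, with a new programming paradigm and ecosystem, which is a significant barrier to adoption. Code reuse is also difficult: existing CPU and GPU libraries cannot be composed directly with these frameworks.

The post's core argument is that Rust's Future trait happens to provide every property they want:
1. Futures are lazy, composable values. As with JAX's computation graphs, you first build a "description" of the program and then execute it, so the compiler can analyze dependencies before execution.
2. Futures naturally express independent units of concurrency. As with Triton's blocks, multiple futures can run sequentially (.await chains) or concurrently (join!, combinators).
3. Rust's ownership system makes data dependencies explicit. As with CUDA Tile's explicit tiles, futures encode data flow by capturing data, while trait bounds such as Send/Sync (together with Pin) constrain how data is shared and passed between concurrent units.
4. Most importantly: warp specialization is essentially a hand-written task state machine, and Rust futures compile into exactly such state machines, generated and managed automatically by the compiler. Since a future is just a state machine, there is no reason it can't run on the GPU.

They ported Embassy, a no_std async executor designed for embedded systems. The GPU has no operating system and does not support the Rust standard library, which closely resembles an embedded environment, so Embassy was a natural choice. Adapting Embassy to the GPU required only minimal changes. This ability to reuse existing open-source libraries is far ahead of other, non-Rust GPU ecosystems.

On the surface the post is about async/await, but what it is really about is something bigger: using Rust's type system and zero-cost abstractions to unify the CPU and GPU programming models. A future doesn't care where it runs: a thread, a core, a block, or a warp all work. The same async code can run on both CPU and GPU without changing a single line. This is fundamentally different from the JAX/Triton approach of "writing something entirely new for the GPU"; it is a bottom-up unification at the language level.

VectorWare previously published a post on enabling Rust's std on the GPU. Combined with this async/await work, their goal is clear: make GPU programming "ordinary Rust programming" rather than a special domain that demands an entirely new mental model.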

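The state-machine argument in points 2 and 4 can be illustrated with a minimal Rust sketch that runs on the CPU only. It rests on several assumptions. `LoadThenCompute`, `YieldOnce`, `block_on`, and `join2` are hypothetical names invented for illustration. The "load" phase is simulated by a single yield. The executor busy-polls instead of sleeping until woken. This is not VectorWare's code, not Embassy's API, and it does not touch a GPU. It only shows that a hand-written state machine and an `async fn` behave the same under a trivial executor, and that two futures can be driven concurrently.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical hand-written state machine for a "load, then compute" task,
// with one suspension point between the two phases.
enum LoadThenCompute {
    Load { input: Vec<u32> },
    Compute { loaded: Vec<u32> },
    Done,
}

impl Future for LoadThenCompute {
    type Output = u32;
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        // Take the current state out (leaving `Done`), then pick the next state.
        match std::mem::replace(&mut *self, LoadThenCompute::Done) {
            LoadThenCompute::Load { input } => {
                // Simulated async load: advance the state and ask to be polled again.
                *self = LoadThenCompute::Compute { loaded: input };
                cx.waker().wake_by_ref();
                Poll::Pending
            }
            LoadThenCompute::Compute { loaded } => Poll::Ready(loaded.iter().sum()),
            LoadThenCompute::Done => panic!("polled after completion"),
        }
    }
}

// Hypothetical future that is pending exactly once; stands in for "wait for the load".
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            return Poll::Ready(());
        }
        self.0 = true;
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

// The same task written with async/await; here the compiler generates the state machine.
async fn load_then_compute(input: Vec<u32>) -> u32 {
    YieldOnce(false).await; // suspension point between "load" and "compute"
    input.iter().sum()
}

// Waker that does nothing: the executors below simply re-poll in a loop.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

// Hypothetical busy-polling executor (a real one would sleep until woken).
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

// Hypothetical helper: drive two independent futures concurrently, in the spirit of `join!`.
fn join2<A: Future, B: Future>(a: A, b: B) -> (A::Output, B::Output) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut a = pin!(a);
    let mut b = pin!(b);
    let (mut out_a, mut out_b) = (None, None);
    while out_a.is_none() || out_b.is_none() {
        if out_a.is_none() {
            if let Poll::Ready(v) = a.as_mut().poll(&mut cx) {
                out_a = Some(v);
            }
        }
        if out_b.is_none() {
            if let Poll::Ready(v) = b.as_mut().poll(&mut cx) {
                out_b = Some(v);
            }
        }
    }
    (out_a.unwrap(), out_b.unwrap())
}

fn main() {
    let by_hand = block_on(LoadThenCompute::Load { input: vec![1, 2, 3] });
    let generated = block_on(load_then_compute(vec![1, 2, 3]));
    let (x, y) = join2(load_then_compute(vec![4, 5]), LoadThenCompute::Load { input: vec![10] });
    println!("{by_hand} {generated} {x} {y}"); // prints "6 6 9 10"
}
```

The hand-written enum and the `async fn` give the same result. The difference is that the compiler writes the second state machine for you, which is the property the post argues makes warp-specialized task pipelines expressible as ordinary futures.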
δ reposted
dax @thdxr
i really don't care about using AI to ship more stuff

it's really hard to come up with stuff worth shipping

i want to ship the same amount of stuff with higher quality, both in product and code
δ @ptxpapi
@MainzOnX thank you for taking the time to explain this! what about Rubin is giving you the feeling that you won’t be able to patch in new changes the way the torch / triton teams have historically? i’m assuming it’s some anticipation of proprietary hardware-software co-design?
Adam Mainz @MainzOnX
@ptxpapi This is also why I love PyTorch, and specifically Inductor. We work hard to make all of this happen for you in the background. Only move to custom kernels if you really need to.
Adam Mainz @MainzOnX
Something deep in my soul is telling me all our DSLs are going to have trouble once Rubin comes rolling through. The jump feels different than before. Do you think you are ready?
δ @ptxpapi
@MainzOnX do you mind expanding on this for those who aren’t aware of the motivations behind the DSL designs? which DSLs & why?
Adam Mainz @MainzOnX
A lot of DSLs were originally designed with Ampere-like architectures in mind, and from what I see, Rubin is going to be worlds apart. All the work we have done to move to Hopper and Blackwell won’t be enough.
δ @ptxpapi
the rug will be pulled when the 10x+ subsidies that model-building shops apply to their products vanish under investor demands for profit. then what? would you pay your engineering salary to the machine you’ve become reliant on, just to keep your health benefits?
Dmitriy Kovalenko @neogoose_btw
modern software engineering in a nutshell:

all the software is so unreliable that you're literally making a slack channel to monitor critical outages, and it gets messages every few hours 😭
δ @ptxpapi
@dreamsofcode_io perfectly encapsulates the current landscape of agentic hype. none of these trump personal understanding and mastery.
δ @ptxpapi
people will truly take any term and tack “engineering” onto it. words have meaning, and diluting them is a disservice to truth seeking.
δ @ptxpapi
@gabriel1 agreed, but more importantly, nothing should ever replace understanding every single line of code in your codebase. ignorance is never the path to progress & innovation, especially for the next generation of developers.
gabriel @gabriel1
there is still no substitute for perfectly understanding every single line of code in your codebase

i fall into the trap of just skimming through ai changes to "just make sure it looks good" all the time, and not perfectly understanding every line makes me lose so much time
δ @ptxpapi
@arthur_spirling agreed. in discussions with peers, i keep pointing to the shifting goalposts of “things to learn”: first it’s “prompts”, then it’s “context”, then it’s “mcp”, “a2a”, “skills”, and the list goes on. these leaky abstractions disappear faster than they materialize.
Arthur Spirling @arthur_spirling
If this “AI skill” (using Claude Code) is something you picked up in a few days at zero cost, it’s obviously not some moat that is unique to you or uncrossable by everyone else. Stop posting like it is.
δ @ptxpapi
@chrisalbon it’s not hate, but rather disappointment at how reckless and dominant the conversation about its application has become. it’s in its infancy. it isn’t suitable for replacing learning. it shouldn’t take us away from writing or reviewing code. any other claims are irresponsible.
Chris Albon @chrisalbon
I know some people hate the ai agentic changes in software engineering, but man, I feel I found a second wind in my life.
δ @ptxpapi
github outages. cloudflare outages. aws outages. claude outages. nothing worth building at the core of our software infrastructure is worth jeopardizing with the exponentially degrading context windows of poor statistical pattern matching algorithms.
δ @ptxpapi
i don’t believe in the agentic ai hype. i can physically feel the haphazardness of ai driven software, and see the persistent low quality design. the future of software belongs to those who reject agentic development, and embrace intentional, quality, craft-worthy development.