VectorWare

12 posts

@vectorware

Joined August 2025
0 Following · 385 Followers
VectorWare (@vectorware)
@AgileJebrim No worries, Rust is not for everyone! We're pretty bullish on github.com/nvidia/stdexec, which maps well to our async/await work and CUDA Tile. Someone will likely take what we did here for Rust threads and do it for C++ to explore the tradeoffs.
1 reply · 0 reposts · 0 likes · 110 views
Jebrim (@AgileJebrim)
@vectorware Yeah I see the heavy insistence on Rust here too. I would personally pass, sorry, but I’ll follow to see where you guys end up. vectorware.com/jobs/
1 reply · 0 reposts · 1 like · 100 views
VectorWare (@vectorware)
We are excited to announce that we can successfully use Rust's std::thread on the GPU. This has never been done before. vectorware.com/blog/threads-o… Supporting Rust's std::thread enables existing Rust code to work on the GPU and makes GPU programming more ergonomic.
17 replies · 100 reposts · 638 likes · 38.6K views
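(The announcement above doesn't include code. As a rough illustration, this is the kind of unmodified `std::thread` Rust the post implies can now target the GPU; whether this exact snippet builds under VectorWare's GPU toolchain is an assumption.)

```rust
use std::thread;

// Plain std::thread code: spawn a few workers, each summing a chunk,
// then join and combine. Nothing here is GPU-specific; that is the
// point of the announcement.
fn parallel_sum(data: Vec<u64>, workers: usize) -> u64 {
    let chunk = (data.len() + workers - 1) / workers; // ceiling division
    let mut handles = Vec::new();
    for part in data.chunks(chunk.max(1)) {
        let part = part.to_vec();
        handles.push(thread::spawn(move || part.iter().sum::<u64>()));
    }
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}

fn main() {
    let data: Vec<u64> = (1..=1000).collect();
    println!("{}", parallel_sum(data, 4)); // 500500
}
```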
VectorWare (@vectorware)
@AgileJebrim Agreed, software architected for the GPU will always be better than CPU software ported over. We've done this work mainly to use existing CPU libraries in GPU-native apps where it makes sense (adding some GPU-specific logic in places for perf).
1 reply · 0 reposts · 1 like · 134 views
Jebrim (@AgileJebrim)
Being GPU-native is a great goal and I’ve been doing it for years, but starting from a CPU-based API that cannot even leverage the SIMD lanes within each wavefront doesn’t sound very GPU-native to me. It’s just programming the GPU with CPU-style scalar code, giving up an order of magnitude of performance potential. I’m sure it’ll still perform better than typical multithreaded CPU code, especially with the higher memory bandwidth available on GPUs, but you’re still just emulating a CPU on top of the GPU rather than truly being GPU-native and leveraging the wider range of hardware capabilities GPUs offer. Interesting idea, though, to at least improve some of the existing codebases out there without requiring a big rewrite.
2 replies · 0 reposts · 8 likes · 630 views
VectorWare (@vectorware)
@AgileJebrim Yep, future posts will talk about SIMD lanes within each warp. Check the pedantic note on "first" if you haven't!
1 reply · 1 repost · 3 likes · 729 views
Jebrim (@AgileJebrim)
@vectorware Just skimmed this but it appears you’re doing warp-uniform behavior, not leveraging the SIMD lanes within each warp? “At VectorWare, we are building the first GPU-native software company.” I can absolutely say you’re not the first. :P
3 replies · 1 repost · 15 likes · 1.5K views
VectorWare (@vectorware)
@Leik0w0 Yep, that is the direction we've been experimenting with and will be talking about in a future post
0 replies · 0 reposts · 6 likes · 913 views
Léo (@Leik0w0)
@vectorware Very interesting work! I like the way you enforce that truly independent work runs on different warps. How would you model programming the lanes inside each warp, though? A SIMD-like model?
1 reply · 0 reposts · 5 likes · 1.2K views
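(One conceptual answer to the "SIMD-like model" question above, sketched in plain Rust: a warp is a fixed set of lanes, and every operation applies uniformly across all of them, the way a warp executes in lockstep. The `Warp` type and its methods are illustrative assumptions, not VectorWare's API.)

```rust
const WARP_WIDTH: usize = 32;

// One conceptual "warp": one element per lane, with each operation
// applied across all 32 lanes at once.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Warp([f32; WARP_WIDTH]);

impl Warp {
    // Broadcast one scalar to every lane (warp-uniform data).
    fn splat(v: f32) -> Self {
        Warp([v; WARP_WIDTH])
    }
    // Give each lane its own index: the usual way divergent per-lane
    // data enters a SIMD-style program.
    fn lane_ids() -> Self {
        Warp(std::array::from_fn(|i| i as f32))
    }
    fn add(self, rhs: Self) -> Self {
        Warp(std::array::from_fn(|i| self.0[i] + rhs.0[i]))
    }
    fn mul(self, rhs: Self) -> Self {
        Warp(std::array::from_fn(|i| self.0[i] * rhs.0[i]))
    }
}

fn main() {
    // y = 2*x + 1 across all 32 lanes, one "instruction" per operation.
    let x = Warp::lane_ids();
    let y = x.mul(Warp::splat(2.0)).add(Warp::splat(1.0));
    assert_eq!(y.0[0], 1.0);
    assert_eq!(y.0[31], 63.0);
}
```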
John Carmack (@ID_AA_Carmack)
The glory work of GPU scheduling is in the frontier data centers with hundreds of thousands of GPUs, but a lot of research work is done with single-GPU jobs on modest clusters, and the scheduling leaves much to be desired.

I wish there were a clean way to preempt GPU tasks, so long-running tasks could be transparently paused to allow higher-priority tasks to get the minimum time-to-results. Manual checkpointing and cooperative multitasking are an option, but they complicate codebases and are fertile ground for bugs.

It feels like most of the pieces are present: everything goes through page tables on the GPUs already, Nvidia UVM (Unified Virtual Memory) allows demand paging to host memory, and MPS (Multi-Process Service) could act as a CUDA shim to force everything to use a different memory allocator.

Memory page thrashing would be catastrophic for GPU tasks, but the idea would be to pause the host task of the low-priority process, then let the high-priority process force only the necessary pages out (or maybe none at all, if the memory pressure wasn’t high enough) while it is running, then resume the low-priority task on completion, allowing it to page everything back in. Task switching at the level of tens of seconds, not milliseconds.

Even if it didn’t handle absolutely all memory (kernel allocations and such) and had some overhead, that would be quite useful. Of course, Nvidia would prefer you to Just Buy More GPUs!
70 replies · 70 reposts · 1.2K likes · 98.2K views
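(The host side of the coarse-grained preemption Carmack describes can be sketched with ordinary process signals: stop the low-priority job's host process, run the high-priority job, then resume. Whether the driver actually pages the stopped job's GPU memory out via UVM depends on memory pressure; this sketch only handles the host process, is Linux-only, and is purely illustrative.)

```rust
use std::process::{Child, Command};

// Pause the low-priority job's host process. While it is stopped, a
// high-priority job can run, and UVM may page the stopped job's GPU
// memory out to host memory if pressure demands it.
fn preempt(low_priority: &Child) {
    Command::new("kill")
        .args(["-STOP", &low_priority.id().to_string()])
        .status()
        .expect("failed to send SIGSTOP");
}

// Resume the low-priority job once the high-priority work is done,
// letting it page everything back in.
fn resume(low_priority: &Child) {
    Command::new("kill")
        .args(["-CONT", &low_priority.id().to_string()])
        .status()
        .expect("failed to send SIGCONT");
}

fn main() {
    // Stand-in for a long-running, low-priority GPU job's host process.
    let mut low = Command::new("sleep").arg("30").spawn().unwrap();

    preempt(&low); // high-priority work would run here, for tens of seconds
    resume(&low);

    low.kill().unwrap();
    low.wait().unwrap();
}
```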
VectorWare (@vectorware)
We are excited to announce that we can successfully use Rust's async/await on the GPU. This has never been done before. vectorware.com/blog/async-awa… Supporting Rust's async/await (and futures) enables existing Rust code to work on the GPU and makes GPU programming more ergonomic.
5 replies · 14 reposts · 62 likes · 4K views
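(The async/await post above doesn't show code. To illustrate the kind of executor machinery involved, here is a minimal `block_on` in stable Rust that drives a future to completion with a no-op waker. This is a generic sketch of a futures executor, not VectorWare's GPU executor.)

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// A no-op waker: enough to drive futures that never actually park.
fn noop_waker() -> Waker {
    fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

// Minimal block_on: poll the future on the current thread until Ready.
fn block_on<F: Future>(mut fut: F) -> F::Output {
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    // SAFETY: `fut` is a local that is never moved after being pinned.
    let mut fut = unsafe { Pin::new_unchecked(&mut fut) };
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
        // A real executor would park here; for ready futures we just spin.
        std::hint::spin_loop();
    }
}

async fn add(a: u32, b: u32) -> u32 {
    a + b
}

fn main() {
    let total = block_on(async {
        let x = add(1, 2).await;
        add(x, 10).await
    });
    println!("{total}"); // 13
}
```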
VectorWare (@vectorware)
@nazarpc We got an unmodified CoreMark running on the GPU, and a GPU warp is surprisingly competitive with a CPU core.
1 reply · 0 reposts · 1 like · 131 views
Nazar Mokrynskyi (@nazarpc)
@vectorware Interesting experiment, but I am still skeptical about running general purpose CPU code on GPUs efficiently. From my experience GPUs like things in a very particular way and deviation leaves a lot of performance on the table. Looking forward to more technical details.
1 reply · 0 reposts · 3 likes · 152 views
VectorWare (@vectorware)
We are excited to announce that we can successfully use Rust's standard library from the GPU. This has never been done before. vectorware.com/blog/rust-std-… Supporting Rust's standard library enables existing Rust code to work on the GPU and makes GPU programming feel normal.
14 replies · 17 reposts · 62 likes · 3.7K views
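(For the standard-library post above: ordinary std-using code, such as collections, strings, and formatting, is what the claim is about. This snippet is illustrative, not from their blog; nothing in it was written with a GPU in mind, which is what makes "std on the GPU" interesting for reusing existing crates.)

```rust
use std::collections::HashMap;

// Everyday standard-library code: HashMap, String, iterators.
fn word_counts(text: &str) -> HashMap<String, usize> {
    let mut counts = HashMap::new();
    for word in text.split_whitespace() {
        *counts.entry(word.to_lowercase()).or_insert(0) += 1;
    }
    counts
}

fn main() {
    let counts = word_counts("the gpu runs the std the same way");
    println!("{}", counts["the"]); // 3
}
```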
VectorWare (@vectorware)
Hello world! We are building the first GPU-native software company. Today we are sharing the thesis, people, and partners behind it. vectorware.com/blog/announcin…
4 replies · 5 reposts · 17 likes · 2.3K views