Chris Kitching

@ChrisKitching17

Joined March 2022

2 Following · 2 Followers

Chris Kitching @ChrisKitching17 · 28 Nov

@HotAisle @SpectralCom We are not here to ask why, merely to make shit work ;)

Hot Aisle @HotAisle · 28 Nov

@SpectralCom If only the source code was available so that it could be analyzed for why it allows that.

Spectral Compute @SpectralCom · 28 Nov

NVCC's parser is funny. When closing many template arguments at once, you can introduce a redundant comma after every third one with no effect:

Chris Kitching @ChrisKitching17 · 26 Oct

@apaszke @clattner_llvm @metaai I think at least part of it is that they seem to have compared against cuBLAS instead of cuBLASLt. The latter is able to optimise for the specific input sizes more than the former, which makes it a fairer comparison with tools like mojo/Triton/etc.

Adam Paszke @apaszke · 26 Oct

@clattner_llvm @metaai How can Mojo be faster than CUDA? Isn’t it really just PTX vs the DSL abstractions? It’s also quite important to consider productivity in addition to perf, although it is harder to quantify

Chris Lattner @clattner_llvm · 26 Oct

Thank you to folks at @metaai for publishing their independent perf analysis comparing CUDA and Mojo against Triton and TileLang DSLs, showing Mojo meeting and beating CUDA, and leaving DSLs in the dust.
