Sergei Lebedev retweeted

Want to improve GPU compute/comms overlap? We just published a new short tutorial for you!
A few small changes to the Pallas:MGPU matmul kernel are all it takes to turn it into an all-gather collective matmul that overlaps NVLink comms with local compute: docs.jax.dev/en/latest/pall…