
@SemiAnalysis_ ymmv, but I've done 2K-scale H200 and B200 runs with a 70B model, up to 3D parallelism, with regional torch.compile and no issues. Compile is not distributed-aware, so the better method imo is regional compile of the transformer blocks, not full-model compile:
#L345" target="_blank" rel="nofollow noopener">github.com/pytorch/torcht…

