Sam Acquaviva retweeted

New NanoGPT Speedrun WR at 86.8s (-0.4s) from @.samacqua on GitHub, by tuning and reusing the transpose_copy kernel during the cross-entropy backward calculation. Outside the main speedrun track, Sam ran an interesting experiment in January showing how test-time training can improve perplexity. github.com/KellerJordan/m… github.com/KellerJordan/m…