yeswondwerr


@stochasticchasm one for arxiv paper threads too that can be indexable and if u post slop u get ban

We gotta have a place to aggregate random anecdotal tips and tricks like this
Liliang Ren@liliang_ren
💡 Million-dollar hard lessons we’ve learned from pre-training at 1K GPUs with Standard Parameterization (SP): a. Use fp32 for unembedding weights when training with large vocabulary (200K), if you're using fused linear cross-entropy loss. Otherwise, Grad Norm will keep growing and blow up the loss eventually. (2/4)
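A minimal sketch of the tip above, assuming a PyTorch setup: the unembedding (`lm_head`) weights are kept in fp32 while hidden states arrive in bf16, and the cross-entropy is computed on fp32 logits. The module names and sizes here are illustrative, not from the original thread (a real fused linear cross-entropy kernel would avoid materializing the full logits; this just shows the dtype choice).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative sizes; the thread describes a ~200K vocabulary.
VOCAB = 200_000
D_MODEL = 64  # kept tiny here so the sketch runs anywhere

# Rest of the model is assumed to run in bf16.
hidden = torch.randn(8, D_MODEL, dtype=torch.bfloat16)
targets = torch.randint(0, VOCAB, (8,))

# Keep the unembedding weight in fp32, per the tip.
lm_head = nn.Linear(D_MODEL, VOCAB, bias=False, dtype=torch.float32)

# Upcast activations before the fp32 projection; loss is computed in fp32,
# which is what keeps the gradient norm from drifting with a huge vocab.
logits = lm_head(hidden.float())
loss = F.cross_entropy(logits, targets)
loss.backward()
```

The gradient on `lm_head.weight` then stays in fp32 end to end, rather than being accumulated in bf16.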

@EsotericCofe i legitimately find comfy to be 10x harder to use than python, i dont get how people await comfy releases

@kalomaze data prep tools for audio is the bottleneck, at least in the open

@cloneofsimo >i have a problem
>looking for inspiration on ode relaxation shenanigans
>how are they doing it here arxiv.org/abs/2410.19814
>it resolves small scale physics
