
johnkellden
@johnkellden · 33K posts
Cards catalyzing stories, Conversations that mind and matter, Digital communities and collaborative narratives

[1/7] New paper alert! Heard about the BitNet hype, or that Llama-3 is harder to quantize? Our new work studies both! We formulate scaling laws for precision, across both pre- and post-training: arxiv.org/pdf/2411.04330

TL;DR:
- Models become harder to quantize post-training the more they are overtrained, so eventually additional pretraining data can be actively harmful if the model will be quantized after training!
- The effects of putting weights, activations, or attention in varying precisions during pretraining are consistent and predictable, and fitting a scaling law suggests that pretraining at both high (BF16) and next-generation (FP4) precisions may be suboptimal design choices!

Joint work with @ZackAnkner @bfspector @blake__bordelon @Muennighoff @mansiege @CPehlevan @HazyResearch @AdtRaghunathan.
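For readers who want to poke at the first claim, here is a minimal sketch in Python, not the paper's code: the per-tensor round-to-nearest scheme and the helper name quantize_rtn are illustrative assumptions. It applies symmetric round-to-nearest post-training quantization to a weight matrix and reports the induced weight error, the kind of perturbation whose loss penalty the paper models as a function of pretraining data and precision.

import numpy as np

def quantize_rtn(w: np.ndarray, bits: int) -> np.ndarray:
    # Per-tensor symmetric round-to-nearest: map max |w| onto the top
    # integer level, round, clip, then dequantize back to float.
    levels = 2 ** (bits - 1) - 1              # e.g. 127 for INT8, 7 for INT4
    scale = np.abs(w).max() / levels
    q = np.clip(np.round(w / scale), -levels, levels)
    return q * scale

rng = np.random.default_rng(0)
w = rng.normal(scale=0.02, size=(1024, 1024)).astype(np.float32)
for bits in (8, 4, 3):
    mse = float(np.mean((w - quantize_rtn(w, bits)) ** 2))
    print(f"{bits}-bit round-to-nearest, mean squared weight error: {mse:.3e}")

The thread's point is that the loss degradation caused by this kind of perturbation is not fixed: it grows with how much data the model was pretrained on, which is why the fitted scaling law can indicate when extra pretraining data stops paying off for a model that will be quantized afterwards.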

Dems don’t have strong incentives to win, just strong incentives to stay on the good side of their donors. There won’t be any realignment unless those incentives change, which would require a Tea Party-style insurrection & internal power shift. Seems extremely unlikely

Drone supremacy is the new nuclear supremacy.
