



@levelsio do you stretch pieter? helps me a ton before and after workouts


being obsessed with celebrities is the definition of NPC

🚨WOW: Sam Altman disingenuously cuts off the bottom portion of the answer where Grok lays out the arguments for BOTH candidates and explicitly does NOT choose to give a response. Run the test yourself if you wish to see. Sam Altman lies.






y'all use spotify??? youtube links saved to txt files and custom terminal commands to play, skip, shuffle, etc is where it's at

[1/7] New paper alert! Heard about the BitNet hype or that Llama-3 is harder to quantize? Our new work studies both! We formulate scaling laws for precision, across both pre and post-training: arxiv.org/pdf/2411.04330

TLDR:
- Models become harder to post-train quantize as they are overtrained on lots of data, so that eventually more pretraining data can be actively harmful if quantizing post-training!
- The effects of putting weights, activations, or attention in varying precisions during pretraining are consistent and predictable, and fitting a scaling law suggests that pretraining at high (BF16) and next-generation (FP4) precisions may both be suboptimal design choices!

Joint work with @ZackAnkner @bfspector @blake__bordelon @Muennighoff @mansiege @CPehlevan @HazyResearch @AdtRaghunathan.















