
The pretraining corpus was assembled from American and British books and newspapers published before January 1, 1900, sourced from Hugging Face and the Internet Archive. After extensive filtering, ~22 billion tokens remained. The best checkpoint was a 3.3-billion-parameter model trained for ~5.5e20 FLOPs.
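The date cutoff suggests a simple metadata filter at ingestion time. Below is a hedged sketch of that step, assuming a Hugging Face `datasets` source that carries a per-record publication year; the dataset name and field name are placeholders, not the actual sources or schema used here.

```python
# A hedged sketch (hypothetical dataset and metadata field) of the kind of
# date filter described above, using the Hugging Face `datasets` library.
from datasets import load_dataset

CUTOFF_YEAR = 1900  # keep only texts published before Jan. 1, 1900

# "example/historical-texts" is a placeholder, not the actual source dataset.
stream = load_dataset("example/historical-texts", split="train", streaming=True)

def published_before_cutoff(record):
    # Assumes each record carries a publication-year field; the real
    # corpus's metadata schema may differ.
    year = record.get("year")
    return year is not None and int(year) < CUTOFF_YEAR

filtered = stream.filter(published_before_cutoff)
```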

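As a sanity check on these figures, the standard C ≈ 6ND approximation for dense-transformer training FLOPs implies the model saw the corpus roughly 1.3 times. A minimal sketch of the arithmetic:

```python
# Sanity-check the reported compute budget with the common C ≈ 6·N·D
# approximation for dense transformer training (not the authors' code).

PARAMS = 3.3e9          # model size reported above
TOKENS = 22e9           # corpus size reported above
REPORTED_FLOPS = 5.5e20

flops_per_epoch = 6 * PARAMS * TOKENS       # ≈ 4.4e20 FLOPs
epochs = REPORTED_FLOPS / flops_per_epoch   # ≈ 1.26 passes over the corpus

print(f"FLOPs per epoch: {flops_per_epoch:.2e}")
print(f"Implied epochs over the corpus: {epochs:.2f}")
```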