

Instruction Workshop, NeurIPS 2023
197 posts

@itif_workshop
The official account of the 1st Workshop on Instruction Tuning and Instruction Following (ITIF), colocated with NeurIPS, in December 2023.














📢Thrilled to introduce ATLAS 🗺️: scaling laws beyond English, for pretraining, finetuning, and the curse of multilinguality. The largest public, multilingual scaling study to-date—we ran 774 exps (10M-8B params, 400+ languages) to answer: 🌍Are scaling laws different by language? 🧙♂️Can we model the curse of multilinguality? ⚖️Pretrain from scratch or finetune from multilingual checkpoint? 🔀Cross-lingual transfer scores for 1444 lang pairs? 1/🧵

📢Thrilled to introduce ATLAS 🗺️: scaling laws beyond English, for pretraining, finetuning, and the curse of multilinguality. The largest public, multilingual scaling study to-date—we ran 774 exps (10M-8B params, 400+ languages) to answer: 🌍Are scaling laws different by language? 🧙♂️Can we model the curse of multilinguality? ⚖️Pretrain from scratch or finetune from multilingual checkpoint? 🔀Cross-lingual transfer scores for 1444 lang pairs? 1/🧵


📢Thrilled to introduce ATLAS 🗺️: scaling laws beyond English, for pretraining, finetuning, and the curse of multilinguality. The largest public, multilingual scaling study to-date—we ran 774 exps (10M-8B params, 400+ languages) to answer: 🌍Are scaling laws different by language? 🧙♂️Can we model the curse of multilinguality? ⚖️Pretrain from scratch or finetune from multilingual checkpoint? 🔀Cross-lingual transfer scores for 1444 lang pairs? 1/🧵


Q2: Which languages actually help each other during training? And how much? 🌟Answer: We measure this empirically. We built a 38×38 transfer matrix, or 1,444 language pairs—the largest such resource to date. We highlight the top 5 most beneficial source languages for each target language. 4/















