Dan A. Calian (@dancalian) - Twitter-Profil | Zamantika Mersobahis Locabet

Dataset curation for language models has long relied on brittle, hand-crafted rules. It's time for a more principled, automated approach. Enter DataRater: a meta-learning framework that learns to value data based on downstream training efficiency. Great summary by Luisa below 👇

Luisa Zintgraf@luisa_zintgraf

Excited to share our new paper, "DataRater: Meta-Learned Dataset Curation"! We explore a fundamental question: How can we *automatically* learn which data is most valuable for training foundation models? Paper: arxiv.org/pdf/2505.17895 to appear @NeurIPSConf Thread 👇

English

820

Dan A. Calian

Entdecken