Dan A. Calian

@dancalian

Staff RS @ DeepMind

London, UK · Joined November 2010
136 Following · 108 Followers
Dan A. Calian (@dancalian)
Dataset curation for language models has long relied on brittle, hand-crafted rules. It's time for a more principled, automated approach. Enter DataRater: a meta-learning framework that learns to value data based on downstream training efficiency. Great summary by Luisa below 👇
Luisa Zintgraf (@luisa_zintgraf)

Excited to share our new paper, "DataRater: Meta-Learned Dataset Curation"! We explore a fundamental question: How can we *automatically* learn which data is most valuable for training foundation models? Paper: arxiv.org/pdf/2505.17895, to appear at @NeurIPSConf. Thread 👇
