Complex Data Lab McGill (@complexDataLab)
💡 Strong data and eval are essential for real-world progress. In "A Guide to Misinformation Detection Data and Evaluation" (to be presented at KDD 2025), we conduct the largest survey to date in this domain: 75 datasets curated, 45 accessible ones analyzed in depth. Key findings 👇
📊 Severe spurious correlations and ambiguities affect the majority of datasets in the literature. For example, most datasets contain many examples whose veracity can't be conclusively assessed at all.
🔍 Evaluating against categorical labels can massively underestimate the performance of generative systems: by half of the counted errors or more.
🛠️ We also provide practical tools:
• CDL-DQA: a toolkit to assess misinformation datasets
• CDL-MD: the largest misinformation dataset repo, now on Hugging Face 🤗
🚀 Given these challenges, error analysis and other simple steps could greatly improve the robustness of research in the field. We propose a lightweight Evaluation Quality Assurance (EQA) framework to enable research results that translate more smoothly to real-world impact.