The paper (arxiv.org/abs/2007.07399) interrogates how datasets in ML are made and what influence they exert. It motivates genealogical methods for datasets: tracing their histories so that users are aware of the biases datasets introduce into downstream applications.
"... the concerns with datasets go far beyond the statistical properties of who is represented, and that's what we're really trying to get at with this paper. The examination of ImageNet, from both the categorical and the distributional sides, is what sparked our research ..."
"The first question is trying to understand how dataset developers motivate the decisions that go into dataset creation. The idea was to read [the dataset artifacts] as texts and understand the values, motivations, and assumptions based on what is said and unsaid within those texts."
"Some interesting patterns, which are not too surprising but are a little disheartening: basically zero papers talk about IRB approval. The only papers that discuss IRB approval processes are review papers. I think only one paper discussed ethical considerations."
"The vast majority of dataset publications don't foreground the dataset as a core contribution. So even though datasets are really fundamental to machine learning, we don't value the construction of datasets like we value algorithmic and modeling contributions."
"There's a history of making these datasets. Well, what are the things that people bring to the table when they do that? If we can understand that, then we can see where the deficiencies are, which could lead to better approaches going forward."
"I would love somebody to take away from this paper that datasets are situated. It's not just the perspectives of the creators but also the socio-technical processes, like search engines, and the particulars of time and place that filter through in the act of creation."