Ilia Kuznetsov

33 posts

@ilokuznetsov

Postdoc at @UKPLab @TUDarmstadt🌳 // peer review, intertextuality, linguistics, NLP applications // synthesizers 🎹

Darmstadt, Germany · Joined September 2009

137 Following · 72 Followers

Ilia Kuznetsov@ilokuznetsov·Dec 10

@BertChakovsky marketing ☝️

Bert Chan@BertChakovsky·Dec 9

Galactica: expected science but output bs -> taken down ChatGPT: expected bs and output bs -> big hit

Ilia Kuznetsov@ilokuznetsov·Feb 23

@ani_nenkova Many studies on peer review analysis in NLP and beyond e.g. score calibration, bias, etc. scholar.google.de/scholar?cites=… Also, the problem is charging $60, and not the language quality score per se; nothing bad about language quality scores for peer review texts?

Ani Nenkova@ani_nenkova·Feb 21

Is this what we want to catch up with?

(((ل()(ل() 'yoav))))👾@yoavgo

this is an automatic message you get when submitting a paper to Nature Methods.

Ani Nenkova@ani_nenkova·Feb 21

The idea of computational review analysis makes me uneasy. Especially when part of the motivation is that the regular review process takes a lot of time and is expensive. Help me figure out what will be the useful outcome from this line of investigation, apart from publishing

Ilia Kuznetsov@ilokuznetsov

What is this License Task at @ReviewAcl all about? A peek into ethical and sustainable review data collection, quantitative insights from *ACL, and a place for a public discussion, all in our fresh “Yes-Yes-Yes” preprint here: openreview.net/forum?id=28n-0… @DyNils @IGurevych

Ilia Kuznetsov@ilokuznetsov·Feb 8

@annargrs @ZeerakTalat @ryandcotterell The main issue with consent retraction is dataset stability. For example, if I retract my text, what happens to derivative datasets, models, LMs? Published stats?

Anna Rogers@annargrs·Feb 8

@ilokuznetsov @ZeerakTalat @ryandcotterell What about retroactive consent retraction? This is a requirement you have to implement under GDPR anyway. People will be more likely to agree to batch contribution if they know they can retract individual reviews at any point.

Anna Rogers@annargrs·Feb 8

Some thoughts on the first paper/report on peer review data collection at @ARRPreprints. The basic idea is that if authors+reviewers+editors all agree, data goes to a public dataset, otherwise a "protected" dataset only available internally. /1

Ilia Kuznetsov@ilokuznetsov

Ilia Kuznetsov@ilokuznetsov·Feb 8

@ZeerakTalat @annargrs @ryandcotterell Yes, we were thinking about this; it's just a bit trickier to implement. For now we have NO (default), or yes to all reviews for a *given iteration*. So if for example I don't want to contribute one of my reviews, I just say NO to all of my reviews for this month.

Zeerak (زیرک طلعت) is on Zeerak@ mastodon|bsky@ZeerakTalat·Feb 8

@ilokuznetsov @annargrs @ryandcotterell That opens another issue: the case-by-case consent. People may be inconsistent bc it’s unclear what the data is collected (for). Giving a default response for all of one’s reviews, and the ability to change that on a case-by-case basis might get to the consent you’re seeking.

Ilia Kuznetsov@ilokuznetsov·Feb 8

@annargrs @ryandcotterell @ZeerakTalat Good point; retroactive application is definitely not a good idea. We might ask contributors again or leave the data be (or to be used by ACL). Since the process is continuous, the data will get replenished.

Anna Rogers@annargrs·Feb 8

@ilokuznetsov @ryandcotterell @ZeerakTalat Can you make such changes retroactively? If not, what will happen with thousands of reviews accumulated until that point?

Ilia Kuznetsov@ilokuznetsov·Feb 8

@ryandcotterell @annargrs @ZeerakTalat That's what we are discussing now. The way it's often done with protected data is that two institutions (e.g. universities) sign an agreement and then distribute NDA agreements to their members.

Ilia Kuznetsov@ilokuznetsov·Feb 8

@annargrs @ryandcotterell @ZeerakTalat Agreed; as soon as we have the workflow for NDA (or whatever mechanism to protect confidentiality/personal data), the license text will need to be adapted.

Anna Rogers@annargrs·Feb 8

@ryandcotterell @ilokuznetsov @ZeerakTalat But you still license for specific purposes, right? And they do specify conditions for sublicensing for reviews for the accepted papers, just not for the rejected ones. Yet this whole discussion is about external accessibility of that "internal" data, assuming it's possible.

Ilia Kuznetsov@ilokuznetsov·Feb 8

@annargrs @ryandcotterell @ZeerakTalat Yes by NDA we mean "not sharing the data further and only using it for specific research purposes". It's important to keep track of access to the data; it's both personal data + confidentiality.

Anna Rogers@annargrs·Feb 8

@ryandcotterell @ilokuznetsov @ZeerakTalat I'm not a lawyer, but the NDA part is meant as a personal data protection mechanism, right? If people are allowed access to internal data for research, they will publish that research -> ACL can't prevent them from disclosing the very fact that the internal data was shared :)

Ilia Kuznetsov@ilokuznetsov·Feb 8

@ZeerakTalat @annargrs @ryandcotterell Based on our understanding, review scores (no titles, no identities) are not personal information. Review texts are, to the extent any other text is. There is a difference between consent and license. Happy to elaborate here: openreview.net/forum?id=28n-0…

Zeerak (زیرک طلعت) is on Zeerak@ mastodon|bsky@ZeerakTalat·Feb 8

@ilokuznetsov @annargrs @ryandcotterell Both.

Ilia Kuznetsov@ilokuznetsov·Feb 8

@ZeerakTalat @ryandcotterell @annargrs I think at this point it's really better to move to OpenReview, both for space and to keep track of what is discussed. Twitter is not a good tool for peer review: openreview.net/forum?id=28n-0…

Zeerak (زیرک طلعت) is on Zeerak@ mastodon|bsky@ZeerakTalat·Feb 8

@ilokuznetsov @ryandcotterell @annargrs 1) yep. 2) a license that is held by ACL so authors and reviewers have no decision making ability at that junction? Or am I misunderstanding. I hope I am, otherwise there’s nothing consensual about this reading of “consent”

Ilia Kuznetsov@ilokuznetsov·Feb 8

@ryandcotterell @annargrs @ZeerakTalat I do not understand the problem with PEER. 1) As per project page, PEER is annotation based. All we need is preprints. ArXiv has plenty. 2) there is textual data on peer review and papers working on that. It's not licensed and has not been collected in a systematic manner.

Ilia Kuznetsov@ilokuznetsov·Feb 8

@ZeerakTalat @annargrs @ryandcotterell Do you mean review scores, or do you mean review texts?

Zeerak (زیرک طلعت) is on Zeerak@ mastodon|bsky@ZeerakTalat·Feb 8

@ilokuznetsov @annargrs @ryandcotterell Which under GDPR is still personal information so I’m not sure what your point is?

Ilia Kuznetsov@ilokuznetsov·Feb 8

@annargrs @ryandcotterell Yes, we're actually writing a blog post now; also to have a TL;DR version of the preprint and have more people involved in the discussion. I'll be posting it here once it's online.

Anna Rogers@annargrs·Feb 8

@ilokuznetsov @ryandcotterell Ok while that's in progress, maybe there could be a statement clarifying all that, with a timeline estimate + stating that ppl will be able to apply for "protected" data [then] [on conditions], and PEER is in the same boat? Just to avoid the possible perception of a massive COI.

Ilia Kuznetsov@ilokuznetsov·Feb 8

@annargrs @ZeerakTalat @ryandcotterell No, this is for metadata (scores, etc.). For texts we ask for license transfer, that's what the preprint describes.

Anna Rogers@annargrs·Feb 8

@ZeerakTalat @ilokuznetsov @ryandcotterell Oh, I didn't know that. Is that all it asks? Informed consent does imply a specific purpose.

Ilia Kuznetsov@ilokuznetsov·Feb 8

@ZeerakTalat @ryandcotterell @annargrs Re (1): to clarify, your q is "why would we want a public dataset of peer review texts"; correct? Re (2): that's the metadata consent -- not for texts. Texts are contributed later via license transfer to ACL (see preprint). We're happy to continue here: openreview.net/forum?id=28n-0…

Zeerak (زیرک طلعت) is on Zeerak@ mastodon|bsky@ZeerakTalat·Feb 8

@ilokuznetsov @ryandcotterell @annargrs Going back to 1) it’s a bit unclear to me why we would want this. There are some vague mentions of “bias” but I don’t see how anything from this dataset (which suffers from a pretty serious selection bias problem, as acknowledged) helps towards that?

Ilia Kuznetsov@ilokuznetsov·Feb 8

@annargrs @ZeerakTalat @ryandcotterell For a proper discussion, we have set up public commenting for the pre-print. Anyone with an OR account should be able to post (and outline their concerns/suggestions in more detail than Twitter allows): openreview.net/forum?id=28n-0…

Ilia Kuznetsov@ilokuznetsov·Feb 8

@annargrs @ryandcotterell Yes, absolutely!! But we really need a solution that most people agree on, very clear terms of data sharing, good mechanisms to guarantee protection, and a clear and open option to opt out. That's in progress :)

Anna Rogers@annargrs·Feb 8

@ryandcotterell @ilokuznetsov Public data is fair game for anyone, of course. I think there'd be a problem only if the "protected" community data was exclusively accessible to one set of researchers from that community. But @ilokuznetsov says that access will be broadened in the coming months?

Ilia Kuznetsov@ilokuznetsov·Feb 8

@ryandcotterell @annargrs Two points need clarification. First, the linked project is not about peer reviewing *automation*, it's about assistance and ML-supported analysis. Whether or not this is useful or the best thing to do is a different point -- guidelines and training are very important.

Ilia Kuznetsov@ilokuznetsov·Feb 8

@annargrs Yes, we too! ARR is cool because it has fewer differences between iterations than regular conferences, so one can really measure the effect of interventions, both qualitatively and quantitatively. ETA ~next months. These things go through ARR EiCs, ACL exec, legal team, etc.

Anna Rogers@annargrs·Feb 8

@ilokuznetsov Cool. Any ETA on that? I'm not disinterested, admittedly - I've written the reviewer training tutorial, and I really want to know if it made any difference :)
