Eval4NLP

60 posts

@eval4nlp

Workshop on Evaluation and Comparison of NLP Systems, co-located with #AACL2025.

Joined October 2019

36 Following · 303 Followers

Eval4NLP @eval4nlp · 28 Jul

📢📢 This year Eval4NLP is co-located with #aacl2025. Our CFP is now out: eval4nlp.github.io/2025/cfp.html Paper submission deadline: September 29, 2025. Direct submission via OpenReview. ARR commitment deadline: October 27, 2025. Notification of acceptance: November 3, 2025.

Eval4NLP reposted

NLLG @NLLG_lab · 23 Jun

Two more days left to apply 👇👇 x.com/NLLG_lab/statu…

NLLG @NLLG_lab

📢📢👇 New job openings. Topic: social bias detection + analysis with LLMs across time (1950-now) & languages. There are 2 Post-Doc/PhD positions, supervised by @egere14 (@utn_nuremberg) + Simone Ponzetto (@dwsunima). Fully funded, up to 3 yrs. More information: nl2g.github.io/positions

Eval4NLP reposted

NLLG @NLLG_lab · 18 Oct

📢📢📢The NLLG lab has three new open fully funded PhD positions: 1⃣ - Next Generation LLMs 2⃣ - NLP for Science 3⃣ - Multimodal evaluation metrics Deadlines: End of October ℹ️More information: nl2g.github.io/positions

Eval4NLP reposted

NLLG @NLLG_lab · 20 Mar

📢📢👉👉 @chrleiter is presenting his work on how explainability can improve evaluation metrics for MT and summarization tomorrow at #eacl2024 twitter.com/ChrLeiter/stat…

Christoph Leiter @ChrLeiter

Excited to present our paper "BMX: Boosting Natural Language Generation Metrics with Explainability" at #EACL2024! Join us in Virtual Poster Session B on 20.03.2024 at 2 p.m. as we unveil how explanations can enhance NLG evaluation metrics.

Eval4NLP @eval4nlp · 3 Nov

Eval4NLP23 has concluded. We thank everyone + congratulate our shared task winners on inducing high-quality metrics for MT+summ. using prompting and efficient models: "HIT-MI&T Lab" (even beating GEMBA + COMET🚀) & "DSBA". Shared task overview paper: arxiv.org/pdf/2310.19792…

Eval4NLP @eval4nlp · 31 Oct

Our accepted papers and program are now online: eval4nlp.github.io/2023/program.h… eval4nlp.github.io/2023/accepted-… Moreover, we're excited to have @alexfabbri4 as invited speaker on the topic of "Re-Evaluating Summarization Evaluation in the Era of LLMs" See u tomorrow 9am (UTC+8), online only!

Eval4NLP @eval4nlp · 27 Sep

📢📢To accommodate the recent ARR author response period, Eval4NLP @aaclmeeting extends the deadline for pre-reviewed papers until September 30th. Pre-reviewed papers must include: the paper along with its original reviews and scores. More details: eval4nlp.github.io

Eval4NLP reposted

will depue @willdepue · 21 Sep

Round #2 of DALLE images, using requests from y'all. Here's two animals mixed together into a new species.

will depue @willdepue

DALLE-3 is the best product I've seen since GPT-4, super easy to just get sucked in for hours generating images. No need for prompting since GPT-4 does it for you. Let me know if you have requests for prompts below. Here are some examples of what it can do:

Eval4NLP @eval4nlp · 22 Sep

📢📢 Don't forget: Pre-reviewed papers can be submitted to Eval4NLP @aaclmeeting until September 25 via OpenReview. Just include the paper and your meta-reviews along with the reviews and all the scores. eval4nlp.github.io

Eval4NLP @eval4nlp · 1 Sep

@_danieldeutsch Interesting work! Just pointing out: DiscoScore might be (considerably) stronger than Blonde as a doc-level metric. aclanthology.org/2023.eacl-main…

Dan Deutsch @_danieldeutsch · 29 Aug

Interested in document-level MT but have been held back by the lack of automatic metrics? If so, you won't want to miss our new paper! We study the quality of sentence-level metrics on long-form text and augment them with paragraph-level training data. arxiv.org/abs/2308.13506

Eval4NLP @eval4nlp · 29 Aug

@gg42554 @ReviewAcl The reviewing quality is bad in NLP, agreed. But it's also often because there are so many junior people (who sometimes need to step in because there aren't enough reviewers). Exposing them publicly may also be problematic for various reasons.

Goran Glavaš @gg42554 · 28 Aug

Writing the EMNLP rebuttals. I'm now convinced (also after having served for a year as EiC for @ReviewAcl) that nothing short of publicly releasing reviews *with reviewer identities* will substantially improve the (currently appalling) average review quality in #NLProc.

Eval4NLP @eval4nlp · 24 Aug

Due to popular demand, the Eval4NLP workshop @ @aaclmeeting submission deadline has been moved to September 1. We look forward to your submissions! 📣📣 More information: eval4nlp.github.io

Eval4NLP @eval4nlp · 22 Aug

📢📢 Brief reminder that the Eval4NLP submission deadline is this **Friday, 25.08.2023**. Focus: Evaluation with/of LLMs; all other evaluation aspects are also very welcome. 2nd call4papers: aclweb.org/portal/content… Webpage: eval4nlp.github.io/2023/index.html Venue: @aaclmeeting (Bali 🏖️)

Eval4NLP reposted

juri @nlopitz · 21 Aug

Shared task on explainable evaluation metrics: The development phase is running, but participation is still easily possible, and there's no need to train systems to take part. #nlproc #machinelearning Find out more @nlp_evaluation: eval4nlp.github.io/2023/shared-ta…

Eval4NLP @eval4nlp · 18 Aug

@markuseful AutoMQM would be a great candidate for the Eval4NLP shared task on explainable metrics :) eval4nlp.github.io/2023/shared-ta…

Markus Freitag @markuseful · 18 Aug

Many of you asked me the question about an automatic metric that can give us similar insights as MQM. We (mostly Patrick -- hands down one of the best student researchers I ever worked with) investigated how well LLMs can do MQM like error annotation. We present ... 🥳AutoMQM🥳

Patrick Fernandes @psanfernandes

LLMs still lag behind our best metrics for MT evaluation. But what if we prompted them for fine-grained, interpretable feedback (much like human annotators)? arxiv.org/abs/2308.07286 TLDR: We analyzed their capabilities for MT eval, and propose *AutoMQM* to improve them! 1/14

Eval4NLP @eval4nlp · 3 Aug

📢📢📢 We have released the description of Eval4NLP's shared task on "Prompting LLMs as Explainable Evaluation Metrics" (for MT & summarization). Dev phase: Aug. 7; test phase: Sep. 18; system submission deadline: Sep. 23. More details: eval4nlp.github.io/2023/shared-ta… 🚀🚀🚀

Eval4NLP @eval4nlp · 24 Jul

📢📢📢 The Eval4NLP workshop will take place this year at AACL 2023. Special focus: Evaluation of/with LLMs. Including a shared task on Prompting LLMs as Explainable Metrics. 📢📢📢 Direct submission deadline: 25.08. Webpage: eval4nlp.github.io/2023/index.html CFP: eval4nlp.github.io/2023/cfp.html

Eval4NLP @eval4nlp · 20 Nov

Eval4NLP is happening tomorrow, starting at 10:30 UTC. Our combined program with @sum_eval includes some great paper presentations and invited talks from @peyrardmax and @rktamplayo. See you there tomorrow! eval4nlp.github.io/2022/program.h…

Eval4NLP @eval4nlp · 16 Oct

The list of accepted papers is now on our website! eval4nlp.github.io/2022/accepted-… Congratulations to all of the authors, and we look forward to your presentations on November 20. FYI, registration for AACL is now open! aacl2022.org/Registration

Eval4NLP @eval4nlp · 14 Sep

This year, Eval4NLP is accepting submissions of papers with reviews from other venues (see the CFP for more details: eval4nlp.github.io/2022/cfp.html#…). Submit your paper and reviews here openreview.net/group?id=aclwe… by September 21, AOE!
