
Eval4NLP
@eval4nlp
Workshop on Evaluation and Comparison of NLP Systems, co-located with #AACL2025.


📢📢👇 New job openings. Topic: social bias detection+analysis with LLMs across time (1950-now) & languages. There are 2 Post-Doc/PhD positions, supervised by @egere14 (@utn_nuremberg) + Simone Ponzetto (@dwsunima). Fully funded, up to 3 yrs. More info: nl2g.github.io/positions


Excited to present our paper "BMX: Boosting Natural Language Generation Metrics with Explainability" at #EACL2024! Join us in Virtual Poster Session B on 20.03.2024 at 2 p.m. as we unveil how explanations can enhance NLG evaluation metrics.








DALLE-3 is the best product I've seen since GPT-4, super easy to just get sucked in for hours generating images. No need for prompting since GPT-4 does it for you. Let me know if you have requests for prompts below. Here are some examples of what it can do:











LLMs still lag behind our best metrics for MT evaluation. But what if we prompted them for fine-grained, interpretable feedback (much like human annotators)? arxiv.org/abs/2308.07286 TLDR: We analyzed their capabilities for MT eval, and propose *AutoMQM* to improve them! 1/14





