

Anna Wegmann

@anna_wegmann
PhD candidate in NLP @UniUtrecht | Measuring language variation with ML/NLP | now mainly on 🦋 via https://t.co/gpk3bBPSrd





Interested in whether people 👂 each other in a conversation? 🚨 New paper accepted at #EMNLP2024 with @tyskevdb and @dongng about detecting paraphrases between speakers 🤖 Detect? huggingface.co/AnnaWegmann/Hi… 📊 Analyze? huggingface.co/datasets/AnnaW… 📄 Read? arxiv.org/pdf/2404.06670


👩🏼💻 Real or Robotic? 🤖 Can LLMs accurately simulate qualities of human responses in dialogue? Human conversations with LLMs are great for assessing the capabilities of LLMs. But having lots of folks chat with LLMs is challenging (💰⏳🕵️). Could we have another LLM *simulate* being a human talking to an LLM as a substitute? In our new preprint, we test whether models can roleplay as the human in human-LLM conversations. Using the WildChat dataset and 100K+ simulations, we test how well these LLM responses actually mimic human ones. Our study spans 🇬🇧 English, 🇨🇳 Chinese, and 🇷🇺 Russian, using 21 linguistic metrics covering lexical, semantic, syntactic, and stylistic features.













I'm excited to share that the journal version of our paper, "An archival perspective on pretraining data", is now available (open access) from Patterns! This project was led by @MeeraDesai18, along with @IrenePasquetto, @az_jacobs, and myself 1/n









