
Jonathan Clark
@JonClarkSeattle
Research Scientist @ Gemini Multilinguality. Learning, Languages, Evals, C++. Previously MT@Microsoft and CMU. Opinions are my own.

Introducing FACTS Grounding. A new benchmark we're launching with @GoogleDeepMind to evaluate LLMs' factual accuracy on over 1,700 tasks. 🧠📐

The AmericasNLP Workshop will be co-located with NAACL on June 21, 2024! ✨✨ We are excited to see you all in Mexico City! More here: 2024.naacl.org/program/worksh…

We're excited to announce DOCCI: a new dataset designed to advance vision-language research. DOCCI features 15k images with detailed descriptions crafted to capture complex visual concepts: spatial relations, counting, text, entities, and more. arxiv.org/pdf/2404.19753
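For anyone who wants to browse the data, here is a minimal sketch of streaming a few examples with the Hugging Face `datasets` library; the dataset id "google/docci" and the field names are assumptions on my part, so check the paper for the official access path.

```python
# Hedged sketch: the dataset id and field names ("image", "description") are
# assumptions; consult the DOCCI paper/release page for the canonical path.
from datasets import load_dataset

# Streaming avoids downloading all 15k images up front.
docci = load_dataset("google/docci", split="train", streaming=True)

example = next(iter(docci))
print(example["description"][:200])   # long, spatially detailed caption
example["image"].save("docci_example.jpg")
```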

To detect text written by LMs like #ChatGPT, many methods have recently emerged: DetectGPT, watermarks, GPTZero. We present a paraphrasing attack that can drop their detection rates to <10%. To defend against it, we propose detection with retrieval. arxiv.org/abs/2303.13408 🧵👇
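To make the retrieval defense concrete, here is a minimal sketch, assuming the API provider keeps a database of texts its model has generated; the encoder, corpus, and threshold below are illustrative stand-ins, not the paper's exact setup.

```python
# Retrieval-based detection sketch: flag a candidate text if it is
# semantically close to some stored model generation. Paraphrasing changes
# surface form but largely preserves meaning, so the match tends to survive.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence encoder works

# Corpus of texts the LM is known to have generated (kept by the provider).
generations = [
    "The Eiffel Tower was completed in 1889 for the World's Fair in Paris.",
    "Photosynthesis converts light energy into chemical energy in plants.",
]
corpus_emb = model.encode(generations, convert_to_tensor=True)

def looks_machine_generated(text: str, threshold: float = 0.75) -> bool:
    """Flag text whose nearest stored generation is semantically very close."""
    query_emb = model.encode(text, convert_to_tensor=True)
    best = util.cos_sim(query_emb, corpus_emb).max().item()
    return best >= threshold

# A paraphrase of the first generation should still match it closely.
print(looks_machine_generated(
    "Built for the 1889 Paris World's Fair, the Eiffel Tower opened that year."
))
```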

LLM-based metrics like GEMBA predict many ties, but how ties should be handled in Kendall's tau when meta-evaluating metrics has been a longstanding issue. We propose an update to the meta-evaluation methodology that handles ties. arxiv.org/pdf/2305.14324…
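As a toy illustration of the problem (not our proposed fix), consider how a tie-correcting variant such as scipy's tau-b already diverges from a naive tau once a metric outputs many ties:

```python
# Toy example: the metric's ordering never contradicts the humans', but it
# produces many ties. Without a tie correction, tied pairs count as neither
# concordant nor discordant and deflate the correlation.
from scipy.stats import kendalltau

human  = [1, 2, 3, 4, 5, 6]   # human quality ranking of six translations
metric = [1, 1, 2, 2, 3, 3]   # an LLM-based metric that predicts many ties

tau_b, p = kendalltau(human, metric, variant="b")  # tie-corrected (default)
tau_c, _ = kendalltau(human, metric, variant="c")  # alternative correction

print(f"tau-b = {tau_b:.3f} (p = {p:.3f}), tau-c = {tau_c:.3f}")
```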

We all want accurate responses from our QA systems, and this need is especially acute when we interact with text in a language we don't know, since verifying an answer then depends on translation. This challenge is particularly felt by speakers of low-resource languages.

Despite the fantastic progress we've seen recently in cross-lingual modeling, the best systems still make a lot of factual errors. To address this, here is our work on 🚨 Evaluating and Modeling Attribution for Cross-Lingual Question Answering 🚨

#1 Attribution Evaluation: Our work is the first to study attribution for cross-lingual QA. We collect attribution data in 5 languages (Bengali, Finnish, Japanese, Russian, and Telugu). With this data, we find that even state-of-the-art cross-lingual open-retrieval QA systems (e.g. CORA) lack attribution. Additionally, we find that passages retrieved cross-lingually contribute only moderately to the attribution level of the system, calling for progress in this area.

#2 Attribution Detection Modeling: We experiment with a wide range of attribution detection models to address this issue. We find that NLI models and PaLM 2, fine-tuned on a very small number of attribution examples (~100), reach above 90% accuracy on attribution detection, significantly improving the attribution level of CORA.

Attribution is one of the most promising directions for improving trust in NLP systems: our results show the potential of attribution detection models to improve it for cross-lingual question answering.

Work done while interning at Google Research last summer with @johnwieting2 @JonClarkSeattle @seb_ruder @tmkwiat @liviobs @roeeaharoni @jonherzig @cindyxinyiwang. Thanks to @dipanjand, Michael Collins, Vitaly Nikolaev, @jasonriesa, and @pat_verga for supporting the project and to @AkariAsai for fruitful discussions about CORA. Paper available here: arxiv.org/abs/2305.14332
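To give a flavor of what the attribution detectors do, here is a minimal sketch that scores whether a passage entails an answer with an off-the-shelf English NLI model; the model choice and the hypothesis template are my own illustrative assumptions, and our actual detectors are fine-tuned on the ~100 attribution examples described above.

```python
# NLI-as-attribution-detection sketch: treat the retrieved passage as the
# premise and a statement of the QA pair as the hypothesis, then read off
# the entailment probability.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "roberta-large-mnli"  # labels: CONTRADICTION / NEUTRAL / ENTAILMENT
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)

def attribution_score(passage: str, question: str, answer: str) -> float:
    """Probability that the passage entails (i.e., supports) the answer."""
    hypothesis = f"The answer to the question '{question}' is '{answer}'."
    inputs = tokenizer(passage, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = model(**inputs).logits.softmax(dim=-1)[0]
    return probs[model.config.label2id["ENTAILMENT"]].item()

score = attribution_score(
    "Mount Everest, at 8,849 m, is Earth's highest mountain above sea level.",
    "What is the highest mountain on Earth?",
    "Mount Everest",
)
print(f"entailment probability: {score:.2f}")
```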

Introducing COLM (colmweb.org) the Conference on Language Modeling. A new research venue dedicated to the theory, practice, and applications of language models. Submissions: March 15 (it's pronounced "collum" 🕊️)

Excited to announce MADLAD-400, a 2.8T-token web-domain dataset that covers 419 languages (!). arXiv: arxiv.org/abs/2309.04662 GitHub: github.com/google-researc… 1/n
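If you want to dip into the corpus without downloading terabytes, streaming a single language split should work roughly like this; the Hugging Face dataset id, language config, and split name below are assumptions on my part, and the GitHub repo above documents the canonical access path.

```python
# Hedged sketch: "allenai/madlad-400", the "te" (Telugu) config, and the
# "clean" split are assumptions; see the MADLAD-400 repo for exact names.
from datasets import load_dataset

# Streaming avoids materializing a multi-terabyte download.
telugu = load_dataset("allenai/madlad-400", "te", split="clean", streaming=True)

for i, example in enumerate(telugu):
    print(example["text"][:80])
    if i == 2:
        break
```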

