Jonathan Clark

1.6K posts

@JonClarkSeattle

Research Scientist @ Gemini Multilinguality. Learning, Languages, Evals, C++. Previously MT@Microsoft and CMU. Opinions are my own.

Seattle, WA Joined March 2009
2K Following 2.5K Followers
Jonathan Clark retweeted
dynomight
dynomight@dynomight7·
DumPy: Like NumPy except it's OK if you're dum
27
82
1.6K
213K
Jonathan Clark retweeted
Markus Freitag
Markus Freitag@markuseful·
Catch our Google Translate Research team at #EMNLP #WMT24! The team will present 9 papers on step-by-step decoding, mitigating metric bias within MBR decoding (+ MBR dataset release), improved human data collection and automatic metrics (MetricX: winner of WMT Metrics Task).
1
1
31
2K
Jonathan Clark retweeted
Google Canada
Google Canada@googlecanada·
Exciting news! As of today, the Inuit language of Inuktut will be available on Google Translate - marking the first Canadian Indigenous language on the platform. Tunngasugit! | ᑐᙵᓱᒋᑦ (Welcome!) 🎉 Huge thanks to @ITK_CanadaInuit for their invaluable guidance and collaboration. Learn more on our blog: blog.google/intl/en-ca/com…
0
23
46
5K
Jonathan Clark retweeted
iseeaswell꩜bʂky
iseeaswell꩜bʂky@iseeaswell·
Excited to announce that 110 languages got added to Google Translate today! Time for context on these languages, especially the communities who helped a lot over the past few years, including Cantonese, NKo, and Faroese volunteers. Also, a 110-language youtube playlist. 🧵
14
56
232
49.5K
Jonathan Clark retweeted
Jeff Dean
Jeff Dean@JeffDean·
As part of @Google's 1,000 Languages Initiative, a commitment to support the 1,000 most spoken languages, & w/help of our PaLM 2 LLM, we're adding support for 110 new languages (spoken by 614M people) to Google Translate (now supporting 243 languages). 🎉 blog.google/products/trans…
42
83
461
134.4K
Jonathan Clark retweeted
Jing Yu Koh
Jing Yu Koh@kohjingyu·
Absolutely unhinged. When @jasonbaldridge started this in 2021 he would enthusiastically show us weird new images that he took. I thought it was just some weird phase that would fizzle out, but I'm very happy to be wrong, and that it resulted in such a high quality dataset!
Yasumasa Onoe@yasumasa_onoe

We're excited to announce DOCCI: A new dataset designed to advance vision-language research. DOCCI features 15k images with detailed descriptions crafted to capture complex visual concepts – spatial relations, counting, text, entities, and more. arxiv.org/pdf/2404.19753

2
13
106
19.9K
Jonathan Clark retweeted
Markus Freitag
Markus Freitag@markuseful·
New paper alert! Designing reliable human evaluation is both crucial and difficult. Human raters can exhibit different behaviors when rating NLG outputs. These differences are not generally due to a rater performing the task incorrectly, but rather due to differences in harshness or leniency between raters: a Minor error to one rater may be a Major error to another. Consequently, decisions around which raters rate which items can alter the final system ranking. In our new paper, we analyse the impact of rater assignment on the final system ranking and show how you can design a replicable, reliable human evaluation by assigning the right raters to the right items. Take a look: arxiv.org/pdf/2404.01474…
2
16
73
8.6K
Jonathan Clark retweeted
Graham Neubig
Graham Neubig@gneubig·
ACL has removed the anonymity period. This means that ACL submissions can be posted and discussed online at any time, although extensive PR is discouraged. aclweb.org/adminwiki/imag…
5
85
343
87.7K
Jonathan Clark retweeted
John Wieting
John Wieting@johnwieting2·
Today at #NeurIPS2023, if you want to learn more about: 1. Robustness of detectors and watermarks to paraphrase attacks (spoiler alert: needs improvement). 2. An alternative detection approach using simple retrieval methods. and ...
Kalpesh Krishna@kalpeshk2011

To detect text written by LMs like #ChatGPT, many methods have recently emerged: DetectGPT, watermarks, GPTZero. We present a paraphrasing attack that can drop their detection rates to <10%. To defend against it, we propose detection with retrieval. arxiv.org/abs/2303.13408 🧵👇

1
4
21
3.7K
Jonathan Clark retweeted
Benjamin Muller
Benjamin Muller@ben_mlr·
Excited to be presenting our work on **Evaluating and Modeling Attribution for Cross-Lingual Question Answering** at #EMNLP2023 in Singapore. Updated Paper: arxiv.org/abs/2305.14332 We're also releasing the XOR-AttriQA dataset: github.com/google-researc… 🧵
Benjamin Muller@ben_mlr

Despite the fantastic progress we've seen recently in cross-lingual modeling, the best systems still make a lot of factual errors. To address this, here is our work on 🚨 Evaluating and Modeling Attribution for Cross-Lingual Question Answering 🚨 #1 Attribution Evaluation: Our work is the first to study attribution for cross-lingual QA. We collect attribution data in 5 languages (Bengali, Finnish, Japanese, Russian, and Telugu). With this data, we find that even state-of-the-art cross-lingual open-retrieval QA systems (e.g. CORA) lack attribution. Additionally, we find that passages retrieved cross-lingually contribute only moderately to the attribution level of the system, calling for progress in this area. #2 Attribution Detection Modeling: We experiment with a wide range of attribution detection models to address this issue. We find that NLI models and PaLM 2, fine-tuned on a very small number of attribution examples (~100), reach above 90% accuracy on attribution detection, which significantly improves the attribution level of CORA. Attribution is one of the most promising directions to improve trust in NLP systems: Our results show the potential of using attribution detection models to improve it for cross-lingual question answering. Work done while interning at Google Research last summer with @johnwieting2 @JonClarkSeattle @seb_ruder @tmkwiat @liviobs @roeeaharoni @jonherzig @cindyxinyiwang Thanks to @dipanjand, Michael Collins, Vitaly Nikolaev, @jasonriesa, and @pat_verga for supporting the project and to @AkariAsai for fruitful discussions about CORA. Paper available here: arxiv.org/abs/2305.14332

2
6
22
4.2K
Jonathan Clark retweeted
Dipanjan Das
Dipanjan Das@dipanjand·
Excited to announce the First Conference on Language Modeling, to be held approximately a year from now. Please let us know if you are interested or have any feedback on the conference: colmweb.org/survey.html
Sasha Rush@srush_nlp

Introducing COLM (colmweb.org) the Conference on Language Modeling. A new research venue dedicated to the theory, practice, and applications of language models. Submissions: March 15 (it's pronounced "collum" 🕊️)

1
13
73
11.1K
Jonathan Clark retweeted
Sasha Rush
Sasha Rush@srush_nlp·
Introducing COLM (colmweb.org) the Conference on Language Modeling. A new research venue dedicated to the theory, practice, and applications of language models. Submissions: March 15 (it's pronounced "collum" 🕊️)
29
417
1.7K
505.3K