C.M. Downey

69 posts

@cmdowney

Asst. professor at @UofR Ling and Data Science | NLP for low-resource, endangered, and Indigenous languages | formerly @uwlinguistics, @uwnlp

Rochester, NY · Joined June 2019
135 Following · 417 Followers

C.M. Downey @cmdowney
One week (ish) left to apply: I'm recruiting Linguistics PhD students to come work with me on getting NLP to work for as wide a range of languages as possible! Apply below by December 1 sas.rochester.edu/lin/graduate/a…

C.M. Downey @cmdowney
@Olukanni_Jnr Students can apply regardless of undergrad major. However, your Statement of Purpose essay will need to demonstrate clear and tractable ideas for research you want to pursue within (computational) Linguistics. I am also unlikely to take students without programming experience

Tobby OLUKANNI @Olukanni_Jnr
@cmdowney Can someone with History as an undergrad course of study apply for this?

C.M. Downey @cmdowney
Reminder that I'm recruiting NLP-oriented Linguistics PhD students with a passion for low-resource languages this cycle! Applications are due December 1 (see link). I'll be at #EMNLP2024 next week if anyone wants to chat, so feel free to reach out! sas.rochester.edu/lin/graduate/a…

C.M. Downey retweeted
Shane Steinert-Threlkeld @ssshanest
What's the only thing better than NASSLLI (North American Summer School for Logic, Language and Information)? Summer in Seattle! Luckily, you can have both: I'm hosting NASSLLI at UW this summer. nasslli25.shane.st Please share widely, submit proposals, and attend!

C.M. Downey @cmdowney
Correction: this cycle I will be recruiting into the Linguistics program ONLY. Students with a CS background can still pursue NLP research with me, but they will take the Ling core curriculum, and an emphasis on less-studied languages is desired. My apologies for any confusion
Quoting C.M. Downey @cmdowney

📣 I'm recruiting PhD students this cycle! Researchers interested in expanding NLP for mid- and low-resource languages - and/or developing tools for endangered languages and field linguistics - should apply to work with me either through the UR Ling or CS PhD programs!


C.M. Downey @cmdowney
Happy to say that this work has been accepted to Findings of #EMNLP2024! Thanks to my fantastic co-authors for getting it across the finish line. I'll probably come to Miami to present it, so come say hi if you find the work interesting!
Quoting C.M. Downey @cmdowney

Preprint! We test methods to adapt a crosslingual model to a language family, and argue for targeted multilinguality as a middle ground for low-resource langs, avoiding the "curse of multilinguality". arxiv.org/abs/2405.12413 w/ @TerraBlvns, @quirkyDhwani, @dwija_parikh, @ssshanest


C.M. Downey @cmdowney
3. For this cycle, I'm unlikely to bring on a student interested in machine learning but not NLP / CompLing

C.M. Downey @cmdowney
Because I can't individually respond to every email:
1. Programming skills (ideally Python) are important for my students in either program
2. For Ling especially, I will strongly weigh interest in endangered languages / fieldwork, to complement existing strengths of the dept ...
Quoting C.M. Downey @cmdowney

📣 I'm recruiting PhD students this cycle! Researchers interested in expanding NLP for mid- and low-resource languages - and/or developing tools for endangered languages and field linguistics - should apply to work with me either through the UR Ling or CS PhD programs!


C.M. Downey @cmdowney
📣 I'm recruiting PhD students this cycle! Researchers interested in expanding NLP for mid- and low-resource languages - and/or developing tools for endangered languages and field linguistics - should apply to work with me either through the UR Ling or CS PhD programs!

C.M. Downey @cmdowney
A bit belated, but I finished my PhD! Can't express enough thanks to my amazing advisors @ssshanest and Gina for their investment in my time at UW. Excited for my next adventure of joining the faculty at the University of Rochester!

C.M. Downey @cmdowney
Our results suggest new best practices for bootstrapping NLP systems in low-resource language groups. All of our software, results, and analysis can be found at github.com/CLMBRs/targete…. If you find our work interesting, feel free to reach out and let us know what you think!

C.M. Downey @cmdowney
Our most surprising result may be that choosing a low sampling alpha (up-sampling low-resource langs and down-sampling high-resource ones) has a significant beneficial effect for low-resource langs, but does *not* significantly harm performance in high-resource ones (pics repeated)
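In the XLM/XLM-R line of work, "sampling alpha" usually refers to exponentiated sampling: a language whose share of the training data is q_i is drawn with probability p_i = q_i^α / Σ_j q_j^α, so α < 1 flattens the distribution toward uniform. The post does not spell out the formula, so the sketch below assumes that standard formulation; the function name `sampling_probs` and the Uralic corpus sizes are hypothetical, not figures from the paper.

```python
def sampling_probs(sizes: dict[str, int], alpha: float) -> dict[str, float]:
    """Exponentiated sampling (assumed formulation): p_i = q_i**alpha / sum_j q_j**alpha."""
    total = sum(sizes.values())
    # q_i: each language's share of the raw data, raised to the power alpha
    weights = {lang: (n / total) ** alpha for lang, n in sizes.items()}
    norm = sum(weights.values())
    # Renormalize so the sampling probabilities sum to 1
    return {lang: w / norm for lang, w in weights.items()}


# Hypothetical corpus sizes (sentences) for four Uralic languages; illustration only.
sizes = {"fin": 10_000_000, "est": 2_000_000, "sme": 50_000, "myv": 10_000}

for alpha in (1.0, 0.5, 0.1):
    probs = sampling_probs(sizes, alpha)
    print(alpha, {lang: round(p, 4) for lang, p in probs.items()})
```

At α = 1 the sampler mirrors the raw data proportions, and at α = 0 it is uniform. Lowering α moves probability mass away from the largest language (Finnish here) toward the smallest ones, which is the up-/down-sampling the post describes.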

C.M. Downey @cmdowney
In fact, adapted vocabulary size does *not* have a significant effect on task performance in our lowest-resource languages.

C.M. Downey @cmdowney
We perform an extensive sweep of adaptation parameters and directly model the effect of adaptation steps, vocab size, and sampling alpha on downstream performance. While steps and vocab size both positively affect performance, doubling steps is ~3x as effective as doubling vocab.
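One way to read "doubling steps is ~3x as effective as doubling vocab" is through a linear model with log2-scaled predictors, where each coefficient is the expected score change per doubling. The sketch below assumes a plain ordinary-least-squares fit with NumPy; the paper's actual statistical model may differ. The helper `fit_doubling_effects`, the sweep grid, and the scores are all hypothetical: the scores are synthetic and noise-free, generated from made-up coefficients only to show that the fit recovers per-doubling effects.

```python
import itertools

import numpy as np


def fit_doubling_effects(steps, vocab, alpha, score):
    """OLS fit of score ~ b0 + b_steps*log2(steps) + b_vocab*log2(vocab) + b_alpha*alpha."""
    X = np.column_stack([
        np.ones(len(score)),  # intercept
        np.log2(steps),       # coefficient = gain per doubling of adaptation steps
        np.log2(vocab),       # coefficient = gain per doubling of vocabulary size
        alpha,                # linear effect of the sampling alpha
    ])
    coef, *_ = np.linalg.lstsq(X, score, rcond=None)
    names = ["intercept", "per_doubling_steps", "per_doubling_vocab", "alpha"]
    return dict(zip(names, coef))


# Hypothetical full-factorial sweep (not the paper's actual settings).
grid = list(itertools.product([1_000, 2_000, 4_000, 8_000],
                              [8_000, 16_000, 32_000],
                              [0.1, 0.2, 0.3]))
steps, vocab, alpha = (np.array(col, dtype=float) for col in zip(*grid))

# Synthetic, noise-free scores from made-up coefficients.
score = 50.0 + 1.5 * np.log2(steps) + 0.5 * np.log2(vocab) - 4.0 * alpha

effects = fit_doubling_effects(steps, vocab, alpha, score)
ratio = effects["per_doubling_steps"] / effects["per_doubling_vocab"]
print({k: round(float(v), 3) for k, v in effects.items()}, round(float(ratio), 2))
```

With these synthetic scores the fit recovers per-doubling effects of 1.5 for steps and 0.5 for vocabulary, a 3:1 ratio. With real sweep results, the same comparison would be read off the estimated coefficients.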

C.M. Downey @cmdowney
For very under-resourced and endangered languages, our targeted multilingual adaptation is far more effective than adaptation to individual languages

C.M. Downey @cmdowney
Using the Uralic family as a test case, we adapt XLM-R with targeted language modeling and vocab specialization. Our best models show sizable improvements over multilingual baselines on tasks like dependency parsing (UAS), while simultaneously cutting up to 65% of the original parameters.
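To see how vocabulary specialization alone can remove most of a model's parameters, assume XLM-R base (the post does not say which size was used): its 250,002 × 768 input embedding matrix holds about 192M of roughly 278M parameters. The sketch below computes the share removed when only the vocabulary shrinks. The constants come from the public `xlm-roberta-base` configuration, the helper `fraction_removed` is hypothetical, and the 16,384-token target is an illustrative value, not the paper's setting.

```python
# Assumed XLM-R base dimensions (public `xlm-roberta-base` config).
XLMR_BASE_VOCAB = 250_002
XLMR_BASE_HIDDEN = 768
XLMR_BASE_TOTAL_PARAMS = 278_043_648  # embeddings + 12-layer encoder + pooler, no MLM head


def fraction_removed(new_vocab: int,
                     old_vocab: int = XLMR_BASE_VOCAB,
                     hidden: int = XLMR_BASE_HIDDEN,
                     total: int = XLMR_BASE_TOTAL_PARAMS) -> float:
    """Share of parameters dropped by shrinking the (vocab x hidden) embedding matrix."""
    if not 0 < new_vocab <= old_vocab:
        raise ValueError("new_vocab must be in (0, old_vocab]")
    # Each removed vocabulary entry deletes one row of `hidden` embedding weights
    removed = (old_vocab - new_vocab) * hidden
    return removed / total


# Hypothetical specialized vocabulary size, chosen only for illustration.
frac = fraction_removed(16_384)
print(f"{frac:.1%} of parameters removed")
```

Under these assumptions a vocabulary of about 16k tokens already removes roughly 64.5% of the parameters, so a cut on the order of 65% can come almost entirely from the embedding matrix.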