Kelly Marchisio @NeurIPS

705 posts

@cheeesio

Multilinguality Lead @cohere. Formerly: PhD @jhuclsp, Alexa Fellow @amazon, dev @Google, MPhil @cambridgenlp, EdM @hgse 🔑🔑¬🧀 (@kelvenmar20)

Connecticut, USA · Joined June 2019
669 Following · 2.4K Followers
Kelly Marchisio @NeurIPS (@cheeesio)
sidenote, this comic is 19 years old. How... how did that happen.
0 replies · 0 reposts · 0 likes · 67 views
Kelly Marchisio @NeurIPS (@cheeesio)
“What language is this text written in?” - a question so easy to pose, yet surprisingly difficult to answer!
EleutherAI (@AiEleuther)

Announcing our latest paper: "CommonLID: Re-evaluating State-of-the-Art Language Identification Performance on Web Data". In collaboration with @CommonCrawl, @MLCommons, and @JohnsHopkins, we worked with 80+ native-speaker annotators to build a LID benchmark on actual Common Crawl text covering 109 languages. Existing evaluations overestimate how well LangID works on web data.

0 replies · 1 repost · 8 likes · 1.4K views
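For context on why "what language is this?" is hard on web data: off-the-shelf LID is typically done with a fastText-style classifier, and the snippet below is a minimal sketch of that baseline workflow. It is not the CommonLID benchmark or its evaluation code; `lid.176.bin` is the publicly released fastText LID model, and the example sentences are made up.

```python
# Minimal sketch: baseline language identification with fastText's public
# lid.176.bin model. This is NOT the CommonLID benchmark tooling, just the
# kind of off-the-shelf LID the paper re-evaluates on web data.
import fasttext  # pip install fasttext

model = fasttext.load_model("lid.176.bin")  # downloadable from fasttext.cc

web_snippets = [  # hypothetical examples, not CommonLID data
    "What language is this text written in?",
    "¿En qué idioma está escrito este texto?",
]

for text in web_snippets:
    # fastText expects single-line input; predict returns labels + probabilities
    labels, probs = model.predict(text.replace("\n", " "), k=1)
    lang = labels[0].replace("__label__", "")
    print(f"{lang}\t{probs[0]:.2f}\t{text}")
```

Scores on clean sentences like these tend to look much better than on real Common Crawl text, which is exactly the gap the benchmark above measures.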
Stella Biderman (@BlancheMinerva)
@cheeesio Our paper just came out :) Would love your feedback: x.com/AiEleuther/sta…
1 reply · 0 reposts · 1 like · 65 views
Kelly Marchisio @NeurIPS reposted
Piotr Nawrot (@p_nawrot)
🌟🚀 Sparse Attention Models Can Get Sparser
We've updated The Sparse Frontier (the largest empirical analysis of training-free sparse attention to date) from the Qwen 2.5 to the Qwen 3 model family, now including Llama 3.1 and Gemma 3. Key findings:
📊 Larger sparse models outperform smaller dense ones at equal compute cost. Only high-sparsity configs lie on the Pareto frontier for long sequences.
🔬 Already sparse? You can go sparser. Gemma 3 has 5/6 layers as Sliding Window Attention by design, yet additional sparsification of the remaining dense layers still yields efficiency gains at scale.
📈 Longer sequences tolerate higher sparsity. From 9 models × 6 methods × 9 tasks: fixed-budget methods in production are suboptimal. Token budget should grow sublinearly with context length.
Co-authors: Robert Li, Renjie Huang, @seb_ruder, @cheeesio, @PontiEdoardo. Special shout-out to @faridlazuarda, who updated our repo to vLLM v1 and made Gemma 3 evaluations possible. Links in the comments ⬇️
Farid Adilazuarda (@faridlazuarda)

🚀🚨 Sparse-Frontier Major Updates! You can now evaluate Reasoning + Sparse models at speed, with Sparse-Frontier upgraded to @vllm_project's v1 engine 🔥
We still provide support for Tensor Parallelism and the original sparse attention baselines, but it now works cleanly with newer models, decoding strategies, and evaluation setups. Task coverage and model support were also expanded as part of this release. The config-based workflow stays the same.
If you're working on sparse decoding, reasoning models, or long-context evaluation, this update makes it easier to run consistent experiments across models, tasks, and attention methods ⚡️
Really enjoyed working with @p_nawrot and @PontiEdoardo over the past months to get this release out!

3 replies · 12 reposts · 90 likes · 15.1K views
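To make the "token budget should grow sublinearly with context length" finding above concrete, here is a minimal sketch of what such a budgeting rule could look like. The function name, the square-root exponent, and the reference values are illustrative assumptions, not the calibrated rule from The Sparse Frontier.

```python
# Illustrative sketch of a sublinear token-budget rule for sparse attention.
# The exponent and reference points are assumptions for demonstration only;
# they are not the fitted rule from the paper.
def sparse_token_budget(context_len: int,
                        ref_len: int = 8_192,
                        ref_budget: int = 2_048,
                        alpha: float = 0.5) -> int:
    """Scale the attended-token budget as (context_len / ref_len) ** alpha.

    alpha < 1 means the budget grows sublinearly: doubling the context
    less than doubles the number of tokens each query attends to.
    """
    return max(1, round(ref_budget * (context_len / ref_len) ** alpha))

if __name__ == "__main__":
    for n in (8_192, 32_768, 131_072):
        budget = sparse_token_budget(n)
        print(f"context={n:>7}  budget={budget:>5}  sparsity={1 - budget / n:.2%}")
```

Under this toy rule, a 16× longer context gets only a 4× larger budget, i.e., higher sparsity at longer lengths, which is the direction of the finding quoted above.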
Kelly Marchisio @NeurIPS (@cheeesio)
Our Multilingual Team at @cohere is hiring interns! If you are a current PhD student working in multilinguality and would like to work with our team, please apply below or reach out! 🌍🌏🌎 (The posting below says “Winter”, but we hire year-round) jobs.ashbyhq.com/cohere/6e85017…
5 replies · 21 reposts · 204 likes · 14.6K views
Kelly Marchisio @NeurIPS (@cheeesio)
Three emails in the past ~week addressed to “Emma”. This can’t be a coincidence — hive-mind, what is this?
0 replies · 0 reposts · 1 like · 305 views
Kelly Marchisio @NeurIPS reposted
MT Group at FBK (@fbk_mt)
Our pick of the week by @dhairya_su47605: "How Does #Quantization Affect #Multilingual #LLMs?" by @cheeesio, @TheyCallMeMr_, Hongyu Chen, @d_aumiller, @ahmetustun89, @sarahookr, @seb_ruder (Findings EMNLP, 2024)
Dhairya Suman (@dhairya_su47605)

Pick of the week @fbk_mt: How Does Quantization Affect Multilingual LLMs? Quantization has become a widely adopted technique for model compression. This work investigates the impact of quantization on different languages in multilingual LLMs. aclanthology.org/2024.findings-…

0 replies · 4 reposts · 8 likes · 1K views
Kelly Marchisio @NeurIPS reposted
Dhairya Suman (@dhairya_su47605)
Pick of the week @fbk_mt: How Does Quantization Affect Multilingual LLMs? Quantization has become a widely adopted technique for model compression. This work investigates the impact of quantization on different languages in multilingual LLMs. aclanthology.org/2024.findings-…
0 replies · 1 repost · 5 likes · 1.2K views
Kelly Marchisio @NeurIPS (@cheeesio)
Announcing - Moms Who ML! 🐣🍼 I landed in San Diego to a video call from my 15-month-old -- she licked the camera then put me in this bag of Mega Bloks👅 If you can relate, let's support one another! Search for the group on the #NeurIPS2025 app, and we'll expand after!
0 replies · 0 reposts · 14 likes · 919 views
Kelly Marchisio @NeurIPS reposted
Dwarak (@DwaraknathG)
I am hiring highly skilled performance engineers for my team! You will be working on optimising pretraining for models >100B params on O(1000s) of GPUs, and hardware-aligned architecture design. We are cooking a lot of very exciting projects and I can safely say you will have a lot of fun! Link in thread. <3
14 replies · 45 reposts · 455 likes · 67.2K views