Kelly Marchisio @NeurIPS

705 posts

@cheeesio

Multilinguality Lead @cohere. Formerly: PhD @jhuclsp, Alexa Fellow @amazon, dev @Google, MPhil @cambridgenlp, EdM @hgse 🔑🔑¬🧀 (@kelvenmar20)

Connecticut, USA · Joined June 2019
669 Following · 2.4K Followers
Kelly Marchisio @NeurIPS (@cheeesio)
sidenote, this comic is 19 years old. How... how did that happen.
0 replies · 0 reposts · 0 likes · 67 views
Kelly Marchisio @NeurIPS (@cheeesio)
“What language is this text written in?” - a question so easy to pose, yet surprisingly difficult to answer!
EleutherAI (@AiEleuther)

Announcing our latest paper: "CommonLID: Re-evaluating State-of-the-Art Language Identification Performance on Web Data". In collaboration with @CommonCrawl, @MLCommons, and @JohnsHopkins, we worked with 80+ native-speaker annotators to build a LID benchmark on actual Common Crawl text covering 109 languages. Existing evaluations overestimate how well LangID works on web data.

0 replies · 1 repost · 8 likes · 1.4K views
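For context on why "what language is this?" is hard on web data: off-the-shelf LID is typically done with a fastText-style classifier, and the snippet below is a minimal sketch of that baseline workflow. It is not the CommonLID benchmark or its evaluation code; `lid.176.bin` is the publicly released fastText LID model, and the example sentences are made up.

```python
# Minimal sketch: baseline language identification with fastText's public
# lid.176.bin model. This is NOT the CommonLID benchmark tooling, just the
# kind of off-the-shelf LID the paper re-evaluates on web data.
import fasttext  # pip install fasttext

model = fasttext.load_model("lid.176.bin")  # downloadable from fasttext.cc

web_snippets = [  # hypothetical examples, not CommonLID data
    "What language is this text written in?",
    "¿En qué idioma está escrito este texto?",
]

for text in web_snippets:
    # fastText expects single-line input; predict returns labels + probabilities
    labels, probs = model.predict(text.replace("\n", " "), k=1)
    lang = labels[0].replace("__label__", "")
    print(f"{lang}\t{probs[0]:.2f}\t{text}")
```

Scores on clean sentences like these tend to look much better than on real Common Crawl text, which is exactly the gap the benchmark above measures.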
Stella Biderman (@BlancheMinerva)
@cheeesio Our paper just came out :) Would love your feedback: x.com/AiEleuther/sta…
1 reply · 0 reposts · 1 like · 65 views
Kelly Marchisio @NeurIPS reposted
Piotr Nawrot (@p_nawrot)
🌟🚀 Sparse Attention Models Can Get Sparser
We've updated The Sparse Frontier (the largest empirical analysis of training-free sparse attention to date) from the Qwen 2.5 to the Qwen 3 model family, now including Llama 3.1 and Gemma 3. Key findings:
📊 Larger sparse models outperform smaller dense ones at equal compute cost. Only high-sparsity configs lie on the Pareto frontier for long sequences.
🔬 Already sparse? You can go sparser. Gemma 3 has 5/6 layers as Sliding Window Attention by design, yet additional sparsification of the remaining dense layers still yields efficiency gains at scale.
📈 Longer sequences tolerate higher sparsity. From 9 models × 6 methods × 9 tasks: fixed-budget methods in production are suboptimal. Token budget should grow sublinearly with context length.
Co-authors: Robert Li, Renjie Huang, @seb_ruder, @cheeesio, @PontiEdoardo. Special shout-out to @faridlazuarda, who updated our repo to vLLM v1 and made Gemma 3 evaluations possible. Links in the comments ⬇️
Farid Adilazuarda (@faridlazuarda)

🚀🚨 Sparse-Frontier Major Updates! You can now evaluate Reasoning + Sparse models at speed, with Sparse-Frontier upgraded to @vllm_project's v1 engine 🔥
We still provide support for Tensor Parallelism and the original sparse attention baselines, but it now works cleanly with newer models, decoding strategies, and evaluation setups. Task coverage and model support were also expanded as part of this release. The config-based workflow stays the same.
If you're working on sparse decoding, reasoning models, or long-context evaluation, this update makes it easier to run consistent experiments across models, tasks, and attention methods ⚡️
Really enjoyed working with @p_nawrot and @PontiEdoardo over the past months to get this release out!

3 replies · 12 reposts · 90 likes · 15.1K views
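To make the "token budget should grow sublinearly with context length" finding above concrete, here is a minimal sketch of what such a budgeting rule could look like. The function name, the square-root exponent, and the reference values are illustrative assumptions, not the calibrated rule from The Sparse Frontier.

```python
# Illustrative sketch of a sublinear token-budget rule for sparse attention.
# The exponent and reference points are assumptions for demonstration only;
# they are not the fitted rule from the paper.
def sparse_token_budget(context_len: int,
                        ref_len: int = 8_192,
                        ref_budget: int = 2_048,
                        alpha: float = 0.5) -> int:
    """Scale the attended-token budget as (context_len / ref_len) ** alpha.

    alpha < 1 means the budget grows sublinearly: doubling the context
    less than doubles the number of tokens each query attends to.
    """
    return max(1, round(ref_budget * (context_len / ref_len) ** alpha))

if __name__ == "__main__":
    for n in (8_192, 32_768, 131_072):
        budget = sparse_token_budget(n)
        print(f"context={n:>7}  budget={budget:>5}  sparsity={1 - budget / n:.2%}")
```

Under this toy rule, a 16× longer context gets only a 4× larger budget, i.e., higher sparsity at longer lengths, which is the direction of the finding quoted above.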
Kelly Marchisio @NeurIPS (@cheeesio)
Our Multilingual Team at @cohere is hiring interns! If you are a current PhD student working in multilinguality and would like to work with our team, please apply below or reach out! 🌍🌏🌎 (The posting below says “Winter”, but we hire year-round) jobs.ashbyhq.com/cohere/6e85017…
5 replies · 21 reposts · 204 likes · 14.6K views
Kelly Marchisio @NeurIPS (@cheeesio)
Three emails in the past ~week addressed to “Emma”. This can’t be a coincidence — hive-mind, what is this?
0 replies · 0 reposts · 1 like · 305 views
Kelly Marchisio @NeurIPS reposted
MT Group at FBK (@fbk_mt)
Our pick of the week by @dhairya_su47605: "How Does #Quantization Affect #Multilingual #LLMs?" by @cheeesio, @TheyCallMeMr_, Hongyu Chen, @d_aumiller, @ahmetustun89, @sarahookr, @seb_ruder (Findings EMNLP, 2024)
Dhairya Suman (@dhairya_su47605)

Pick of the week @fbk_mt: How Does Quantization Affect Multilingual LLMs? Quantization has become a widely adopted technique for model compression. This work investigates the impact of quantization on different languages in multilingual LLMs. aclanthology.org/2024.findings-…

0 replies · 4 reposts · 8 likes · 1K views
Kelly Marchisio @NeurIPS reposted
Dhairya Suman (@dhairya_su47605)
Pick of the week @fbk_mt: How Does Quantization Affect Multilingual LLMs? Quantization has become a widely adopted technique for model compression. This work investigates the impact of quantization on different languages in multilingual LLMs. aclanthology.org/2024.findings-…
0 replies · 1 repost · 5 likes · 1.2K views
Kelly Marchisio @NeurIPS (@cheeesio)
Announcing - Moms Who ML! 🐣🍼 I landed in San Diego to a video call from my 15-month-old -- she licked the camera then put me in this bag of Mega Bloks👅 If you can relate, let's support one another! Search for the group on the #NeurIPS2025 app, and we'll expand after!
0 replies · 0 reposts · 14 likes · 919 views
Kelly Marchisio @NeurIPS reposted
Dwarak (@DwaraknathG)
I am hiring highly skilled performance engineers for my team! You will be working on optimising pretraining for models >100B params on O(1000s) of GPUs, and hardware-aligned architecture design. We are cooking a lot of very exciting projects and I can safely say you will have a lot of fun! Link in thread. <3
14 replies · 45 reposts · 455 likes · 67.2K views