Samuele Cornell
@SamueleCornell
181 posts

Post-doc @ CMU LTI. Audio and speech researcher.

Ancona, Italy · Joined February 2021
526 Following · 984 Followers

Samuele Cornell @SamueleCornell ·
@ymas0315 It was quite a long piece of work 😅 Thanks Yoshiki for the help!
1 reply · 0 reposts · 2 likes · 71 views

まっすー @ymas0315 ·
Samuele's comprehensive review of the CHiME challenges has been published in CSL! "Recent trends in distant conversational speech recognition: A review of CHiME-7 and 8 DASR challenges" sciencedirect.com/science/articl…
1 reply · 4 reposts · 14 likes · 1.1K views

Samuele Cornell retweeted
Julius Richter @JuliusRichter13 ·
🗣️ Tomorrow I will be presenting a 𝗗𝗲𝗺𝗼 𝗼𝗻 𝗚𝗲𝗻𝗲𝗿𝗮𝘁𝗶𝘃𝗲 𝗦𝗽𝗲𝗲𝗰𝗵 𝗘𝗻𝗵𝗮𝗻𝗰𝗲𝗺𝗲𝗻𝘁. NeurIPS: Saturday, Dec 14, 4:15 pm at West Meeting Room 114/115. Hope to see you there! Also feel free to try the demo for yourself: 🔗 github.com/sp-uhh/gen-se-…
2 replies · 2 reposts · 23 likes · 1.2K views

Samuele Cornell @SamueleCornell ·
If you are still around in Vancouver for NeurIPS, tomorrow we will have the URGENT challenge workshop from 1:30 pm. Come by if you are interested in generalizable speech enhancement (also, winds will be up to 70 km/h tomorrow and it's cozy inside 😉). Lineup: neurips.cc/virtual/2024/c…
0 replies · 5 reposts · 13 likes · 1.7K views

Samuele Cornell @SamueleCornell ·
If you are interested in generalizable speech enhancement & restoration, consider joining the second edition of the URGENT challenge, which will be featured at Interspeech 2025. It will start this Friday!

Quoting Shinji Watanabe @shinjiw_at_cmu:
We are thrilled to announce the Interspeech 2025 URGENT Challenge, starting on 11/15! Join us in building universal speech enhancement models to tackle in-the-wild speech data using large-scale, multilingual data. Details: urgent-challenge.github.io/urgent2025/

0 replies · 0 reposts · 14 likes · 861 views

Samuele Cornell retweeted
Shinji Watanabe @shinjiw_at_cmu ·
Hi all, we have one month before the deadline! Please prepare your submission to our special issue "Multi-Speaker, Multi-Microphone, and Multi-Modal Distant Speech Recognition" at Computer Speech & Language. sciencedirect.com/special-issue/…

Quoting Shinji Watanabe @shinjiw_at_cmu:
We're organizing a special issue at Computer Speech & Language about Multi-Speaker, Multi-Microphone, and Multi-Modal Distant Speech Recognition. Deadline: December 2, 2024. sciencedirect.com/journal/comput… @chimechallenge

0 replies · 12 reposts · 40 likes · 5.8K views

Samuele Cornell @SamueleCornell ·
3. This works because we fine-tune Whisper with LoRA, so the impact of the acoustic-level mismatch is kinda mitigated. There is still, however, some mismatch that we need to overcome to make the approach more scalable: performance plateaus after 80h of synthetic data. @WavLab
0 replies · 0 reposts · 0 likes · 235 views

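A minimal sketch of what the LoRA fine-tuning mentioned above can look like, assuming the HuggingFace transformers and peft libraries; the base checkpoint, rank, and target modules are illustrative assumptions, not the thread's actual configuration:

```python
# Sketch: LoRA fine-tuning of Whisper (assumed setup, not the thread's exact recipe).
from transformers import WhisperForConditionalGeneration
from peft import LoraConfig, get_peft_model

# Base ASR model to adapt; checkpoint choice is an assumption.
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v2")

# LoRA adds small low-rank adapters to the attention projections, so only
# a tiny fraction of the parameters is trained on the synthetic data.
lora_config = LoraConfig(
    r=16,                                 # adapter rank (assumed)
    lora_alpha=32,                        # adapter scaling factor (assumed)
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically ~1% of the full model

# From here, train as usual (e.g. with transformers.Seq2SeqTrainer) on the
# synthetic conversational data; keeping the base weights frozen is what
# limits the damage from any residual acoustic mismatch.
```
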
Samuele Cornell @SamueleCornell ·
2. We show (in our setting) that using a fully synthetic dataset is on par with using real-world data from another domain. E.g., using synthetic data for fine-tuning and then testing on Mixer 6 affords the same performance as using Fisher data.
0 replies · 0 reposts · 0 likes · 216 views

Samuele Cornell @SamueleCornell ·
Some takeaways: 1. Using LLM-generated transcripts in place of original target-domain transcripts (e.g. Fisher) does not impact performance much. This can change for domains with significant specialized jargon (healthcare), but we can use LLMs to augment
0 replies · 0 reposts · 0 likes · 290 views

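A rough sketch of the kind of LLM-based transcript generation the thread alludes to, assuming the openai Python client; the model name and prompt are illustrative assumptions, not the authors' actual pipeline:

```python
# Sketch: generating synthetic conversational transcripts with an LLM
# (assumed approach; model name and prompt are illustrative).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompt = (
    "Write a short, casual telephone conversation between two speakers, "
    "in the style of the Fisher corpus. Prefix each turn with A: or B:."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model choice
    messages=[{"role": "user", "content": prompt}],
    temperature=1.0,      # higher temperature for more varied dialogues
)
transcript = response.choices[0].message.content
print(transcript)

# Pairing such transcripts with a TTS system yields fully synthetic
# audio/text pairs for the fine-tuning experiments described above.
```
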
Samuele Cornell retweeted
Neil Zeghidour @neilzegh ·
We release a detailed paper, model weights (model and codec), and streaming inference for Moshi! Beyond the model itself, we believe our findings will be useful to audio language models. "Inner Monologue" for the win!

Quoting kyutai @kyutai_labs:
Today, we release several Moshi artifacts: a long technical report with all the details behind our model, weights for Moshi and its Mimi codec, along with streaming inference code in PyTorch, Rust, and MLX. More details below 🧵 ⬇️ Paper: kyutai.org/Moshi.pdf Repo: github.com/kyutai-labs/mo… HuggingFace: huggingface.co/kmhf

4 replies · 11 reposts · 90 likes · 8.6K views

Samuele Cornell retweeted
William Chen @chenwanch1 ·
I'm excited to announce @WavLab's XEUS - an SSL speech encoder that covers 4000+ languages! XEUS is trained on over 1 million hours of speech. It outperforms both MMS 1B and w2v-BERT v2 2.0 on many tasks. We're releasing the code, checkpoints, and our 4000+ lang. data! 🧵
7 replies · 57 reposts · 221 likes · 29.8K views

Robin Scheibler @fakufakurevenge ·
Swapped the tomato stakes for longer ones 🍅 The Aiko mini tomatoes look somewhat sickly... 🤒
1 reply · 0 reposts · 17 likes · 1.3K views

Samuele Cornell @SamueleCornell ·
If you are interested in generalizable speech enhancement that can tackle "speech-in-the-wild" data and different sampling rates, and that can restore audio from different distortions, check this out. We have a new challenge at NeurIPS 2024. Website: urgent-challenge.github.io/urgent2024/tim…

Quoting Wangyou Zhang @Emrys365:
We are thrilled to announce the URGENT 2024 Challenge - a new speech enhancement (SE) competition at NeurIPS 2024: urgent-challenge.github.io/urgent2024 This challenge aims to unify diverse distortions and sampling frequencies using a single universal SE model. #URGENT2024 (1/4)

0 replies · 3 reposts · 31 likes · 1.9K views

Samuele Cornell retweeted
Shinji Watanabe @shinjiw_at_cmu ·
Hi all, this is the third call for papers for the SynData4GenAI workshop. Good news! While submissions were originally due on June 18th, we'll extend the deadline to June 24th. Please submit your papers at syndata4genai.org We look forward to your submissions!

Quoting Shinji Watanabe @shinjiw_at_cmu:
This is the second call for papers for the SynData4GenAI workshop. Please mark your calendar for the submission due date (June 18, 2024, after the Interspeech acceptance notification)! I'm also pasting the CFP.

0 replies · 4 reposts · 21 likes · 5.1K views

Desh Raj @rdesh26 ·
I will be in Québec next week, presenting this work at Speaker Odyssey 2024! HMU if you're around 😁 See you next Tuesday! (I'll also be chairing a session for the first time, so send me your best tips for how to make academics stick to a time limit.)

Quoting Desh Raj @rdesh26:
So far, we only used SURT for transcription, without worrying about speaker labels. In Ch. 7, we show how to jointly perform transcription and streaming speaker attribution in the SURT framework. This work has been submitted to Odyssey'24: arxiv.org/abs/2401.15676 9/n

4 replies · 4 reposts · 19 likes · 3.3K views