Samuele Cornell
@SamueleCornell
181 posts

Post-doc @ CMU LTI. Audio and speech researcher.

Ancona, Italy · Joined February 2021
526 Following · 984 Followers

Samuele Cornell @SamueleCornell ·
@ymas0315 It was quite a long piece of work 😅 Thanks Yoshiki for the help!
1 reply · 0 reposts · 2 likes · 71 views

まっすー @ymas0315 ·
Samuele's comprehensive review of the CHiME challenges has been published in CSL! "Recent trends in distant conversational speech recognition: A review of CHiME-7 and 8 DASR challenges" sciencedirect.com/science/articl…
1 reply · 4 reposts · 14 likes · 1.1K views

Samuele Cornell retweeted
Julius Richter @JuliusRichter13 ·
🗣️ Tomorrow I will be presenting a 𝗗𝗲𝗺𝗼 𝗼𝗻 𝗚𝗲𝗻𝗲𝗿𝗮𝘁𝗶𝘃𝗲 𝗦𝗽𝗲𝗲𝗰𝗵 𝗘𝗻𝗵𝗮𝗻𝗰𝗲𝗺𝗲𝗻𝘁. NeurIPS: Saturday, Dec 14, 4:15 pm at West Meeting Room 114/115. Hope to see you there! Also feel free to try the demo for yourself: 🔗 github.com/sp-uhh/gen-se-…
2 replies · 2 reposts · 23 likes · 1.2K views

Samuele Cornell @SamueleCornell ·
If you are still around in Vancouver for NeurIPS, tomorrow we will have the URGENT challenge workshop from 1:30 pm. Come by if you are interested in generalizable speech enhancement (also, winds will be up to 70 km/h tomorrow and it's cozy inside 😉). Lineup: neurips.cc/virtual/2024/c…
0 replies · 5 reposts · 13 likes · 1.7K views

Samuele Cornell @SamueleCornell ·
If you are interested in generalizable speech enhancement & restoration, consider joining the second edition of the URGENT challenge, which will be featured at Interspeech 2025. It will start this Friday!

Quoting Shinji Watanabe @shinjiw_at_cmu:
We are thrilled to announce the Interspeech 2025 URGENT Challenge, starting on 11/15! Join us in building universal speech enhancement models to tackle in-the-wild speech data using large-scale, multilingual data. Details: urgent-challenge.github.io/urgent2025/

0 replies · 0 reposts · 14 likes · 861 views

Samuele Cornell retweeted
Shinji Watanabe @shinjiw_at_cmu ·
Hi all, we have one month before the deadline! Please prepare your submission to our special issue "Multi-Speaker, Multi-Microphone, and Multi-Modal Distant Speech Recognition" at Computer Speech & Language. sciencedirect.com/special-issue/…

Quoting Shinji Watanabe @shinjiw_at_cmu:
We're organizing a special issue at Computer Speech & Language about Multi-Speaker, Multi-Microphone, and Multi-Modal Distant Speech Recognition. Deadline: December 2, 2024. sciencedirect.com/journal/comput… @chimechallenge

0 replies · 12 reposts · 40 likes · 5.8K views

Samuele Cornell @SamueleCornell ·
3. This works because we fine-tune Whisper with LoRA, so the impact of the acoustic-level mismatch is kinda mitigated. There is still, however, some mismatch that we need to overcome to make the approach more scalable: performance plateaus after 80h of synthetic data. @WavLab
0 replies · 0 reposts · 0 likes · 235 views

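A minimal sketch of what the LoRA fine-tuning mentioned above can look like, assuming the HuggingFace transformers and peft libraries; the base checkpoint, rank, and target modules are illustrative assumptions, not the thread's actual configuration:

```python
# Sketch: LoRA fine-tuning of Whisper (assumed setup, not the thread's exact recipe).
from transformers import WhisperForConditionalGeneration
from peft import LoraConfig, get_peft_model

# Base ASR model to adapt; checkpoint choice is an assumption.
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v2")

# LoRA adds small low-rank adapters to the attention projections, so only
# a tiny fraction of the parameters is trained on the synthetic data.
lora_config = LoraConfig(
    r=16,                                 # adapter rank (assumed)
    lora_alpha=32,                        # adapter scaling factor (assumed)
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically ~1% of the full model

# From here, train as usual (e.g. with transformers.Seq2SeqTrainer) on the
# synthetic conversational data; keeping the base weights frozen is what
# limits the damage from any residual acoustic mismatch.
```
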
Samuele Cornell @SamueleCornell ·
2. We show (in our setting) that using a fully synthetic dataset is on par with using real-world data from another domain. E.g., using synthetic data for fine-tuning and then testing on Mixer 6 affords the same performance as using Fisher data.
0 replies · 0 reposts · 0 likes · 216 views

Samuele Cornell @SamueleCornell ·
Some takeaways: 1. Using LLM-generated transcripts in place of original target-domain transcripts (e.g. Fisher) does not impact performance much. This can change for domains with significant specialized jargon (healthcare), but we can use LLMs to augment
0 replies · 0 reposts · 0 likes · 290 views

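A rough sketch of the kind of LLM-based transcript generation the thread alludes to, assuming the openai Python client; the model name and prompt are illustrative assumptions, not the authors' actual pipeline:

```python
# Sketch: generating synthetic conversational transcripts with an LLM
# (assumed approach; model name and prompt are illustrative).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompt = (
    "Write a short, casual telephone conversation between two speakers, "
    "in the style of the Fisher corpus. Prefix each turn with A: or B:."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model choice
    messages=[{"role": "user", "content": prompt}],
    temperature=1.0,      # higher temperature for more varied dialogues
)
transcript = response.choices[0].message.content
print(transcript)

# Pairing such transcripts with a TTS system yields fully synthetic
# audio/text pairs for the fine-tuning experiments described above.
```
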
Samuele Cornell retweeted
Neil Zeghidour @neilzegh ·
We release a detailed paper, model weights (model and codec), and streaming inference for Moshi! Beyond the model itself, we believe our findings will be useful to audio language models. "Inner Monologue" for the win!

Quoting kyutai @kyutai_labs:
Today, we release several Moshi artifacts: a long technical report with all the details behind our model, weights for Moshi and its Mimi codec, along with streaming inference code in PyTorch, Rust, and MLX. More details below 🧵 ⬇️ Paper: kyutai.org/Moshi.pdf Repo: github.com/kyutai-labs/mo… HuggingFace: huggingface.co/kmhf

4 replies · 11 reposts · 90 likes · 8.6K views

Samuele Cornell retweeted
William Chen @chenwanch1 ·
I'm excited to announce @WavLab's XEUS - an SSL speech encoder that covers 4000+ languages! XEUS is trained on over 1 million hours of speech. It outperforms both MMS 1B and w2v-BERT v2 2.0 on many tasks. We're releasing the code, checkpoints, and our 4000+ lang. data! 🧵
7 replies · 57 reposts · 221 likes · 29.8K views

Robin Scheibler @fakufakurevenge ·
Swapped the tomato stakes for longer ones 🍅 The Aiko mini tomatoes look somewhat sickly... 🤒
1 reply · 0 reposts · 17 likes · 1.3K views

Samuele Cornell @SamueleCornell ·
If you are interested in generalizable speech enhancement that can tackle "speech-in-the-wild" data and different sampling rates, and that can restore audio from different distortions, check this out. We have a new challenge at NeurIPS 2024. Website: urgent-challenge.github.io/urgent2024/tim…

Quoting Wangyou Zhang @Emrys365:
We are thrilled to announce the URGENT 2024 Challenge - a new speech enhancement (SE) competition at NeurIPS 2024: urgent-challenge.github.io/urgent2024 This challenge aims to unify diverse distortions and sampling frequencies using a single universal SE model. #URGENT2024 (1/4)

0 replies · 3 reposts · 31 likes · 1.9K views

Samuele Cornell retweeted
Shinji Watanabe @shinjiw_at_cmu ·
Hi all, this is the third call for papers for the SynData4GenAI workshop. Good news! While submissions were originally due on June 18th, we'll extend the deadline to June 24th. Please submit your papers at syndata4genai.org We look forward to your submissions!

Quoting Shinji Watanabe @shinjiw_at_cmu:
This is the second call for papers for the SynData4GenAI workshop. Please mark your calendar for the submission due date (June 18, 2024, after the Interspeech acceptance notification)! I'm also pasting the CFP.

0 replies · 4 reposts · 21 likes · 5.1K views

Desh Raj @rdesh26 ·
I will be in Québec next week, presenting this work at Speaker Odyssey 2024! HMU if you're around 😁 See you next Tuesday! (I'll also be chairing a session for the first time, so send me your best tips for how to make academics stick to a time limit.)

Quoting Desh Raj @rdesh26:
So far, we only used SURT for transcription, without worrying about speaker labels. In Ch. 7, we show how to jointly perform transcription and streaming speaker attribution in the SURT framework. This work has been submitted to Odyssey'24: arxiv.org/abs/2401.15676 9/n

4 replies · 4 reposts · 19 likes · 3.3K views