WAVLab | @CarnegieMellon

310 posts

@WavLab

Shinji Watanabe's Audio and Voice Lab | WAVLab @LTIatCMU @SCSatCMU | Speech Recognition, Speech Enhancement, Spoken Language Understanding, and more.

Joined August 2021
146 Following · 2.4K Followers
WAVLab | @CarnegieMellon
Congratulations to Li-Wei @liweiche77 on successfully defending his PhD today! 🎉 Wishing him all the best in his next chapter!
WAVLab | @CarnegieMellon
Congratulations to Siddhant @Sid_Arora_18 on a successful PhD defense today! It was wonderful to celebrate this big milestone together. Wishing him all the best for the exciting journey ahead.
WAVLab | @CarnegieMellon retweeted
LLM Papers
LLM Papers@HEI·
PRiSM: Benchmarking Phone Realization in Speech Models Shikhar Bharadwaj, Chin-Jou Li, Yoonjae Kim, Kwanghee Choi, Eunjung Yeo, Ryan Soh-Eun Shim, Hanyu Zhou, Brendon Boldt, Karen Rosero Jacome, Kalvin Chang, Darsh Agrawal, … arxiv.org/abs/2601.14046 [cs.CL cs.SD]
WAVLab | @CarnegieMellon retweeted
arXiv Sound
arXiv Sound@ArxivSound·
Chenda Li, Wei Wang, Marvin Sach, Wangyou Zhang, Kohei Saijo, Samuele Cornell, Yihui Fu, Zhaoheng Ni, Tim Fingscheidt, Shinji Watanabe, Yanmin Qian, "ICASSP 2026 URGENT Speech Enhancement Challenge," arxiv.org/abs/2601.13531
WAVLab | @CarnegieMellon retweeted
arXiv Sound
arXiv Sound@ArxivSound·
Pu Wang, Shinji Watanabe, Hugo Van hamme, "SSVD-O: Parameter-Efficient Fine-Tuning with Structured SVD for Speech Recognition," arxiv.org/abs/2601.12600
WAVLab | @CarnegieMellon retweeted
arXiv Sound
arXiv Sound@ArxivSound·
Shih-Heng Wang, Jiatong Shi, Jinchuan Tian, Haibin Wu, Shinji Watanabe, "Do Neural Codecs Generalize? A Controlled Study Across Unseen Languages and Non-Speech Tasks," arxiv.org/abs/2601.12205
WAVLab | @CarnegieMellon retweeted
jiatongshi
jiatongshi@jiatongshi·
Heading to NeurIPS 2025 in San Diego! I’ll present our spotlight poster, ARECHO, focusing on speech multi-metric estimation. 📍 Exhibit Hall C,D,E #2000 🗓️ Thu Dec 4, 11 a.m.–2 p.m. PST If you’re around, let’s say hi or grab a coffee!
WAVLab | @CarnegieMellon retweeted
jiatongshi
jiatongshi@jiatongshi·
This is exactly why we worked on ESPnet-Codec, but it is really hard to keep up because the field moves so fast these days. The same issue arises in most speech tasks, from ASR and TTS to general speech LLMs. It's a somewhat sad time for driving scientific findings 🥲
🐿️🐒🗻📚🐹🦈@SythonUK

Neural audio codec papers are all comparing models trained on completely different data and claiming "our model is the best!! 😤😤😤", to the point that my MOS for 😩😩😩 has hit 1,000,000.

WAVLab | @CarnegieMellon retweeted
jiatongshi
jiatongshi@jiatongshi·
Speech isn’t just sound -> it’s how we turn thought into expression. Our new work, Speech-DRAME, measures how well speech AI can act, aligning evaluation with human perception. Paper: arxiv.org/abs/2511.01261 Code: github.com/Anuttacon/spee…
WAVLab | @CarnegieMellon retweeted
jiatongshi
jiatongshi@jiatongshi·
🚀 I’m open to new opportunities in industry! Ph.D. candidate @CMU (advisor: @shinjiw_at_cmu ). Research: speech/audio AI, speech LLMs, evaluation frameworks. Ex-Meta AI, Tencent, IBM Research. DMs open — let’s connect!
WAVLab | @CarnegieMellon retweeted
Chris Donahue
Chris Donahue@chrisdonahuey·
Sharing our initial leaderboard and open data release for 🎶Music Arena⚔️! Music is subjective and multi-dimensional. A key goal of Music Arena is to provide insights beyond binary preferences! 🧵
WAVLab | @CarnegieMellon retweeted
jiatongshi
jiatongshi@jiatongshi·
ARECHO has been accepted by #neurips25 as spotlight! Many thanks to all the co-authors for their great effort and support!
jiatongshi@jiatongshi

🔊 New release: #ARECHO -> Autoregressive Evaluation via Chain-based Hypothesis Optimization. • 87-metric coverage in one model 🧮 • Dynamic classifier chain 🤝 • Unified tokenization 🧩 • Confidence-aware decoding 🛡️ Built on #UniVERSA, heading to #VERSA. More ↓
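The dynamic classifier chain with confidence-aware decoding mentioned above can be sketched in miniature. Everything below — the metric names, the toy per-metric "classifiers", and the confidence values — is a hypothetical illustration of the general idea, not ARECHO's actual code or API:

```python
# Toy sketch of a dynamic classifier chain with confidence-aware decoding.
# Each metric predictor sees the input features plus all metrics decoded so
# far; at every step we commit the most confident remaining prediction, so
# the chain order is chosen dynamically rather than fixed in advance.

def chain_decode(features, classifiers):
    """Return ({metric: value}, [(metric, confidence), ...] in decode order)."""
    predictions = {}
    remaining = dict(classifiers)
    order = []
    while remaining:
        # Re-score every remaining metric, conditioned on decoded metrics.
        scored = {name: clf(features, predictions) for name, clf in remaining.items()}
        # Confidence-aware step: commit the highest-confidence prediction first.
        best = max(scored, key=lambda n: scored[n][1])
        value, confidence = scored[best]
        predictions[best] = value
        order.append((best, round(confidence, 2)))
        del remaining[best]
    return predictions, order

# Hypothetical per-metric predictors: each returns (value, confidence).
classifiers = {
    "snr": lambda f, p: (f["energy"] * 2.0, 0.9),
    # "mos" conditions on "snr" once it has been decoded (chain dependency),
    # and is more confident when that context is available.
    "mos": lambda f, p: (3.0 + 0.1 * p.get("snr", 0.0), 0.8 if "snr" in p else 0.4),
}

preds, order = chain_decode({"energy": 5.0}, classifiers)
# "snr" is decoded first (higher confidence), then "mos" uses it as context.
```

The point of the sketch is the interaction of the two bullets: the chain order falls out of per-step confidence, and later predictions condition on earlier ones.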

WAVLab | @CarnegieMellon retweeted
Shinji Watanabe
Shinji Watanabe@shinjiw_at_cmu·
espnet v.202509 released 🚀 github.com/espnet/espnet/… Includes many updates + fixes for NumPy 2.0 & Python 3.12 (thanks Nelson!). This is the last major update before we shift to the next-gen framework, ESPnet3 Interested in collaborating? Let us know!
WAVLab | @CarnegieMellon retweeted
Shinji Watanabe
Shinji Watanabe@shinjiw_at_cmu·
Thanks, @HungyiLee2, for visiting CMU! Great discussions, inspiring research exchanges, and exciting seeds for collaboration ahead.
WAVLab | @CarnegieMellon retweeted
Pooneh Mousavi
Pooneh Mousavi@MousaviPooneh·
I’m happy to share that our paper, "Discrete Audio Tokens: More Than a Survey!", has been accepted at TMLR. 🎉 📄 Read: arxiv.org/pdf/2506.10274 🔎 Explore our tokenizer database & submit yours: poonehmousavi.github.io/dates-website/…
Gallil Maimon@GallilMaimon

🎉🥳 I am thrilled to share that our work on audio tokenisers has been accepted to #TMLR The tokeniser DB is ever updating so submit your new tokenisers 💪 poonehmousavi.github.io/dates-website/

WAVLab | @CarnegieMellon retweeted
Shinji Watanabe
Shinji Watanabe@shinjiw_at_cmu·
For my CMU course, we built the OWSM v4 demo😀 Check it out here: huggingface.co/spaces/espnet/…
Shinji Watanabe@shinjiw_at_cmu

Our work on OWSM v4 received the Best Student Paper Award at #Interspeech2025! 🏆🎉 Huge congratulations to the team! 🚀👏 I’m especially happy to see our open science efforts for speech foundation models recognized by the community. 🙌 🔗 isca-archive.org/interspeech_20…

WAVLab | @CarnegieMellon retweeted
Masao
Masao@mmiagshatoy·
OWSM-V4 is now available as a demo! It includes both the OWSM-V4 Medium model and OWSM-V4 CTC model, each with about 1B parameters. 👉 Try it out here: huggingface.co/spaces/espnet/…
WAVLab | @CarnegieMellon retweeted
Shinji Watanabe
Shinji Watanabe@shinjiw_at_cmu·
Our work on OWSM v4 received the Best Student Paper Award at #Interspeech2025! 🏆🎉 Huge congratulations to the team! 🚀👏 I’m especially happy to see our open science efforts for speech foundation models recognized by the community. 🙌 🔗 isca-archive.org/interspeech_20…
WAVLab | @CarnegieMellon retweeted
Siddhant Arora
Siddhant Arora@Sid_Arora_18·
Excited to be presenting 3 papers at #Interspeech2025 in Rotterdam this week! 1. Chain-of-Thought Reasoning for E2E Spoken Dialogue Systems Adds reasoning to real-time dialogue models, with open-source toolkit. Oral — Thu, 10:10 | Dock 10B arxiv.org/abs/2506.00722
WAVLab | @CarnegieMellon retweeted
Shikhar
Shikhar@ShikharSSU·
Meet Masao next week at #Interspeech2025. We use sound event, language, and speaker context to prune large speech models for ASR and speech translation.
Masao@mmiagshatoy

🚀 Happy to share our #INTERSPEECH2025 paper: Using speaker & acoustic context, we dynamically adjust model paths, resulting in a 25.7% relative BLEU improvement in speech translation. We also analyze how context influences model behavior. 📜 Paper: arxiv.org/abs/2505.18860
