WAVLab | @CarnegieMellon

320 posts

@WavLab

Shinji Watanabe's Audio and Voice Lab | WAVLab @LTIatCMU @SCSatCMU | Speech Recognition, Speech Enhancement, Spoken Language Understanding, and more.

Joined August 2021

147 Following · 2.4K Followers

WAVLab | @CarnegieMellon retweeted

Shinji Watanabe @shinjiw_at_cmu · 3d

We are looking for a postdoctoral researcher in speech and audio processing, with a possible start in the Fall 2026 semester. If you are interested in working with us, please apply through the following form: forms.gle/gfENMMrRf1nmnT…

WAVLab | @CarnegieMellon retweeted

William Chen @chenwanch1 · 2 May

Accepted to ICML! See y’all in Korea 🇰🇷

William Chen @chenwanch1

What if you had nano-banana for audio? AudioChat is a multi-modal LM that performs fine-grained understanding, generation, and editing of multi-source scenes. By diffusing continuous latents, it generates 48 kHz stereo edits with great input adherence: wanchichen.github.io/audiochat/

WAVLab | @CarnegieMellon @WavLab · 3 May

7. Phonological Tokenizer: Prosody-Aware Phonetic Token via Multi-Objective Fine-Tuning With Differentiable K-Means
Poster: May 6, 14:00
arxiv.org/abs/2601.19781

8. Online Register for Dual-Mode Self-Supervised Speech Models
Poster: May 7, 09:00
arxiv.org/abs/2602.23702

5/5

WAVLab | @CarnegieMellon @WavLab · 3 May

WAVLab @ #ICASSP2026
We will present 8 papers at ICASSP in Barcelona. If you are attending, please stop by the talks/posters and chat with the authors. arXiv links and presentation info below.

1/5

WAVLab | @CarnegieMellon @WavLab · 3 May

5. Full-Duplex-Bench V1.5: Evaluating Overlap Handling for Full-Duplex Speech Models
Poster: May 8, 14:00
arxiv.org/abs/2507.23159

6. CALM: Joint Contextual Acoustic-Linguistic Modeling for Personalization of Multi-Speaker ASR
Oral: May 8, 15:00
arxiv.org/abs/2601.22792

4/5

WAVLab | @CarnegieMellon @WavLab · 3 May

3. Reasoning Beyond Majority Vote: An Explainable SpeechLM Framework for Speech Emotion Recognition
Oral: May 7, 15:00
arxiv.org/abs/2509.24187

4. 2025 URGENT Speech Enhancement Challenge Multilingual P.808 Listening Tests
Oral: May 6, 17:50
arxiv.org/abs/2507.11306

3/5

WAVLab | @CarnegieMellon @WavLab · 3 May

1. ICASSP 2026 URGENT Speech Enhancement Challenge
Poster: Fri May 8, 14:00 to 16:00, Poster Area 43
arxiv.org/abs/2601.13531

2. SSVD-O: Parameter-Efficient Fine-Tuning with Structured SVD for Speech Recognition
Oral: Fri May 8, 10:00 to 10:20
arxiv.org/abs/2601.12600

2/5

WAVLab | @CarnegieMellon @WavLab · 23 Apr

Congrats to Brian @brianyan918 on finishing his PhD defense today! It was great to see so many people show up for this big event and celebrate such an important milestone. Wishing you all the best in what comes next!

WAVLab | @CarnegieMellon retweeted

Shinji Watanabe @shinjiw_at_cmu · 7 Apr

6 papers (4 main and 2 findings) were accepted at #ACL2026! All are speech papers :)

WAVLab | @CarnegieMellon retweeted

arXiv Sound @ArxivSound · 2 Apr

Shikhar Bharadwaj, Chin-Jou Li, Kwanghee Choi, Eunjung Yeo, William Chen, Shinji Watanabe, David R. Mortensen, "An Empirical Recipe for Universal Phone Recognition," arxiv.org/abs/2603.29042

WAVLab | @CarnegieMellon @WavLab · 25 Mar

Congratulations to Li-Wei @liweiche77 on successfully defending his PhD today! 🎉 Wishing him all the best in his next chapter!

WAVLab | @CarnegieMellon @WavLab · 9 Mar

Congratulations to Siddhant @Sid_Arora_18 on a successful PhD defense today! It was wonderful to celebrate this big milestone together. Wishing him all the best for the exciting journey ahead.

WAVLab | @CarnegieMellon retweeted

Natural Language Processing Papers @HEI · 21 Jan

PRiSM: Benchmarking Phone Realization in Speech Models
Shikhar Bharadwaj, Chin-Jou Li, Yoonjae Kim, Kwanghee Choi, Eunjung Yeo, Ryan Soh-Eun Shim, Hanyu Zhou, Brendon Boldt, Karen Rosero Jacome, Kalvin Chang, Darsh Agrawal, …
arxiv.org/abs/2601.14046 [cs.CL cs.SD]

WAVLab | @CarnegieMellon retweeted

arXiv Sound @ArxivSound · 21 Jan

Chenda Li, Wei Wang, Marvin Sach, Wangyou Zhang, Kohei Saijo, Samuele Cornell, Yihui Fu, Zhaoheng Ni, Tim Fingscheidt, Shinji Watanabe, Yanmin Qian, "ICASSP 2026 URGENT Speech Enhancement Challenge," arxiv.org/abs/2601.13531

WAVLab | @CarnegieMellon retweeted

arXiv Sound @ArxivSound · 21 Jan

Pu Wang, Shinji Watanabe, Hugo Van hamme, "SSVD-O: Parameter-Efficient Fine-Tuning with Structured SVD for Speech Recognition," arxiv.org/abs/2601.12600

WAVLab | @CarnegieMellon retweeted

arXiv Sound @ArxivSound · 21 Jan

Shih-Heng Wang, Jiatong Shi, Jinchuan Tian, Haibin Wu, Shinji Watanabe, "Do Neural Codecs Generalize? A Controlled Study Across Unseen Languages and Non-Speech Tasks," arxiv.org/abs/2601.12205

WAVLab | @CarnegieMellon retweeted

jiatongshi @jiatongshi · 30 Nov

Heading to NeurIPS 2025 in San Diego! I’ll present our spotlight poster, ARECHO, focusing on speech multi-metric estimation.
📍 Exhibit Hall C,D,E #2000
🗓️ Thu Dec 4, 11 a.m.–2 p.m. PST
If you’re around, let’s say hi or grab a coffee!

WAVLab | @CarnegieMellon retweeted

jiatongshi @jiatongshi · 19 Nov

This is exactly why we worked on ESPnet-Codec, but it is really hard to keep track, as people move fast nowadays. A similar issue arises in most speech tasks, from ASR and TTS to general speech LLMs. It's a somewhat sad time for driving scientific findings 🥲

🐿️🐒🗻📚🐹🦈 @SythonUK

Neural audio codec papers are nothing but ones that compare models trained on completely different data and claim "Our model is the strongest!! 😤😤😤", so my MOS score for 😩😩😩😩😩😩😩😩😩😩😩 has hit 1000000

WAVLab | @CarnegieMellon retweeted

jiatongshi @jiatongshi · 10 Nov

Speech isn’t just sound -> it’s how we turn thought into expression. Our new work, Speech-DRAME, measures how well speech AI can act, aligning evaluation with human perception. Paper: arxiv.org/abs/2511.01261 Code: github.com/Anuttacon/spee…

WAVLab | @CarnegieMellon retweeted

jiatongshi @jiatongshi · 30 Sep

🚀 I’m open to new opportunities in industry! Ph.D. candidate @CMU (advisor: @shinjiw_at_cmu). Research: speech/audio AI, speech LLMs, evaluation frameworks. Ex-Meta AI, Tencent, IBM Research. DMs open, let’s connect!
