WAVLab | @CarnegieMellon

310 posts

@WavLab

Shinji Watanabe's Audio and Voice Lab | WAVLab @LTIatCMU @SCSatCMU | Speech Recognition, Speech Enhancement, Spoken Language Understanding, and more.

Joined August 2021

146 Following · 2.4K Followers

WAVLab | @CarnegieMellon @WavLab · 2d

Congratulations to Li-Wei @liweiche77 on successfully defending his PhD today! 🎉 Wishing him all the best in his next chapter!

1.2K

WAVLab | @CarnegieMellon @WavLab · 9 Mar

Congratulations to Siddhant @Sid_Arora_18 on a successful PhD defense today! It was wonderful to celebrate this big milestone together. Wishing him all the best for the exciting journey ahead.

2.9K

WAVLab | @CarnegieMellon retweeted

LLM Papers @HEI · 21 Jan

PRiSM: Benchmarking Phone Realization in Speech Models Shikhar Bharadwaj, Chin-Jou Li, Yoonjae Kim, Kwanghee Choi, Eunjung Yeo, Ryan Soh-Eun Shim, Hanyu Zhou, Brendon Boldt, Karen Rosero Jacome, Kalvin Chang, Darsh Agrawal, … arxiv.org/abs/2601.14046 [𝚌𝚜.𝙲𝙻 𝚌𝚜.𝚂𝙳]

399

WAVLab | @CarnegieMellon retweeted

arXiv Sound @ArxivSound · 21 Jan

Chenda Li, Wei Wang, Marvin Sach, Wangyou Zhang, Kohei Saijo, Samuele Cornell, Yihui Fu, Zhaoheng Ni, Tim Fingscheidt, Shinji Watanabe, Yanmin Qian, "ICASSP 2026 URGENT Speech Enhancement Challenge," arxiv.org/abs/2601.13531

803

WAVLab | @CarnegieMellon retweeted

arXiv Sound @ArxivSound · 21 Jan

Pu Wang, Shinji Watanabe, Hugo Van hamme, "SSVD-O: Parameter-Efficient Fine-Tuning with Structured SVD for Speech Recognition," arxiv.org/abs/2601.12600

379

WAVLab | @CarnegieMellon retweeted

arXiv Sound @ArxivSound · 21 Jan

Shih-Heng Wang, Jiatong Shi, Jinchuan Tian, Haibin Wu, Shinji Watanabe, "Do Neural Codecs Generalize? A Controlled Study Across Unseen Languages and Non-Speech Tasks," arxiv.org/abs/2601.12205

820

WAVLab | @CarnegieMellon retweeted

jiatongshi @jiatongshi · 30 Nov

Heading to NeurIPS 2025 in San Diego! I’ll present our spotlight poster, ARECHO, focusing on speech multi-metric estimation. 📍 Exhibit Hall C,D,E #2000 🗓️ Thu Dec 4, 11 a.m.–2 p.m. PST If you’re around, let’s say hi or grab a coffee!

1.4K

WAVLab | @CarnegieMellon retweeted

jiatongshi @jiatongshi · 19 Nov

This is exactly the reason we worked on ESPnet-Codec, but it is really hard to keep track as people move fast nowadays. A similar issue arises in most speech tasks, from ASR and TTS to general speech LLMs. It's a bit of a sad time for driving scientific findings 🥲

🐿️🐒🗻📚🐹🦈 @SythonUK

Neural audio codec papers are all just comparing models trained on completely different data and claiming "our model is the strongest!! 😤😤😤", so my MOS score for 😩😩😩😩😩😩😩😩😩😩😩 has hit 1000000

4.5K

WAVLab | @CarnegieMellon retweeted

jiatongshi @jiatongshi · 10 Nov

Speech isn’t just sound -> it’s how we turn thought into expression. Our new work, Speech-DRAME, measures how well speech AI can act, aligning evaluation with human perception. Paper: arxiv.org/abs/2511.01261 Code: github.com/Anuttacon/spee…

WAVLab | @CarnegieMellon retweeted

jiatongshi @jiatongshi · 30 Sep

🚀 I’m open to new opportunities in industry! Ph.D. candidate @CMU (advisor: @shinjiw_at_cmu ). Research: speech/audio AI, speech LLMs, evaluation frameworks. Ex-Meta AI, Tencent, IBM Research. DMs open — let’s connect!

5.7K

WAVLab | @CarnegieMellon retweeted

Chris Donahue @chrisdonahuey · 22 Sep

Sharing our initial leaderboard and open data release for 🎶Music Arena⚔️! Music is subjective and multi-dimensional. A key goal of Music Arena is to provide insights beyond binary preferences! 🧵

15.8K

WAVLab | @CarnegieMellon retweeted

jiatongshi @jiatongshi · 18 Sep

ARECHO has been accepted by #neurips25 as spotlight! Many thanks to all the co-authors for their great effort and support!

jiatongshi @jiatongshi

🔊 New release: #ARECHO -> Autoregressive Evaluation via Chain-based Hypothesis Optimization. • 87-metric coverage in one model 🧮 • Dynamic classifier chain 🤝 • Unified tokenization 🧩 • Confidence-aware decoding 🛡️ Built on #UniVERSA, heading to #VERSA. More ↓

4.1K

WAVLab | @CarnegieMellon retweeted

Shinji Watanabe @shinjiw_at_cmu · 12 Sep

espnet v.202509 released 🚀 github.com/espnet/espnet/… Includes many updates + fixes for NumPy 2.0 & Python 3.12 (thanks Nelson!). This is the last major update before we shift to the next-gen framework, ESPnet3. Interested in collaborating? Let us know!

3.6K

WAVLab | @CarnegieMellon retweeted

Shinji Watanabe @shinjiw_at_cmu · 8 Sep

Thanks, @HungyiLee2, for visiting CMU! Great discussions, inspiring research exchanges, and exciting seeds for collaboration ahead.

3.1K

WAVLab | @CarnegieMellon retweeted

Pooneh Mousavi @MousaviPooneh · 30 Aug

I’m happy to share that our paper, "Discrete Audio Tokens: More Than a Survey!", has been accepted at TMLR. 🎉 📄 Read: arxiv.org/pdf/2506.10274 🔎 Explore our tokenizer database & submit yours: poonehmousavi.github.io/dates-website/…

Gallil Maimon @GallilMaimon

🎉🥳 I am thrilled to share that our work on audio tokenisers has been accepted to #TMLR The tokeniser DB is ever updating so submit your new tokenisers 💪 poonehmousavi.github.io/dates-website/

1.8K

WAVLab | @CarnegieMellon retweeted

Shinji Watanabe @shinjiw_at_cmu · 29 Aug

For my CMU course, we built the OWSM v4 demo😀 Check it out here: huggingface.co/spaces/espnet/…

Shinji Watanabe @shinjiw_at_cmu

Our work on OWSM v4 received the Best Student Paper Award at #Interspeech2025! 🏆🎉 Huge congratulations to the team! 🚀👏 I’m especially happy to see our open science efforts for speech foundation models recognized by the community. 🙌 🔗 isca-archive.org/interspeech_20…

2.6K

WAVLab | @CarnegieMellon retweeted

Masao @mmiagshatoy · 28 Aug

OWSM-V4 is now available as a demo! It includes both the OWSM-V4 Medium model and OWSM-V4 CTC model, each with about 1B parameters. 👉 Try it out here: huggingface.co/spaces/espnet/…

2.6K

WAVLab | @CarnegieMellon retweeted

Shinji Watanabe @shinjiw_at_cmu · 22 Aug

116

13.8K

WAVLab | @CarnegieMellon retweeted

Siddhant Arora @Sid_Arora_18 · 17 Aug

Excited to be presenting 3 papers at #Interspeech2025 in Rotterdam this week! 1. Chain-of-Thought Reasoning for E2E Spoken Dialogue Systems Adds reasoning to real-time dialogue models, with open-source toolkit. Oral — Thu, 10:10 | Dock 10B arxiv.org/abs/2506.00722

2.9K

WAVLab | @CarnegieMellon retweeted

Shikhar @ShikharSSU · 16 Aug

Meet Masao next week at #Interspeech2025! We use sound event, language, and speaker context to prune large speech models for ASR and speech translation.

Masao @mmiagshatoy

🚀 Happy to share our #INTERSPEECH2025 paper: Using speaker & acoustic context, we dynamically adjust model paths, resulting in a 25.7% relative BLEU improvement in speech translation. We also analyze how context influences model behavior. 📜 Paper: arxiv.org/abs/2505.18860

1.4K
