Takuya Yoshioka

912 posts

@_ty274

Speech technology researcher/manager @AssemblyAI

Bellevue, WA · Joined November 2016
57 Following · 545 Followers
Takuya Yoshioka retweeted
Shyam Gollakota @ShyamGollakota
Want to hear a friend in a noisy café? We designed deep learning-based headphones that let you isolate the speech from a specific person just by *looking* at them for a few seconds. CHI'24 honorable mention award. Paper: arxiv.org/abs/2405.06289 Code: github.com/vb000/LookOnce…
15 replies · 49 reposts · 277 likes · 119.6K views
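For context on how such a system can work: the sketch below shows enrollment-conditioned target speech extraction in PyTorch, the general idea behind isolating one voice after briefly "looking at" (i.e., enrolling on) a speaker. It is only a toy illustration, not the paper's model; all class names and layer sizes here are invented, and the authors' real implementation is in the linked repository.

```python
# Toy sketch of enrollment-conditioned target speech extraction: a short
# clip of the target speaker yields an embedding that conditions a masking
# network applied to the mixture. Illustrative only, not the paper's model.
import torch
import torch.nn as nn


class SpeakerEncoder(nn.Module):
    """Maps a short enrollment waveform to a fixed speaker embedding."""

    def __init__(self, dim=128):
        super().__init__()
        self.conv = nn.Conv1d(1, dim, kernel_size=400, stride=160)  # ~25 ms frames at 16 kHz
        self.proj = nn.Linear(dim, dim)

    def forward(self, wav):                              # wav: (batch, samples)
        feats = torch.relu(self.conv(wav.unsqueeze(1)))  # (batch, dim, frames)
        return self.proj(feats.mean(dim=-1))             # time-average -> (batch, dim)


class TargetExtractor(nn.Module):
    """Masks the mixture's features conditioned on the speaker embedding."""

    def __init__(self, dim=128):
        super().__init__()
        self.enc = nn.Conv1d(1, dim, kernel_size=400, stride=160)
        self.mask = nn.Sequential(nn.Conv1d(2 * dim, dim, 1), nn.Sigmoid())
        self.dec = nn.ConvTranspose1d(dim, 1, kernel_size=400, stride=160)

    def forward(self, mixture, spk_emb):                 # mixture: (batch, samples)
        feats = torch.relu(self.enc(mixture.unsqueeze(1)))         # (B, dim, T)
        cond = spk_emb.unsqueeze(-1).expand(-1, -1, feats.shape[-1])
        mask = self.mask(torch.cat([feats, cond], dim=1))          # (B, dim, T)
        return self.dec(feats * mask).squeeze(1)                   # (B, samples')


# Usage: enroll on a few seconds of the target voice, then filter the mixture.
enc, ext = SpeakerEncoder(), TargetExtractor()
enrollment = torch.randn(1, 16000 * 3)   # a 3-second "look" at the target speaker
mixture = torch.randn(1, 16000 * 5)      # 5 seconds of noisy-cafe mixture
target = ext(mixture, enc(enrollment))
print(target.shape)
```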
Takuya Yoshioka retweeted
Jeff Dean @JeffDean
I got an early demo of this when I visited @uwcse a couple months ago and the ability to isolate sounds in your environment was pretty great. Nice work, @b_veluri, Malek Itani, Tuochao Chen, Takuya Yoshioka, and @ShyamGollakota!
Shyam Gollakota @ShyamGollakota

Want to hear a friend in a noisy café? We designed deep learning-based headphones that let you isolate the speech from a specific person just by *looking* at them for a few seconds. CHI'24 honorable mention award. Paper: arxiv.org/abs/2405.06289 Code: github.com/vb000/LookOnce…

9 replies · 33 reposts · 363 likes · 103.2K views
Jonathan Le Roux @JonathanLeRoux
I am honored to have been elevated to IEEE Fellow "for contributions to multi-source speech and audio processing". I've been blessed w/ fantastic collaborators over the years, from advisors & lab mates to colleagues & interns, & am greatly thankful to them all🤗 @IEEEsps @IEEEorg
16 replies · 13 reposts · 149 likes · 8.3K views
Takuya Yoshioka retweeted
Shinji Watanabe @shinjiw_at_cmu
Hi all, please let me know if you know of large-scale speech data that can be used for training our Whisper reproduction (OWSM) model (arxiv.org/abs/2309.13876). We plan to move to OWSM v4.
[image]
13 replies · 27 reposts · 96 likes · 14.5K views
Takuya Yoshioka @_ty274
Creating speech zones with self-distributing acoustic swarms. Our latest paper in Nature Communications unveils distributed microphones based on an autonomous acoustic robotic swarm, creating "speech zones" in real-world settings. Paper: nature.com/articles/s4146…
1 reply · 15 reposts · 47 likes · 7.3K views
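A toy illustration of the "speech zone" idea: once the microphones have spread out and know their own positions, pairwise time differences of arrival (TDOA) localize each talker, and a zone becomes a simple spatial gate (keep talkers inside, mute the rest). This grid-search sketch is a deliberate simplification, not the paper's algorithm; the positions, the `localize` helper, and the zone geometry are all invented for illustration.

```python
# Toy TDOA localization with distributed mics, then a spatial "speech zone" gate.
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s

def tdoa(mic_a, mic_b, src):
    """Expected arrival-time difference (s) of a source between two mics."""
    return (np.linalg.norm(src - mic_a) - np.linalg.norm(src - mic_b)) / SPEED_OF_SOUND

def localize(mics, observed_tdoas, grid):
    """Pick the grid point whose predicted TDOAs best match the observed ones."""
    pairs = [(i, j) for i in range(len(mics)) for j in range(i + 1, len(mics))]
    errs = [
        sum((tdoa(mics[i], mics[j], g) - observed_tdoas[(i, j)]) ** 2 for i, j in pairs)
        for g in grid
    ]
    return grid[int(np.argmin(errs))]

# Swarm of 4 mics on a 1 m x 1 m table, one talker at (0.4, 0.3).
mics = [np.array(p) for p in [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]]
talker = np.array([0.4, 0.3])
obs = {(i, j): tdoa(mics[i], mics[j], talker)
       for i in range(4) for j in range(i + 1, 4)}

grid = [np.array([x, y]) for x in np.linspace(0, 1, 21) for y in np.linspace(0, 1, 21)]
est = localize(mics, obs, grid)

# The "speech zone" is a spatial gate: talkers outside it would be muted.
zone_center, zone_radius = np.array([0.5, 0.5]), 0.4
in_zone = np.linalg.norm(est - zone_center) <= zone_radius
print(f"estimated position {est}, inside speech zone: {in_zone}")
```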
Desh Raj @rdesh26
Yesterday, the dean informed me that I have been selected as the latest recipient of the Fred Jelinek fellowship! I am extremely honored by this recognition, and I'm aware that it puts me in esteemed company. I will keep working hard to keep Jelinek's legacy alive!
19 replies · 5 reposts · 112 likes · 7.4K views
Desh Raj @rdesh26
If you work on speech/NLP, you must have come across the quote: "Every time I fire a linguist, the performance of the speech recognizer goes up." This quote is attributed to Dr. Frederick Jelinek.
3 replies · 6 reposts · 120 likes · 19.6K views
Takuya Yoshioka @_ty274
Last Friday marked the end of my 7-year journey at Microsoft, filled with rewarding challenges, both in research & production, and incredible colleagues. I'll be starting something new very soon. I have left Microsoft; I am still staying in the Seattle area.
[image]
3 replies · 5 reposts · 43 likes · 5.8K views
Takuya Yoshioka retweeted
AK @_akhaliq
SpeechX: Neural Codec Language Model as a Versatile Speech Transformer paper page: huggingface.co/papers/2308.06… Recent advancements in generative speech models based on audio-text prompts have enabled remarkable innovations like high-quality zero-shot text-to-speech. However, existing models still face limitations in handling diverse audio-text speech generation tasks involving transforming input speech and processing audio captured in adverse acoustic conditions. This paper introduces SpeechX, a versatile speech generation model capable of zero-shot TTS and various speech transformation tasks, dealing with both clean and noisy signals. SpeechX combines neural codec language modeling with multi-task learning using task-dependent prompting, enabling unified and extensible modeling and providing a consistent way for leveraging textual input in speech enhancement and transformation tasks. Experimental results show SpeechX's efficacy in various tasks, including zero-shot TTS, noise suppression, target speaker extraction, speech removal, and speech editing with or without background noise, achieving comparable or superior performance to specialized models across tasks.
[image]
4 replies · 87 reposts · 311 likes · 74.1K views
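A hedged sketch of the modeling recipe this abstract describes: a single decoder-only language model over neural-codec audio tokens, where a learned task prompt selects the behavior (zero-shot TTS, noise suppression, target speaker extraction, and so on). This is not the actual SpeechX code; the `CodecLM` class, vocabulary sizes, layer counts, and task indices below are assumptions made purely for illustration.

```python
# Sketch of a codec language model with task-dependent prompting: one shared
# token sequence [text prompt][task token][input-audio codec tokens] is fed to
# a causal transformer that predicts output codec tokens. Illustrative only.
import torch
import torch.nn as nn

TEXT_VOCAB, CODEC_VOCAB, N_TASKS, DIM = 1000, 1024, 6, 256

class CodecLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.text_emb = nn.Embedding(TEXT_VOCAB, DIM)
        self.codec_emb = nn.Embedding(CODEC_VOCAB, DIM)
        self.task_emb = nn.Embedding(N_TASKS, DIM)   # task-dependent prompt token
        layer = nn.TransformerEncoderLayer(DIM, nhead=4, batch_first=True)
        self.lm = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(DIM, CODEC_VOCAB)

    def forward(self, text, task, in_codec):
        # Concatenate text prompt, task token, and input-audio codec tokens.
        seq = torch.cat([
            self.text_emb(text),
            self.task_emb(task).unsqueeze(1),
            self.codec_emb(in_codec),
        ], dim=1)
        T = seq.shape[1]
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        return self.head(self.lm(seq, mask=causal))  # logits over codec tokens

model = CodecLM()
text = torch.randint(0, TEXT_VOCAB, (1, 12))      # transcript, used by TTS/editing tasks
task = torch.tensor([2])                          # e.g., an index meaning "noise suppression"
noisy = torch.randint(0, CODEC_VOCAB, (1, 150))   # codec tokens of the input audio
logits = model(text, task, noisy)
print(logits.shape)                               # (1, 12 + 1 + 150, CODEC_VOCAB)
```

Swapping only the task token while reusing the same weights is what makes the single model "versatile" across generation and transformation tasks.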
Takuya Yoshioka
Takuya Yoshioka@_ty274·
SpeechX from our new paper is a single generative model that edits, enhances & creates speech, enabling zero-shot TTS, spoken content editing (while preserving ambience), speaker extraction & speech/noise removal. Demo: aka.ms/speechx Paper: arxiv.org/abs/2308.06873
0 replies · 16 reposts · 72 likes · 6.1K views
Takuya Yoshioka retweeted
Jonathan Le Roux @JonathanLeRoux
To everyone booking their @IEEE_WASPAA trip: please consider attending #SANE2023, which will take place at NYU on Thursday October 26, the day after #WASPAA2023. Register at saneworkshop.org/sane2023/
IEEE WASPAA 2025 @IEEE_WASPAA

Dear #WASPAA2023 authors, the review results are out now. Please check them at cmt3.research.microsoft.com/WASPAA2023/. We appreciate your valuable contributions and kind interest regardless of the acceptance decision!

0 replies · 7 reposts · 21 likes · 3.9K views
Takuya Yoshioka retweeted
Desh Raj @rdesh26
@ieeeICASSP Are there poster printing facilities at/near the conference venue?
1 reply · 2 reposts · 0 likes · 994 views
Takuya Yoshioka retweeted
IEEE WASPAA 2025 @IEEE_WASPAA
WASPAA 2023 calls for papers! The traditional, intimate Mohonk Mountain House venue returns with exciting changes: double-blind review, an unprecedented number of travel grants, and more. More information: waspaa.com/call-for-paper… #waspaa2023
[image]
0 replies · 15 reposts · 34 likes · 5.7K views
Takuya Yoshioka retweeted
IEEE ICASSP @ieeeICASSP
The #ICASSP2023 paper submission site is now open! Submit your papers by 19 October 2022 to be considered. Learn more about the paper guidelines and submission requirements here: hubs.la/Q01nmxt_0
0 replies · 5 reposts · 20 likes · 0 views
Samuele Cornell @SamueleCornell
@_ty274 Thanks for the reply! I think your far-field results are probably close to what you can obtain on headsets with a reasonably similar ASR backend (e.g., since headsets have little cross-talk, maybe SOT is not needed). ESPnet2 has ~16.5% on the eval headsets github.com/espnet/espnet/…
1 reply · 0 reposts · 1 like · 0 views
Takuya Yoshioka @_ty274
How can we do streaming multi-talker ASR by best combining speech separation and overlap-robust ASR? t-SOT-VA does that and works for real meeting audio with any # of mics, achieving the best published WERs of 13.7%/15.5% for AMI-MDM dev/eval. Paper: arxiv.org/abs/2209.04974
[image]
2 replies · 4 reposts · 27 likes · 0 views
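Background on the "t-SOT" half of the name: token-level serialized output training flattens overlapping transcripts into a single chronological token stream, inserting a special channel-change token (<cc>) whenever the active virtual channel switches, so one streaming decoder can transcribe overlapped speech; t-SOT-VA then feeds that decoder from a speech separation front end, per the tweet above. Below is a toy serializer showing only the token-stream construction; the word timings are invented for illustration.

```python
# Toy t-SOT-style serialization: overlapping transcripts from up to two
# virtual channels become one chronological stream with <cc> marking
# channel changes, which a streaming single-output ASR model can emit.
def t_sot_serialize(words):
    """words: list of (start_time, channel, token) tuples from two talkers.
    Returns one chronological token stream with <cc> channel-change tokens."""
    stream, prev_ch = [], None
    for _, ch, tok in sorted(words):
        if prev_ch is not None and ch != prev_ch:
            stream.append("<cc>")
        stream.append(tok)
        prev_ch = ch
    return stream

# Two talkers overlapping in time (times in seconds, virtual channels 0/1).
words = [
    (0.0, 0, "how"), (0.3, 0, "are"), (0.5, 1, "fine"), (0.6, 0, "you"),
    (0.9, 1, "thanks"),
]
print(" ".join(t_sot_serialize(words)))
# -> how are <cc> fine <cc> you <cc> thanks
```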
Takuya Yoshioka @_ty274
@SamueleCornell Good question! We focused on the distant mic setup and didn't do headset experiments in such a way that the distant-mic vs. headset numbers can be directly compared. Let us consider how to do the experiment and report the additional result.
1 reply · 0 reposts · 1 like · 0 views
Samuele Cornell @SamueleCornell
@_ty274 Very cool work! The results are impressive considering the system is streamable. Out of curiosity, may I ask what WER you obtain on the headset signals with your best back-end model?
1 reply · 0 reposts · 0 likes · 0 views