Takuya Yoshioka

912 posts

@_ty274

Speech technology researcher/manager @AssemblyAI

Bellevue, WA · Joined November 2016
57 Following · 545 Followers
Takuya Yoshioka retweeted
Shyam Gollakota @ShyamGollakota
Want to hear a friend in a noisy café? We designed deep learning-based headphones that let you isolate the speech from a specific person just by *looking* at them for a few seconds. CHI'24 honorable mention award. Paper: arxiv.org/abs/2405.06289 Code: github.com/vb000/LookOnce…
15 replies · 49 reposts · 277 likes · 119.6K views
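For context on how such a system can work: the sketch below shows enrollment-conditioned target speech extraction in PyTorch, the general idea behind isolating one voice after briefly "looking at" (i.e., enrolling on) a speaker. It is only a toy illustration, not the paper's model; all class names and layer sizes here are invented, and the authors' real implementation is in the linked repository.

```python
# Toy sketch of enrollment-conditioned target speech extraction: a short
# clip of the target speaker yields an embedding that conditions a masking
# network applied to the mixture. Illustrative only, not the paper's model.
import torch
import torch.nn as nn


class SpeakerEncoder(nn.Module):
    """Maps a short enrollment waveform to a fixed speaker embedding."""

    def __init__(self, dim=128):
        super().__init__()
        self.conv = nn.Conv1d(1, dim, kernel_size=400, stride=160)  # ~25 ms frames at 16 kHz
        self.proj = nn.Linear(dim, dim)

    def forward(self, wav):                              # wav: (batch, samples)
        feats = torch.relu(self.conv(wav.unsqueeze(1)))  # (batch, dim, frames)
        return self.proj(feats.mean(dim=-1))             # time-average -> (batch, dim)


class TargetExtractor(nn.Module):
    """Masks the mixture's features conditioned on the speaker embedding."""

    def __init__(self, dim=128):
        super().__init__()
        self.enc = nn.Conv1d(1, dim, kernel_size=400, stride=160)
        self.mask = nn.Sequential(nn.Conv1d(2 * dim, dim, 1), nn.Sigmoid())
        self.dec = nn.ConvTranspose1d(dim, 1, kernel_size=400, stride=160)

    def forward(self, mixture, spk_emb):                 # mixture: (batch, samples)
        feats = torch.relu(self.enc(mixture.unsqueeze(1)))         # (B, dim, T)
        cond = spk_emb.unsqueeze(-1).expand(-1, -1, feats.shape[-1])
        mask = self.mask(torch.cat([feats, cond], dim=1))          # (B, dim, T)
        return self.dec(feats * mask).squeeze(1)                   # (B, samples')


# Usage: enroll on a few seconds of the target voice, then filter the mixture.
enc, ext = SpeakerEncoder(), TargetExtractor()
enrollment = torch.randn(1, 16000 * 3)   # a 3-second "look" at the target speaker
mixture = torch.randn(1, 16000 * 5)      # 5 seconds of noisy-cafe mixture
target = ext(mixture, enc(enrollment))
print(target.shape)
```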
Takuya Yoshioka retweeted
Jeff Dean @JeffDean
I got an early demo of this when I visited @uwcse a couple months ago and the ability to isolate sounds in your environment was pretty great. Nice work, @b_veluri, Malek Itani, Tuochao Chen, Takuya Yoshioka, and @ShyamGollakota!
Shyam Gollakota @ShyamGollakota

Want to hear a friend in a noisy café? We designed deep learning-based headphones that let you isolate the speech from a specific person just by *looking* at them for a few seconds. CHI'24 honorable mention award. Paper: arxiv.org/abs/2405.06289 Code: github.com/vb000/LookOnce…

9 replies · 33 reposts · 363 likes · 103.2K views
Jonathan Le Roux @JonathanLeRoux
I am honored to have been elevated to IEEE Fellow "for contributions to multi-source speech and audio processing". I've been blessed w/ fantastic collaborators over the years, from advisors & lab mates to colleagues & interns, & am greatly thankful to them all🤗 @IEEEsps @IEEEorg
16 replies · 13 reposts · 149 likes · 8.3K views
Takuya Yoshioka retweeted
Shinji Watanabe @shinjiw_at_cmu
Hi all, please let me know if you know of large-scale speech data that can be used for training our Whisper reproduction (OWSM) model (arxiv.org/abs/2309.13876). We plan to move to OWSM v4.
[image]
13 replies · 27 reposts · 96 likes · 14.5K views
Takuya Yoshioka @_ty274
Creating speech zones with self-distributing acoustic swarms. Our latest paper in Nature Communications unveils distributed microphones based on an autonomous acoustic robotic swarm, creating "speech zones" in real-world settings. Paper: nature.com/articles/s4146…
1 reply · 15 reposts · 47 likes · 7.3K views
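A toy illustration of the "speech zone" idea: once the microphones have spread out and know their own positions, pairwise time differences of arrival (TDOA) localize each talker, and a zone becomes a simple spatial gate (keep talkers inside, mute the rest). This grid-search sketch is a deliberate simplification, not the paper's algorithm; the positions, the `localize` helper, and the zone geometry are all invented for illustration.

```python
# Toy TDOA localization with distributed mics, then a spatial "speech zone" gate.
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s

def tdoa(mic_a, mic_b, src):
    """Expected arrival-time difference (s) of a source between two mics."""
    return (np.linalg.norm(src - mic_a) - np.linalg.norm(src - mic_b)) / SPEED_OF_SOUND

def localize(mics, observed_tdoas, grid):
    """Pick the grid point whose predicted TDOAs best match the observed ones."""
    pairs = [(i, j) for i in range(len(mics)) for j in range(i + 1, len(mics))]
    errs = [
        sum((tdoa(mics[i], mics[j], g) - observed_tdoas[(i, j)]) ** 2 for i, j in pairs)
        for g in grid
    ]
    return grid[int(np.argmin(errs))]

# Swarm of 4 mics on a 1 m x 1 m table, one talker at (0.4, 0.3).
mics = [np.array(p) for p in [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]]
talker = np.array([0.4, 0.3])
obs = {(i, j): tdoa(mics[i], mics[j], talker)
       for i in range(4) for j in range(i + 1, 4)}

grid = [np.array([x, y]) for x in np.linspace(0, 1, 21) for y in np.linspace(0, 1, 21)]
est = localize(mics, obs, grid)

# The "speech zone" is a spatial gate: talkers outside it would be muted.
zone_center, zone_radius = np.array([0.5, 0.5]), 0.4
in_zone = np.linalg.norm(est - zone_center) <= zone_radius
print(f"estimated position {est}, inside speech zone: {in_zone}")
```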
Desh Raj @rdesh26
Yesterday, the dean informed me that I have been selected as the latest recipient of the Fred Jelinek fellowship! I am extremely honored by this recognition, and I'm aware that it puts me in esteemed company. I will keep working hard to keep Jelinek's legacy alive!
19 replies · 5 reposts · 112 likes · 7.4K views
Desh Raj @rdesh26
If you work on speech/NLP, you must have come across the quote: "Every time I fire a linguist, the performance of the speech recognizer goes up." This quote is attributed to Dr. Frederick Jelinek.
3 replies · 6 reposts · 120 likes · 19.6K views
Takuya Yoshioka @_ty274
Last Friday marked the end of my 7-year journey at Microsoft, filled with rewarding challenges, both in research & production, and incredible colleagues. I'll be starting something new very soon. I have left Microsoft; I am still staying in the Seattle area.
[image]
3 replies · 5 reposts · 43 likes · 5.8K views
Takuya Yoshioka retweeted
AK @_akhaliq
SpeechX: Neural Codec Language Model as a Versatile Speech Transformer paper page: huggingface.co/papers/2308.06… Recent advancements in generative speech models based on audio-text prompts have enabled remarkable innovations like high-quality zero-shot text-to-speech. However, existing models still face limitations in handling diverse audio-text speech generation tasks involving transforming input speech and processing audio captured in adverse acoustic conditions. This paper introduces SpeechX, a versatile speech generation model capable of zero-shot TTS and various speech transformation tasks, dealing with both clean and noisy signals. SpeechX combines neural codec language modeling with multi-task learning using task-dependent prompting, enabling unified and extensible modeling and providing a consistent way for leveraging textual input in speech enhancement and transformation tasks. Experimental results show SpeechX's efficacy in various tasks, including zero-shot TTS, noise suppression, target speaker extraction, speech removal, and speech editing with or without background noise, achieving comparable or superior performance to specialized models across tasks.
[image]
4 replies · 87 reposts · 311 likes · 74.1K views
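A hedged sketch of the modeling recipe this abstract describes: a single decoder-only language model over neural-codec audio tokens, where a learned task prompt selects the behavior (zero-shot TTS, noise suppression, target speaker extraction, and so on). This is not the actual SpeechX code; the `CodecLM` class, vocabulary sizes, layer counts, and task indices below are assumptions made purely for illustration.

```python
# Sketch of a codec language model with task-dependent prompting: one shared
# token sequence [text prompt][task token][input-audio codec tokens] is fed to
# a causal transformer that predicts output codec tokens. Illustrative only.
import torch
import torch.nn as nn

TEXT_VOCAB, CODEC_VOCAB, N_TASKS, DIM = 1000, 1024, 6, 256

class CodecLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.text_emb = nn.Embedding(TEXT_VOCAB, DIM)
        self.codec_emb = nn.Embedding(CODEC_VOCAB, DIM)
        self.task_emb = nn.Embedding(N_TASKS, DIM)   # task-dependent prompt token
        layer = nn.TransformerEncoderLayer(DIM, nhead=4, batch_first=True)
        self.lm = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(DIM, CODEC_VOCAB)

    def forward(self, text, task, in_codec):
        # Concatenate text prompt, task token, and input-audio codec tokens.
        seq = torch.cat([
            self.text_emb(text),
            self.task_emb(task).unsqueeze(1),
            self.codec_emb(in_codec),
        ], dim=1)
        T = seq.shape[1]
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        return self.head(self.lm(seq, mask=causal))  # logits over codec tokens

model = CodecLM()
text = torch.randint(0, TEXT_VOCAB, (1, 12))      # transcript, used by TTS/editing tasks
task = torch.tensor([2])                          # e.g., an index meaning "noise suppression"
noisy = torch.randint(0, CODEC_VOCAB, (1, 150))   # codec tokens of the input audio
logits = model(text, task, noisy)
print(logits.shape)                               # (1, 12 + 1 + 150, CODEC_VOCAB)
```

Swapping only the task token while reusing the same weights is what makes the single model "versatile" across generation and transformation tasks.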
Takuya Yoshioka
Takuya Yoshioka@_ty274·
SpeechX from our new paper is a single generative model that edits, enhances & creates speech, enabling zero-shot TTS, spoken content editing (while preserving ambience), speaker extraction & speech/noise removal. Demo: aka.ms/speechx Paper: arxiv.org/abs/2308.06873
0 replies · 16 reposts · 72 likes · 6.1K views
Takuya Yoshioka retweeted
Jonathan Le Roux @JonathanLeRoux
To everyone booking their @IEEE_WASPAA trip: please consider attending #SANE2023, which will take place at NYU on Thursday October 26, the day after #WASPAA2023. Register at saneworkshop.org/sane2023/
IEEE WASPAA 2025 @IEEE_WASPAA

Dear #WASPAA2023 authors, the review results are out now. Please check them at cmt3.research.microsoft.com/WASPAA2023/. We appreciate your valuable contributions and kind interest regardless of the acceptance decision!

0 replies · 7 reposts · 21 likes · 3.9K views
Takuya Yoshioka retweeted
Desh Raj @rdesh26
@ieeeICASSP Are there poster printing facilities at/near the conference venue?
1 reply · 2 reposts · 0 likes · 994 views
Takuya Yoshioka retweeted
IEEE WASPAA 2025 @IEEE_WASPAA
WASPAA 2023 calls for papers! The traditional, intimate Mohonk Mountain House venue returns with exciting changes: double-blind review, an unprecedented number of travel grants, and more. More information: waspaa.com/call-for-paper… #waspaa2023
[image]
0 replies · 15 reposts · 34 likes · 5.7K views
Takuya Yoshioka retweeted
IEEE ICASSP @ieeeICASSP
The #ICASSP2023 paper submission site is now open! Submit your papers by 19 October 2022 to be considered. Learn more about the paper guidelines and submission requirements here: hubs.la/Q01nmxt_0
0 replies · 5 reposts · 20 likes · 0 views
Samuele Cornell @SamueleCornell
@_ty274 Thanks for the reply! I think your far-field results are probably close to what you can obtain on headsets with a reasonably similar ASR backend (e.g., since headsets have little cross-talk, maybe SOT is not needed). ESPnet2 has ~16.5% on the eval headsets github.com/espnet/espnet/…
1 reply · 0 reposts · 1 like · 0 views
Takuya Yoshioka @_ty274
How can we do streaming multi-talker ASR by best combining speech separation and overlap-robust ASR? t-SOT-VA does that and works for real meeting audio with any # of mics, achieving the best published WERs of 13.7%/15.5% for AMI-MDM dev/eval. Paper: arxiv.org/abs/2209.04974
[image]
2 replies · 4 reposts · 27 likes · 0 views
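Background on the "t-SOT" half of the name: token-level serialized output training flattens overlapping transcripts into a single chronological token stream, inserting a special channel-change token (<cc>) whenever the active virtual channel switches, so one streaming decoder can transcribe overlapped speech; t-SOT-VA then feeds that decoder from a speech separation front end, per the tweet above. Below is a toy serializer showing only the token-stream construction; the word timings are invented for illustration.

```python
# Toy t-SOT-style serialization: overlapping transcripts from up to two
# virtual channels become one chronological stream with <cc> marking
# channel changes, which a streaming single-output ASR model can emit.
def t_sot_serialize(words):
    """words: list of (start_time, channel, token) tuples from two talkers.
    Returns one chronological token stream with <cc> channel-change tokens."""
    stream, prev_ch = [], None
    for _, ch, tok in sorted(words):
        if prev_ch is not None and ch != prev_ch:
            stream.append("<cc>")
        stream.append(tok)
        prev_ch = ch
    return stream

# Two talkers overlapping in time (times in seconds, virtual channels 0/1).
words = [
    (0.0, 0, "how"), (0.3, 0, "are"), (0.5, 1, "fine"), (0.6, 0, "you"),
    (0.9, 1, "thanks"),
]
print(" ".join(t_sot_serialize(words)))
# -> how are <cc> fine <cc> you <cc> thanks
```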
Takuya Yoshioka @_ty274
@SamueleCornell Good question! We focused on the distant mic setup and didn't do headset experiments in such a way that the distant-mic vs. headset numbers can be directly compared. Let us consider how to do the experiment and report the additional result.
1 reply · 0 reposts · 1 like · 0 views
Samuele Cornell @SamueleCornell
@_ty274 Very cool work! The results are impressive considering the system is streamable. Out of curiosity, may I ask what WER you obtain on the headset signals with your best back-end model?
1 reply · 0 reposts · 0 likes · 0 views