Nick Lee retweetledi

🎙️Introducing StyleStream:
the first streamable zero-shot voice style conversion system.
🚀Clone timbre, accent, and emotion simultaneously, with state-of-the-art quality, streaming locally on a single RTX 4060.
💡The Destylizer trained with ASR supervision + an ultra-narrow information bottleneck strips away voice style while preserving only linguistic content. Crucially, it keeps the original duration, making clean chunk-by-chunk streaming straightforward.
🎨The diffusion Stylizer then faithfully re-renders the content with the target style.
📰Paper: arxiv.org/abs/2602.20113…
💻Code: github.com/Berkeley-Speec…
🔊Demo: berkeley-speech-group.github.io/StyleStream/
▶️Real-time demo below ⬇️
English









