Simon Rouard

43 posts

@simonrouard

PhD @kyutai_labs and @Ircam. Deep Learning and music

Paris, France · Joined February 2021
210 Following · 292 Followers
Simon Rouard @simonrouard
@kyutai_labs Thank you @nmboffi! We noticed that Lagrangian self-distillation worked much better than consistency for our TTS task.
0
0
1
336
kyutai @kyutai_labs
🔓 Open source for everyone. Trained on 88k hours of public English data to ensure reproducibility. Check out the code and the technical breakdown here: kyutai.org/pocket-tts-tec…
9
7
109
8.7K
kyutai @kyutai_labs
We’re excited to introduce Pocket TTS: a 100M-parameter text-to-speech model with high-quality voice cloning that runs on your laptop—no GPU required. Open-source, lightweight, and incredibly fast. 🧵👇
90
481
3.7K
230.4K
Simon Rouard retweeted
Gradium @GradiumAI
Gradium is out of stealth to solve voice. We raised $70M and after only 3 months we’re releasing our transcription and synthesis products to power the next generation of voice AI.
77
159
1.1K
423.9K
Simon Rouard retweeted
kyutai @kyutai_labs
1/2 We’re releasing an in-depth tutorial on neural audio codecs, the secret sauce that makes it possible for audio LLMs to not sound like a horror movie:
13
55
436
47.2K
Simon Rouard retweeted
arXiv Sound @ArxivSound
Rouard Simon, Orsini Manu, Roebel Axel, Zeghidour Neil, Défossez Alexandre, "Continuous Audio Language Models," arxiv.org/abs/2509.06926
0
5
21
840
Simon Rouard retweeted
kyutai @kyutai_labs
Kyutai Speech-To-Text is now open-source! It’s streaming, supports batched inference, and runs blazingly fast: perfect for interactive applications. Check out the details here: kyutai.org/next/stt
33
118
621
65.8K
Simon Rouard retweeted
kyutai @kyutai_labs
Talk to unmute.sh 🔊, the most modular voice AI around. Empower any text LLM with voice, instantly, by wrapping it with our new speech-to-text and text-to-speech. Any personality, any voice. Interruptible, smart turn-taking. We’ll open-source everything within the next few weeks.
121
263
1.8K
284.1K
Simon Rouard retweeted
kyutai @kyutai_labs
Meet Hibiki, our simultaneous speech-to-speech translation model, currently supporting 🇫🇷➡️🇬🇧. Hibiki produces spoken and text translations of the input speech in real-time, while preserving the speaker’s voice and optimally adapting its pace based on the semantic content of the source speech. Based on objective and human evaluations, Hibiki outperforms previous systems for quality, naturalness and speaker similarity and approaches human interpreters. 🧵
21
106
475
167.1K
Simon Rouard retweeted
arXiv Sound @ArxivSound
"MusicGen-Stem: Multi-stem music generation and edition through autoregressive modeling," Simon Rouard, Robin San Roman, Yossi Adi, Axel Roebel, ift.tt/Ldh3kU1
1
4
17
947
Simon Rouard retweeted
kyutai @kyutai_labs
Meet Helium-1 preview, our 2B multi-lingual LLM, targeting edge and mobile devices, released under a CC-BY license. Start building with it today! huggingface.co/kyutai/helium-…
10
89
379
58.2K
Simon Rouard @simonrouard
I am presenting our paper MusicGen-Style “Audio Conditioning for Music Generation via Discrete Bottleneck Features” at @ISMIRConf this afternoon. The code as well as the weights of the model are available on github.com/facebookresear…. You can now play with it!
1
10
104
4.6K
Simon Rouard @simonrouard
The code and weights of the model will be released soon. Stay tuned!
0
0
0
160
Simon Rouard @simonrouard
We can also combine text and style conditioning to generate music, but we noticed that the model tends to ignore the text prompt. We therefore introduce a double classifier-free guidance. This guidance could be applied to other multi-conditioned generative models.
1
0
1
193
Simon Rouard retweeted
arXiv Sound @ArxivSound
"Audio Conditioning for Music Generation via Discrete Bottleneck Features," Simon Rouard, Yossi Adi, Jade Copet, Axel Roebel, Alexandre Défossez, ift.tt/0DRZW2F
0
2
21
1.1K
Simon Rouard retweeted
arXiv Sound @ArxivSound
"An Independence-promoting Loss for Music Generation with Language Models," Jean-Marie Lemercier, Simon Rouard, Jade Copet, Yossi Adi, Alexandre Défossez, ift.tt/eDpVoCf
0
5
20
1.5K