Simon Rouard

43 posts

@simonrouard

PhD @kyutai_labs and @Ircam. Deep Learning and music

Paris, France · Joined February 2021

210 Following · 292 Followers

Pinned Tweet

Simon Rouard@simonrouard·19 Jul

Very happy to announce that my paper “Audio Conditioning for Music Generation via Discrete Bottleneck Features” done with @honualx @adiyossLC @jadecopet and Axel Roebel has been accepted at ISMIR24. Paper: arxiv.org/abs/2407.12563 Sample: musicgenstyle.github.io Code: soon

5.6K

Simon Rouard@simonrouard·13 Jan

@kyutai_labs Thank you @nmboffi, as we noticed that Lagrangian self-distillation worked much better than consistency for our TTS task.

336

kyutai@kyutai_labs·13 Jan

🔓 Open source for everyone. Trained on 88k hours of public English data to ensure reproducibility. Check out the code and the technical breakdown here: kyutai.org/pocket-tts-tec…

109

8.7K

kyutai@kyutai_labs·13 Jan

We’re excited to introduce Pocket TTS: a 100M-parameter text-to-speech model with high-quality voice cloning that runs on your laptop—no GPU required. Open-source, lightweight, and incredibly fast. 🧵👇

481

3.7K

230.4K

Simon Rouard@simonrouard·13 Jan

Super happy that our work on Continuous Audio Language Models (arxiv.org/abs/2509.06926) led us to build an outstanding 100M TTS with voice cloning ability that runs on any laptop CPU.

kyutai@kyutai_labs

1.6K

Simon Rouard retweeted

Gradium@GradiumAI·2 Dec

Gradium is out of stealth to solve voice. We raised $70M and after only 3 months we’re releasing our transcription and synthesis products to power the next generation of voice AI.

159

1.1K

423.9K

Simon Rouard retweeted

kyutai@kyutai_labs·21 Oct

1/2 We’re releasing an in-depth tutorial on neural audio codecs, the secret sauce that makes it possible for audio LLMs to not sound like a horror movie:

436

47.2K

Simon Rouard retweeted

arXiv Sound@ArxivSound·9 Sep

Rouard Simon, Orsini Manu, Roebel Axel, Zeghidour Neil, Défossez Alexandre, "Continuous Audio Language Models," arxiv.org/abs/2509.06926

840

Simon Rouard retweeted

kyutai@kyutai_labs·19 Jun

Kyutai Speech-To-Text is now open-source! It’s streaming, supports batched inference, and runs blazingly fast: perfect for interactive applications. Check out the details here: kyutai.org/next/stt

118

621

65.8K

Simon Rouard retweeted

kyutai@kyutai_labs·23 May

Talk to unmute.sh 🔊, the most modular voice AI around. Empower any text LLM with voice, instantly, by wrapping it with our new speech-to-text and text-to-speech. Any personality, any voice. Interruptible, smart turn-taking. We’ll open-source everything within the next few weeks.

121

263

1.8K

284.1K

Simon Rouard retweeted

kyutai@kyutai_labs·6 Feb

Meet Hibiki, our simultaneous speech-to-speech translation model, currently supporting 🇫🇷➡️🇬🇧. Hibiki produces spoken and text translations of the input speech in real-time, while preserving the speaker’s voice and optimally adapting its pace based on the semantic content of the source speech. Based on objective and human evaluations, Hibiki outperforms previous systems for quality, naturalness and speaker similarity and approaches human interpreters. 🧵

106

475

167.1K

Simon Rouard retweeted

arXiv Sound@ArxivSound·6 Jan

“MusicGen-Stem: Multi-stem music generation and edition through autoregressive modeling,” Simon Rouard, Robin San Roman, Yossi Adi, Axel Roebel, ift.tt/Ldh3kU1

947

Simon Rouard retweeted

kyutai@kyutai_labs·13 Jan

Meet Helium-1 preview, our 2B multi-lingual LLM, targeting edge and mobile devices, released under a CC-BY license. Start building with it today! huggingface.co/kyutai/helium-…

379

58.2K

Simon Rouard@simonrouard·11 Nov

I am presenting our paper MusicGen-Style “Audio Conditioning for Music Generation via Discrete Bottleneck Features” at @ISMIRConf this afternoon. The code as well as the weights of the model are available on github.com/facebookresear…. You can now play with it!

104

4.6K

Simon Rouard retweeted

Nicolas DUFOUR@nico_dufour·2 Oct

It starts now at poster 227!

Nicolas DUFOUR@nico_dufour

We are in Milan 🇮🇹 to present 🎥 E.T. the Exceptional Trajectories: Text-to-camera-trajectory generation with character awareness. 📍 Come see our poster #227 this afternoon at #ECCV2024! 🚀 Introducing new dataset, diffusion model, and evaluation metric for camera generation!

2.9K

Simon Rouard@simonrouard·19 Jul

The code and weights of the model will be released soon. Stay tuned!

160

Simon Rouard@simonrouard·19 Jul

We can also combine text and style conditioning to generate music, but we noticed that the model tends to ignore the text prompt. We therefore introduce a double classifier-free guidance. This guidance could be applied to other multi-conditioned generative models.

193
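A minimal sketch of what a double classifier-free guidance could look like, assuming the nested form l_uncond + alpha * (l_style + beta * (l_text_style - l_style) - l_uncond). This form is an assumption made for illustration and the paper should be consulted for the exact formulation. The function name `double_cfg`, the weights `alpha` and `beta`, and the three logit lists are hypothetical names, not taken from the released code.

```python
def double_cfg(l_uncond, l_style, l_text_style, alpha=3.0, beta=5.0):
    """Combine three per-token logit vectors with nested guidance.

    l_uncond      -- logits with no conditioning
    l_style       -- logits with style (audio) conditioning only
    l_text_style  -- logits with both text and style conditioning
    alpha, beta   -- illustrative guidance weights (assumed defaults)
    """
    combined = []
    for u, s, ts in zip(l_uncond, l_style, l_text_style):
        # Inner step: push from style-only toward text+style,
        # which amplifies the contribution of the text prompt.
        inner = s + beta * (ts - s)
        # Outer step: ordinary guidance from unconditional toward
        # the text-boosted prediction.
        combined.append(u + alpha * (inner - u))
    return combined


# Toy example with a single logit.
print(double_cfg([0.0], [1.0], [2.0], alpha=2.0, beta=3.0))  # -> [8.0]
```

With `beta = 1` the inner step returns the text+style logits unchanged, so this sketch falls back to ordinary classifier-free guidance with weight `alpha`.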

Simon Rouard retweeted

arXiv Sound@ArxivSound·18 Jul

“Audio Conditioning for Music Generation via Discrete Bottleneck Features,” Simon Rouard, Yossi Adi, Jade Copet, Axel Roebel, Alexandre Défossez, ift.tt/0DRZW2F

1.1K

Simon Rouard retweeted

Jean-Marie Lemercier@jm_lemercier·7 Jun

#ICML2024 paper “An Independence-promoting Loss for Music Generation with Language Models” We promote independence between EnCodec codebooks using a kernel trick and improve music generation quality 🎶 Paper 📜 arxiv.org/pdf/2406.02315 Audio/Code 🔊 jmlemercier.github.io/encodec-mmd.gi…

7.5K

Simon Rouard retweeted

arXiv Sound@ArxivSound·5 Jun

“An Independence-promoting Loss for Music Generation with Language Models,” Jean-Marie Lemercier, Simon Rouard, Jade Copet, Yossi Adi, Alexandre Défossez, ift.tt/eDpVoCf

1.5K
