Tom Labiausse

24 posts

@tom_labiausse

ML Research @kyutai_labs

Paris · Joined July 2024
3 Following · 200 Followers
Tom Labiausse retweeted
kyutai (@kyutai_labs)
Three of our papers were accepted at ICML and one at CVPR this year 🎉 We will have researchers on-site at both conferences, so come talk to us if you want to learn more about Kyutai!
👁️ MoshiVis (CVPR’26) → Vision Speech Models: a data- and training-efficient pipeline for omni models built on top of Moshi
🧠 MoshiRAG (ICML’26) → Making speech-to-speech models smarter with the power of RAG while adding minimal latency
🗣️ Hibiki-Zero (ICML’26) → Streaming speech-to-speech translation without aligned data, leveraging GRPO
⌛ Kairos (ICML’26) → Recency bias is real, even for LLMs. More details in a future post!
#ICML2026 #CVPR2026
Tom Labiausse retweeted
kyutai (@kyutai_labs)
Speech-native models like Moshi sound great and answer fast, but aren’t as smart as text LLMs. In our new paper, MoshiRAG, we show how Moshi can ask for advice from a text LLM or a knowledge base. The tricky part is how to do this in real time without adding latency. 🧵
Tom Labiausse retweeted
Neil Zeghidour (@neilzegh)
Gradium builds models, not orchestration or voice agents. But to really evaluate the conversational experience around our models, you need to see them inside an actual agent. That’s why we built Gradbot internally, to spin up a POC in minutes before a sales call. Now we’re open-sourcing it for anyone to experiment with and have fun.
Quoting Gradium (@GradiumAI)

Today we're open-sourcing Gradbot, the framework we built internally at Gradium to prototype voice agents: from idea to a working prototype in around 50 lines of code.

Tom Labiausse retweeted
kyutai (@kyutai_labs)
We're releasing OVIE, a novel view generation model trained entirely on single images. No multi-view datasets needed. Given a single image, it generates novel views of any scene in real time, running orders of magnitude faster than competing approaches.
Tom Labiausse retweeted
Matt Turck (@mattturck)
Voice used to be AI’s forgotten modality; now it's having its big moment: rapid innovation, big funding rounds, major agentic applications.
My conversation with @neilzegh, a top AI researcher in the field (@GoogleDeepMind, @Meta, @kyutai_labs) and now CEO of @GradiumAI. This is a reference episode on all things voice AI 🔥
00:00 Intro
01:21 Voice AI’s big moment, and why we’re still early
03:34 Why voice lagged behind text/image/video
06:06 The convergence era: transformers for every modality
07:40 Beyond Her: always-on assistants, wake words, voice-first devices
11:01 Voice vs text: where voice fits (even for coding)
12:56 Neil’s origin story: from finance to machine learning, with help from @ylecun and @soumithchintala
18:35 Neural codecs (SoundStream): compression as the unlock
22:30 Kyutai: open research, small elite teams, moving fast
31:32 Why big labs haven’t “won” voice AI
34:01 On-device voice: where it works, why compact models matter
46:37 The last mile: real-world robustness, pronunciation, uptime
41:35 Benchmarking voice: why metrics fail, how they actually test
47:03 Cascades vs speech-to-speech: trade-offs + what’s next
54:05 Hardest frontier: noisy rooms, factories, multi-speaker chaos
1:00:50 New languages + dialects: what transfers, what doesn’t
1:02:54 Hardware & compute: why voice isn’t a 10,000-GPU game
1:07:27 What data do you need to train voice models
1:09:02 Deepfakes + privacy: why watermarking isn’t a solution
1:12:30 Voice + vision: multimodality, screen awareness, video+audio
1:14:43 Voice cloning vs voice design: where the market goes
1:16:32 Paris/Europe AI: talent density, underdog energy, what’s next
Tom Labiausse retweeted
Alexandre Défossez (@honualx)
🌐 @tom_labiausse and @neilzegh just released Hibiki-Zero, a live translation model with a few seconds of latency, trained without any aligned audio data thanks to reinforcement learning. Code, paper and checkpoints are out 👇 kyutai.org/blog/2026-02-1…
Quoting kyutai (@kyutai_labs)

We're releasing Hibiki-Zero, a new real-time and multilingual speech translation model that can translate 🇫🇷French, 🇪🇸Spanish, 🇵🇹Portuguese and 🇩🇪German to English: accurate, low-latency, high audio quality, with voice transfer. And best of all: open-source.

Tom Labiausse retweeted
kyutai (@kyutai_labs)
And here's Hibiki-Zero translating Loïs Boisson's quarterfinal win at Roland-Garros 2025! (Video: France TV Sport)
Tom Labiausse retweeted
kyutai (@kyutai_labs)
Our new speech translation model, Hibiki-Zero, narrating Léon Marchand's legendary victory at the Paris 2024 Olympics.
Tom Labiausse retweeted
kyutai (@kyutai_labs)
We're releasing Hibiki-Zero, a new real-time and multilingual speech translation model that can translate 🇫🇷French, 🇪🇸Spanish, 🇵🇹Portuguese and 🇩🇪German to English: accurate, low-latency, high audio quality, with voice transfer. And best of all: open-source.
Tom Labiausse retweeted
kyutai (@kyutai_labs)
We hit 1k GitHub stars in 3 days with Pocket TTS, our 100M-parameter TTS with voice cloning that runs on CPU! According to internal estimates, we are on track to reach 182k stars by the end of the year.
Tom Labiausse retweeted
Gradium (@GradiumAI)
Kyutai keeps shipping state-of-the-art open models that push the frontier of voice research. This time: the first high-fidelity TTS that runs on CPU. Science and engineering are in Gradium’s DNA; we can’t wait to see what the community builds with ultra-compact, on-device voice models.
Quoting kyutai (@kyutai_labs)

We’re excited to introduce Pocket TTS: a 100M-parameter text-to-speech model with high-quality voice cloning that runs on your laptop—no GPU required. Open-source, lightweight, and incredibly fast. 🧵👇

Tom Labiausse retweeted
Neil Zeghidour (@neilzegh)
Introducing Pocket-TTS, the first-ever TTS model that runs in real time on CPU (!) with high-fidelity voice cloning. Built on Continuous Audio Language Models, the newest wave of audio generative models from Kyutai. Under the guidance of Gradium's CSO @honualx, who continues to lead audio research at Kyutai, the lab keeps pushing the research frontier alongside Gradium's products.
Quoting kyutai (@kyutai_labs)

We’re excited to introduce Pocket TTS: a 100M-parameter text-to-speech model with high-quality voice cloning that runs on your laptop—no GPU required. Open-source, lightweight, and incredibly fast. 🧵👇

Tom Labiausse retweeted
kyutai (@kyutai_labs)
We’re excited to introduce Pocket TTS: a 100M-parameter text-to-speech model with high-quality voice cloning that runs on your laptop—no GPU required. Open-source, lightweight, and incredibly fast. 🧵👇
Tom Labiausse retweeted
kyutai (@kyutai_labs)
🏠 Introducing CASA: a new way to input visual information into LLMs. The current default is to insert image tokens into the text stream, but with many images in long conversations this floods the context window, making it impractical for streaming inputs. 🧵
Tom Labiausse retweeted
Alexandre Défossez (@honualx)
I've thought a few times before about going the startup route. With Gradium, the perfect opportunity arose. We're building unmatched speech products, now open to anyone. I will keep an active role leading ambitious research at Kyutai, while devoting most of my time to my role as CSO of Gradium.
Quoting Gradium (@GradiumAI)

Gradium is out of stealth to solve voice. We raised $70M and after only 3 months we’re releasing our transcription and synthesis products to power the next generation of voice AI.

Tom Labiausse retweeted
kyutai (@kyutai_labs)
Announcing Gradium, a missing link between our research and a broader audience. In the two years since Kyutai launched, we’ve shown how our laser-focused team could lead innovation in cutting-edge speech models 🎙️. Our groundbreaking open-science contributions helped propel the take-off of Gradium, a startup providing industry-grade building blocks to power the next generation of natural voice agents. This is an important step towards building a complete and sustainable AI ecosystem in Paris, France, and Europe 🇪🇺.
Kyutai is still hard at work 🫶 We are developing world models with @gen_intuition, and on the speech front, we’ll have something big (or should we say small?) to show you soon 🎄.
Quoting Gradium (@GradiumAI)

Gradium is out of stealth to solve voice. We raised $70M and after only 3 months we’re releasing our transcription and synthesis products to power the next generation of voice AI.

Tom Labiausse retweeted
Gradium (@GradiumAI)
Gradium is out of stealth to solve voice. We raised $70M and after only 3 months we’re releasing our transcription and synthesis products to power the next generation of voice AI.
Tom Labiausse retweeted
kyutai (@kyutai_labs)
🚀 New models: ARC-Encoders
We introduce a lightweight encoder that compresses context into continuous representations for LLMs, reducing inference cost while preserving performance. Our Adaptable text Representations Compressor, named ARC-Encoder, achieves large efficiency gains by compressing contexts by a factor of more than 4 while keeping strong performance across multiple decoders and tasks. We release three pretrained encoders, a fine-tuning dataset, and code to pretrain, fine-tune and evaluate any ARC-Encoder!
📄 Paper: github.com/kyutai-labs/AR…
💾 Code: github.com/kyutai-labs/AR…
🤗 Models: huggingface.co/collections/ky…
Tom Labiausse (@tom_labiausse)
I’m happy to share that I’ll be attending ICML 2025 in Vancouver next week to present 𝐇𝐢𝐛𝐢𝐤𝐢 [github.com/kyutai-labs/hi…] 🇫🇷🇬🇧, Kyutai’s real-time and expressive speech translation system. I'll be presenting the poster on Wednesday, July 16 at 4:30 PM; feel free to stop by! 💬