Tom Labiausse

24 posts

@tom_labiausse

ML Research @kyutai_labs

Paris · Joined July 2024

3 Following · 200 Followers

Tom Labiausse reposted

kyutai@kyutai_labs·6 May

Three of our papers got accepted at ICML and one at CVPR this year 🎉 We will have researchers on-site for both conferences, so come talk to us if you want to learn more about Kyutai!
👁️ MoshiVis (CVPR’26) → Vision Speech Models: A data- and training-efficient pipeline for omni models built on top of Moshi
🧠 MoshiRAG (ICML’26) → Making speech-to-speech models smarter with the power of RAG and minimal latency
🗣️ Hibiki-Zero (ICML’26) → Streaming speech-to-speech translation without aligned data, leveraging GRPO
⌛ Kairos (ICML’26) → Recency bias is real, even for LLMs. More details in a future post!
#ICML2026 #CVPR2026

152

9.1K

Tom Labiausse reposted

kyutai@kyutai_labs·30 Apr

Speech-native models like Moshi sound great and answer fast, but aren’t as smart as text LLMs. In our new paper, MoshiRAG, we show how Moshi can ask for advice from a text LLM or a knowledge base. The tricky part is how to do this in real time without adding latency. 🧵

482

51.6K

Tom Labiausse reposted

Neil Zeghidour@neilzegh·16 Apr

Gradium builds models, not orchestration or voice agents. But to really evaluate the conversational experience around our models, you need to see them inside an actual agent. That’s why we built Gradbot internally, to spin up a POC in minutes before a sales call. Now we’re open-sourcing it for anyone to experiment with and have fun.

Gradium@GradiumAI

Today we're open-sourcing Gradbot, the framework we built internally at Gradium to prototype voice agents. Idea to a working prototype in around 50 lines of code.

4.8K

Tom Labiausse reposted

kyutai@kyutai_labs·15 Apr

We're releasing OVIE, a novel view generation model trained entirely on single images. No multi-view datasets needed. Given a single image, it generates novel views of any scene in real time, running orders of magnitude faster than competing approaches.

210

30.7K

Tom Labiausse reposted

Matt Turck@mattturck·19 Feb

Voice used to be AI’s forgotten modality. Now it's having its big moment: rapid innovation, big funding rounds, major agentic applications.
My conversation with @neilzegh, top AI researcher in the field (@GoogleDeepMind, @Meta, @kyutai_labs) and now CEO of @GradiumAI.
This is a reference episode on all things voice AI 🔥
00:00 Intro
01:21 Voice AI’s big moment, and why we’re still early
03:34 Why voice lagged behind text/image/video
06:06 The convergence era: transformers for every modality
07:40 Beyond Her: always-on assistants, wake words, voice-first devices
11:01 Voice vs text: where voice fits (even for coding)
12:56 Neil’s origin story: from finance to machine learning, with help from @ylecun and @soumithchintala
18:35 Neural codecs (SoundStream): compression as the unlock
22:30 Kyutai: open research, small elite teams, moving fast
31:32 Why big labs haven’t “won” voice AI
34:01 On-device voice: where it works, why compact models matter
46:37 The last mile: real-world robustness, pronunciation, uptime
41:35 Benchmarking voice: why metrics fail, how they actually test
47:03 Cascades vs speech-to-speech: trade-offs + what’s next
54:05 Hardest frontier: noisy rooms, factories, multi-speaker chaos
1:00:50 New languages + dialects: what transfers, what doesn’t
1:02:54 Hardware & compute: why voice isn’t a 10,000-GPU game
1:07:27 What data do you need to train voice models
1:09:02 Deepfakes + privacy: why watermarking isn’t a solution
1:12:30 Voice + vision: multimodality, screen awareness, video+audio
1:14:43 Voice cloning vs voice design: where the market goes
1:16:32 Paris/Europe AI: talent density, underdog energy, what’s next

123

22.8K

Tom Labiausse reposted

Alexandre Défossez@honualx·13 Feb

🌐 @tom_labiausse and @neilzegh just released Hibiki-Zero, a live translation model with a few seconds latency, and trained without any aligned audio data thanks to reinforcement learning. Code, paper and checkpoints are out 👇 kyutai.org/blog/2026-02-1…

kyutai@kyutai_labs

We're releasing Hibiki-Zero, a new real-time and multilingual speech translation model that can translate 🇫🇷French, 🇪🇸Spanish, 🇵🇹Portuguese and 🇩🇪German to English: accurate, low-latency, high audio quality, with voice transfer. And best of all: open-source.

4.3K

Tom Labiausse reposted

kyutai@kyutai_labs·13 Feb

And here's Hibiki-Zero translating Loïs Boisson's quarterfinal win at Roland-Garros 2025! (Video: France TV Sport)

4.4K

Tom Labiausse reposted

kyutai@kyutai_labs·13 Feb

Our new speech translation model, Hibiki-Zero, narrating Léon Marchand's legendary victory at the Paris 2024 Olympics.

2.8K

Tom Labiausse reposted

kyutai@kyutai_labs·12 Feb

144

914

80.2K

Tom Labiausse reposted

kyutai@kyutai_labs·16 Jan

We hit 1k GitHub stars in 3 days with Pocket TTS, our 100M-parameter TTS with voice cloning that runs on CPU! According to internal estimates, we are on track to reach 182k stars by the end of the year.

655

50K

Tom Labiausse reposted

Gradium@GradiumAI·14 Jan

Kyutai keeps shipping state-of-the-art open models pushing the frontier of voice research. This time: the first high-fidelity TTS that runs on CPU. Science and engineering are in Gradium’s DNA, can’t wait to see what the community builds with ultra-compact, on-device voice models.

kyutai@kyutai_labs

We’re excited to introduce Pocket TTS: a 100M-parameter text-to-speech model with high-quality voice cloning that runs on your laptop—no GPU required. Open-source, lightweight, and incredibly fast. 🧵👇

3.9K

Tom Labiausse reposted

Neil Zeghidour@neilzegh·13 Jan

Introducing Pocket-TTS, the first ever TTS model that runs in real time on CPU (!) with high-fidelity voice cloning. Built on Continuous Audio Language, the newest wave of audio generative models from Kyutai. Under the guidance of Gradium's CSO @honualx, who keeps leading audio research at Kyutai, the lab keeps pushing the frontier of research alongside Gradium's products.

kyutai@kyutai_labs

5.3K

Tom Labiausse reposted

kyutai@kyutai_labs·13 Jan

473

3.7K

234.5K

Tom Labiausse reposted

kyutai@kyutai_labs·23 Dec

🏠 Introducing CASA: a new way to input visual information into LLMs. The current default to do that is by inserting image tokens into the text stream, but when using many images in long conversations, this floods the context window and is thus impractical for streaming inputs.🧵

260

22.7K

Tom Labiausse reposted

Neil Zeghidour@neilzegh·2 Dec

Announcing Gradium. After 10 years of pushing audio research at Meta, Google and Kyutai, I'm joining the start-up arena with my day 1s to take our models from the lab to every voice product out there. Game on.

Gradium@GradiumAI

Gradium is out of stealth to solve voice. We raised $70M and after only 3 months we’re releasing our transcription and synthesis products to power the next generation of voice AI.

263

58.6K

Tom Labiausse reposted

Alexandre Défossez@honualx·2 Dec

I've thought a few times about going startup before. With Gradium, the perfect opportunity arose. We're building unmatched speech products, now open to anyone. I will keep an active role leading ambitious research at Kyutai, while devoting most of my time as CSO of Gradium.

Gradium@GradiumAI

Gradium is out of stealth to solve voice. We raised $70M and after only 3 months we’re releasing our transcription and synthesis products to power the next generation of voice AI.

10.9K

Tom Labiausse reposted

kyutai@kyutai_labs·2 Dec

Announcing Gradium, a missing link from our research to a broader audience. In the two years since Kyutai launched, we’ve shown how our laser-focused team was able to lead innovation for cutting-edge speech models 🎙️. Our groundbreaking open science contributions helped propel the take-off of Gradium, a startup providing industry-grade building blocks to power the next generation of natural voice agents. This is an important step towards building a whole and sustainable AI ecosystem in Paris, France, and Europe 🇪🇺. Kyutai is still busy at work 🫶 We are developing world models with @gen_intuition, and on the speech front, we’ll have something big (or should we say small?) to show you soon 🎄.

Gradium@GradiumAI

Gradium is out of stealth to solve voice. We raised $70M and after only 3 months we’re releasing our transcription and synthesis products to power the next generation of voice AI.

253

34.7K

Tom Labiausse reposted

Gradium@GradiumAI·2 Dec

Gradium is out of stealth to solve voice. We raised $70M and after only 3 months we’re releasing our transcription and synthesis products to power the next generation of voice AI.

158

1.1K

468K

Tom Labiausse reposted

kyutai@kyutai_labs·17 Oct

🚀 New models: ARC-Encoders
We introduce a lightweight encoder that compresses context into continuous representations for LLMs, reducing inference cost while preserving performance. Our Adaptable text Representations Compressor, named ARC-Encoder, achieves large efficiency gains by compressing contexts by a factor of more than 4 while keeping strong performance across multiple decoders and tasks. We release three pretrained encoders, a fine-tuning dataset, and code to pretrain, fine-tune and evaluate any ARC-Encoder!
📄 Paper: github.com/kyutai-labs/AR…
💾 Code: github.com/kyutai-labs/AR…
🤗 Models: huggingface.co/collections/ky…

178

14.9K

Tom Labiausse@tom_labiausse·11 Jul

I’m happy to share that I’ll be attending ICML 2025 in Vancouver next week to present Hibiki [github.com/kyutai-labs/hi…] 🇫🇷🇬🇧, Kyutai’s real-time and expressive speech translation system. I'll be presenting the poster on Wednesday, July 16 at 4:30PM, feel free to stop by! 💬

5.8K
