Eustache Le Bihan

198 posts

@eustachelb

Speech & Audio @ Hugging Face 🤗

Joined August 2023

322 Following · 756 Followers

Eustache Le Bihan@eustachelb·3d

@paulcx @Julianfmack @huggingface @Tu7uruu @TheOneKloud it does! Check out the transformers doc (#non-english-transcription section): huggingface.co/docs/transform…

Paul Chen@paulcx·3d

@Julianfmack @huggingface @Tu7uruu @eustachelb @TheOneKloud Does it support multi-languages?

Julian Mack@Julianfmack·4d

Happy to share what I've been working on recently: today we release Cohere Transcribe, a state-of-the-art speech recognition model that beats both commercial and open-source models to land at #1 on the Open ASR Leaderboard!

3.7K

Eustache Le Bihan@eustachelb·3d

just `uv pip install -U transformers` and play with it!

157

Eustache Le Bihan@eustachelb·3d

Been cooking again for you guys! Cohere ASR model, topping the open ASR leaderboard, is supported day 0 in transformers 🤠

2.1K

Eustache Le Bihan@eustachelb·Feb 18

@neilzegh @lmazare 🐐

Neil Zeghidour@neilzegh·Feb 17

When @lmazare wants some fresh air from optimizing API speech models, he optimizes on-device ones.

Laurent Mazare@lmazare

A bit late to the party but here is my take at running Kyutai's Pocket TTS 🗣️in the browser. Rust compiled to wasm, single threaded CPU only, using simd128, and it's running in real-time on my Pixel 8a without quantization. laurentmazare.github.io/pocket-tts/

Eustache Le Bihan retweeted

Mistral AI for Developers@MistralDevs·Feb 18

In addition to vLLM, the Hugging Face Transformers library now supports Voxtral Realtime. Thanks to @eustachelb for leading the integration. We’ve been amazed by how quickly the OSS community has shipped implementations of Voxtral Realtime across platforms, backends, and use cases. We expect Transformers support to drive even wider adoption, especially across fine-tuning and quantization libraries. We want to thank Mergen Nachin, Salvatore Sanfilippo, TrevorS, Shreyas Karnik, Awni Hannun, and @limzba for their early contributions. Full list of community integrations (#community-contributions-untested section): huggingface.co/mistralai/Voxt…

5.1K

Eustache Le Bihan retweeted

keveman@keveman·Feb 13

Really grateful to @eustachelb for helping me land this in the transformers library and @Tu7uruu for getting the model up on OpenASR leaderboard.

keveman@keveman

New speech recognition models are announced on X almost every day nowadays. But it's not every day you see a 250M-parameter model beat the 1.5B-parameter Whisper Large v3. Today we are announcing Moonshine Streaming. HF Link: huggingface.co/UsefulSensors/… Paper draft: download.moonshine.ai/docs/moonshine…

288

Eustache Le Bihan@eustachelb·Feb 6

@antirez @julien_c @1littlecoder @MistralAI indeed! one might even notice that STFT frames are shifted by 40ms (using non-streaming inference as a ref)

antirez@antirez·Feb 6

@eustachelb @julien_c @1littlecoder @MistralAI You can find all the architectural details in my voxtral.c GitHub repository if it can help in some way. There were details scattered across mistral-common and vLLM that I reconstructed. Also note that the exact FFT points are crucial for the model to work well.

antirez@antirez·Feb 6

Yesterday @MistralAI released an open weights transcription model able to work in real time, Voxtral Mini 4B. Today, following the Whisper.cpp lesson, here is a C inference pipeline ready to use as a library, I hope you'll enjoy it: github.com/antirez/voxtra…

979

54.5K

Eustache Le Bihan@eustachelb·Feb 6

@julien_c @antirez @1littlecoder @MistralAI soon in transformers indeed! thinking about a quick thread to detail a bit the architecture since their paper will not come right now

Julien Chaumond@julien_c·Feb 6

@antirez @1littlecoder @MistralAI @eustachelb (With the help of the mistral team)

Eustache Le Bihan@eustachelb·Feb 6

really nothing difficult, the vLLM implementation is unnecessarily complicated (since they don't use a conv cache, they have to provide more context, meaning sending longer overlapping audio chunks). The main complication comes from the STFT computation and where to cut when sending audio chunks
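
To make the conv-cache point concrete, here is a minimal sketch on a toy 1-D causal convolution. It is not the Voxtral, vLLM, or transformers code; the names `causal_conv1d`, `StreamingConv1d`, and `conv_with_overlap` are hypothetical and exist only for illustration. With a kernel of size k, a streaming layer that caches the last k-1 input samples can consume non-overlapping chunks, while a stateless layer needs those k-1 samples resent with every chunk.

```python
def causal_conv1d(x, kernel):
    """Offline reference: zero-pad on the left so output[t] only sees x[:t+1]."""
    k = len(kernel)
    padded = [0.0] * (k - 1) + list(x)
    return [sum(kernel[j] * padded[t + j] for j in range(k)) for t in range(len(x))]


class StreamingConv1d:
    """Streaming variant: keeps the last k-1 input samples as a cache between calls."""

    def __init__(self, kernel):
        self.kernel = list(kernel)
        self.cache = [0.0] * (len(self.kernel) - 1)  # plays the role of the left padding

    def push(self, chunk):
        k = len(self.kernel)
        buf = self.cache + list(chunk)
        out = [sum(self.kernel[j] * buf[t + j] for j in range(k)) for t in range(len(chunk))]
        self.cache = buf[len(buf) - (k - 1):]  # remember the tail for the next chunk
        return out


def conv_with_overlap(x, start, end, kernel):
    """Stateless alternative: each call re-reads up to k-1 samples of left context."""
    k = len(kernel)
    ctx_start = max(0, start - (k - 1))
    pad = (k - 1) - (start - ctx_start)  # zero-pad only at the very beginning of the stream
    window = [0.0] * pad + list(x[ctx_start:end])
    return [sum(kernel[j] * window[t + j] for j in range(k)) for t in range(end - start)]


# Toy signal and kernel (arbitrary values, chosen only for the demonstration).
signal = [float((3 * i) % 7 - 3) for i in range(50)]
kernel = [0.5, -1.0, 0.25, 2.0]
chunk_size = 7

reference = causal_conv1d(signal, kernel)

layer = StreamingConv1d(kernel)
streamed = []
for i in range(0, len(signal), chunk_size):
    streamed += layer.push(signal[i:i + chunk_size])  # non-overlapping chunks

overlapped = []
for i in range(0, len(signal), chunk_size):
    end = min(i + chunk_size, len(signal))
    overlapped += conv_with_overlap(signal, i, end, kernel)  # overlapping reads
```

Both chunked variants reproduce the offline output here; the difference is that the cached version reads each sample once, whereas the stateless one re-reads k-1 samples per chunk. In a stack of several convolutions the required context adds up across layers, which is presumably why the overlapping chunks get longer.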

antirez@antirez·Feb 6

Yes, for the "easy work" part. The model inference was only half-specified via the vLLM nightly + mistral common Python stuff. I needed to find my way with the inference using Codex 5.2 xhigh, only way to understand enough details. Then, with MODEL.md, Claude did it mostly.

1.8K

Eustache Le Bihan retweeted

Seb Johnson@SebJohnsonUK·Jan 29

The founders of @huggingface after turning down $500m from Nvidia so they can stay independent

107

14.7K

Eustache Le Bihan@eustachelb·Feb 5

cooking for you guys 🧑‍🍳 really really soon in transformers

Mistral AI@MistralAI

Introducing Voxtral Transcribe 2, next-gen speech-to-text models by @MistralAI. State-of-the-art transcription, speaker diarization, sub-200ms real-time latency. Details in 🧵

7.8K

Eustache Le Bihan retweeted

Arthur Zucker@art_zucker·Jan 26

Today is a big day, transformers v5 is FINALLY out!!

666

35.2K

Eustache Le Bihan retweeted

ben@benhylak·Jan 26

voice ai is about to have a moment i don't think people get how rapidly the experience changes as latency approaches 0

Hugging Models@HuggingModels

NVIDIA just dropped PersonaPlex-7B 🤯 A full-duplex voice model that listens and talks at the same time. No pauses. No turn-taking. Real conversation. 100% open source. Free. Voice AI just leveled up. huggingface.co/nvidia/persona…

122

3.2K

499.7K

Eustache Le Bihan retweeted

Omar Sanseviero@osanseviero·Dec 19

Introducing our latest open model: MedASR 🔬Speech to text model 🏥for healthcare-based voice applications 🤗available in Hugging Face ⚡️run with transformers Download right now huggingface.co/google/medasr

137

1.2K

81.4K

Eustache Le Bihan@eustachelb·Dec 18

@kadirnardev ai.meta.com/research/publi…

Kadir Nar@kadirnardev·Dec 18

@eustachelb Paper?

138

Eustache Le Bihan@eustachelb·Dec 18

Boom! Big release by Meta in the audio game: Sam Audio and Perception Encoder Audiovisual. "The core innovation in SAM Audio is the Perception Encoder Audiovisual engine"... and it's supported day 0 in transformers!

683

Eustache Le Bihan@eustachelb·Dec 18

Check out the models huggingface.co/collections/fa…

124

Eustache Le Bihan@eustachelb·Dec 2

Check out the release blog post! huggingface.co/blog/transform…

Eustache Le Bihan@eustachelb·Dec 2

Without it, using a new model often means diving into a complex research codebase, deciphering experimental features, and dealing with huge engineering overhead. Most people simply give up, and end up paying for a closed solution. transformers solves this. Learn one paradigm. Trust the integrations. Build with any open model. Move faster, not slower. It is the source of truth for open models, and I’m incredibly proud to help build it. Here’s to open source, Hugging Face, and the entire community. ❤️‍🔥

Eustache Le Bihan@eustachelb·Dec 2

transformers v5 is out! 3 million daily downloads, one of the core foundations of the open AI ecosystem. Four years after v4, this milestone reminds us why we build open-source AI. It’s genuinely fun to work on, but as technicians, we need to think about the impact of the tools we’re creating.

144
