Dan Lyth

123 posts

@danlyth

Research engineer at @sesame. Previously leading speech research at @StabilityAI and @RockstarGames.

Joined December 2022

310 Following · 680 Followers
Nuno Job (@dscape)
@danlyth wanna come speak at an AI conference about acoustic tokens?
Dan Lyth reposted
Brendan Iribe (@brendaniribe)
And we’re building hardware.
[image]
Robbie Martin (@FluorescentGrey)
@stableaudio I've been trying to use Stable Audio, and in the last 24 hours only about half of my attempts to generate audio have completed without timing out. Is this a system-wide issue or something with my account?
Alexander Doria (@Dorialexander)
Big announcement: @pleiasfr releases a massive open corpus of 2 million YouTube videos in Creative Commons (CC-BY) on @huggingface. YouTube-Commons features 30 billion words of audio transcriptions in multiple languages, and soon other modalities. huggingface.co/datasets/PleIA…
[image]
Hubert Siuzdak (@HubertSiuzdak)
SNAC encodes audio into hierarchical tokens, similar to SoundStream, EnCodec, and DAC. It introduces a simple change: coarse tokens are sampled less frequently, covering a broader time span. It is designed mainly for language models, to accurately capture long-form audio with a consistent structure.
[image]
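The "coarse tokens sampled less frequently" idea can be sketched as a toy multi-scale residual quantizer. This is a minimal illustration of the general technique, not SNAC's actual implementation; all names, shapes, and codebook sizes here are made up for the example.

```python
import numpy as np

def multiscale_tokens(frames, codebooks, strides):
    """Quantize frame embeddings at multiple temporal resolutions:
    coarse levels use larger strides (fewer tokens over broader spans),
    finer levels quantize the remaining residual at higher rates.
    frames: (T, D) array; codebooks: list of (K, D) arrays;
    strides: list of ints, one per level, coarse -> fine."""
    residual = frames.copy()
    tokens = []
    for codebook, stride in zip(codebooks, strides):
        T, D = residual.shape
        # downsample the residual for this level (mean over each window)
        pooled = residual[: T - T % stride].reshape(-1, stride, D).mean(axis=1)
        # nearest-codeword assignment
        dists = ((pooled[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        ids = dists.argmin(axis=1)
        tokens.append(ids)
        # upsample the quantized signal and subtract it (residual VQ)
        quantized = np.repeat(codebook[ids], stride, axis=0)
        residual[: quantized.shape[0]] -= quantized
    return tokens

rng = np.random.default_rng(0)
frames = rng.normal(size=(16, 4))               # 16 frames, 4-dim embeddings
codebooks = [rng.normal(size=(8, 4)) for _ in range(3)]
tokens = multiscale_tokens(frames, codebooks, strides=[4, 2, 1])
# coarse level yields 4 tokens, middle 8, fine 16 for the same clip
```

The payoff for a language model is the consistent structure: every audio chunk maps to a fixed, hierarchical token layout, with the coarse stream covering long-range content cheaply.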
Hubert Siuzdak (@HubertSiuzdak)
Recently I've been experimenting with audio compression & vector quantization, and I'm happy to present the Multi-Scale Neural Audio Codec (SNAC), which can compress audio below 2 kbit/s with decent quality 🎧 Listen to the samples and see how it compares to the state of the art:
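Where a sub-2 kbit/s figure comes from is simple arithmetic: sum the per-level token rates times the bits per token. The rates and codebook size below are illustrative placeholders, not SNAC's published configuration.

```python
import math

# Hypothetical multi-scale codec: three token streams at different
# frame rates, each drawn from a 4096-entry codebook (12 bits/token).
rates_hz = [12, 24, 48]           # coarse -> fine, tokens per second
bits_per_token = math.log2(4096)  # 12.0 bits
bitrate = sum(r * bits_per_token for r in rates_hz)
print(f"{bitrate / 1000:.3f} kbit/s")  # (12+24+48) * 12 = 1008 bits/s
```

Because coarse levels run at a fraction of the fine rate, the hierarchy adds long-span context for only a small bitrate overhead.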
Dan Lyth (@danlyth)
@erogol I was wondering the same thing; there's not a lot of detail on that.
erogol (@erogol)
@danlyth One thing that's unclear to me is how they expand BPE before decoding, because otherwise they'd need to learn to align it with the output audio frames. Do you see how?
Dan Lyth (@danlyth)
Moving beyond naturalness and WER, they propose a set of sentences that test the model's ability to deal with compound nouns, emotions, foreign words, paralinguistics (e.g. whispering if the text requires it), etc. The full test set is included in the appendix. 👏 2/7
[image]
Dan Lyth (@danlyth)
There are a bunch of other interesting elements to this work, and it’s worth a read. Plenty of examples on the demo site too. Nice work Mateusz Łajszczak, @guillecambara, Yang Li, and all the other contributors. arxiv.org/abs/2402.08093
Dan Lyth (@danlyth)
The speech “de-tokenizer” (or decoder) is a convolutional model that’s streamable and 3x faster than their diffusion-based baseline (and also sounds better). It’s built around BigVGAN. 6/7
[image]
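The "streamable" property of a convolutional de-tokenizer comes from causality: if each output depends only on past inputs, audio can be decoded chunk by chunk while carrying a small left-context cache between chunks. The sketch below illustrates that idea on a single 1-D convolution; it is a toy, not the actual BigVGAN-based decoder.

```python
import numpy as np

def causal_conv1d(x, kernel):
    """Full-sequence causal conv: each output sees only past samples."""
    k = len(kernel)
    padded = np.concatenate([np.zeros(k - 1), x])
    return np.array([padded[i:i + k] @ kernel for i in range(len(x))])

def streaming_conv1d(chunks, kernel):
    """Same computation, but processed chunk by chunk: the last k-1
    samples are cached and prepended to the next chunk, so the output
    matches the full-sequence result exactly."""
    k = len(kernel)
    cache = np.zeros(k - 1)  # left context carried between chunks
    out = []
    for chunk in chunks:
        padded = np.concatenate([cache, chunk])
        out.append(np.array([padded[i:i + k] @ kernel
                             for i in range(len(chunk))]))
        cache = padded[len(chunk):]  # keep last k-1 samples for next call
    return np.concatenate(out)

x = np.arange(8.0)
kernel = np.array([1.0, 2.0, 3.0])
full = causal_conv1d(x, kernel)
chunked = streaming_conv1d([x[:3], x[3:5], x[5:]], kernel)
# both paths produce identical output regardless of chunking
```

A stack of such layers keeps per-chunk state small and constant, which is what lets a convolutional decoder emit audio with low latency instead of waiting for the whole token sequence.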