Kadir Nar

1.2K posts


@kadirnardev

AI Research Engineer 🤖 Building Omni & TTS Models 👨‍🍳 at Vyvo

Remote · Joined January 2025
884 Following · 1.5K Followers
Pinned Tweet
Kadir Nar @kadirnardev
I am developing the VoiceHub library to run popular TTS models in a single library. It currently supports the Orpheus, Vui, and Dia models. I will add other models (LLaSA, Kokoro, StyleTTS, MeloTTS, F5-TTS, ...).
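A multi-model wrapper like VoiceHub is typically built around a backend registry that dispatches synthesis calls to whichever model was requested. Below is a minimal, hypothetical sketch of that pattern; it is not VoiceHub's real API, and every class and method name here is an assumption.

```python
# Hypothetical sketch of a unified TTS dispatcher; names are illustrative,
# not the actual VoiceHub API.
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class TTSResult:
    model: str
    text: str
    sample_rate: int


class TTSHub:
    """Registry that routes synthesis requests to a named backend."""

    def __init__(self) -> None:
        self._backends: Dict[str, Callable[[str], TTSResult]] = {}

    def register(self, name: str, synth: Callable[[str], TTSResult]) -> None:
        self._backends[name] = synth

    def tts(self, name: str, text: str) -> TTSResult:
        if name not in self._backends:
            raise KeyError(f"unknown model {name!r}; available: {sorted(self._backends)}")
        return self._backends[name](text)


hub = TTSHub()
# Stand-in "orpheus" backend; a real backend would return audio samples.
hub.register("orpheus", lambda text: TTSResult("orpheus", text, 24000))
result = hub.tts("orpheus", "hello")
print(result.model, result.sample_rate)
```

The point of the registry is that adding Kokoro or F5-TTS later is one `register` call rather than a new code path.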
Kadir Nar @kadirnardev
The Orpheus and VyvoTTS models now support the SGLang and vLLM libraries. Additionally, the SNAC model now runs much faster. TTFT: vLLM 6 ms, SGLang 10 ms.
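TTFT (time to first token) is measured by timing how long a streaming generation call takes to yield its first item. The sketch below uses a stand-in generator rather than the actual vLLM or SGLang client, but the timing pattern is the same against any streaming API.

```python
# Measuring TTFT against a streaming backend. stream_tokens is a stand-in
# for a real vLLM/SGLang streaming call, not either library's API.
import time
from typing import Iterable, Iterator


def stream_tokens(prompt: str) -> Iterator[str]:
    # Stand-in backend: yields tokens one at a time.
    for tok in prompt.split():
        yield tok


def time_to_first_token(stream: Iterable[str]) -> float:
    """Return seconds elapsed until the first token arrives."""
    start = time.perf_counter()
    next(iter(stream))  # block until the stream produces its first token
    return time.perf_counter() - start


ttft = time_to_first_token(stream_tokens("hello world"))
print(f"TTFT: {ttft * 1000:.2f} ms")
```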
Kadir Nar @kadirnardev
@ChristophSchuh6 Aratako had 30k hours of emotion data in their dataset. There are very few open-source emotion datasets for English, so making a voice design model can be very difficult. Maybe I can do this by producing synthetic data with Echo and Qwen3-TTS, but it might not sound natural.
Kadir Nar @kadirnardev
I made more Triton-based optimizations to the Snac codec model to further speed up the Orpheus-TTS and VyvoTTS models. Don't forget to star my GitHub repo for more optimizations!
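The Triton work itself is GPU-side, but the core idea behind this kind of codec speedup, fusing several elementwise passes over a tensor into one kernel so the data is read and written once, can be sketched on the CPU. The operations below are illustrative stand-ins, not SNAC's actual decode math.

```python
# Kernel-fusion idea in miniature: three separate passes vs. one fused pass.
# In a Triton kernel the fused form becomes a single GPU launch with one
# read and one write per element, which is where the speedup comes from.
import math


def unfused(xs: list) -> list:
    # Three passes: each one traverses the whole buffer again.
    ys = [math.tanh(v) for v in xs]
    ys = [v * 0.5 for v in ys]
    ys = [v + 1.0 for v in ys]
    return ys


def fused(xs: list) -> list:
    # One pass: all three ops applied per element in a single traversal.
    return [math.tanh(v) * 0.5 + 1.0 for v in xs]


xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
same = all(abs(a - b) < 1e-12 for a, b in zip(unfused(xs), fused(xs)))
print("fused matches unfused:", same)
```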
Christoph Schuhmann @ChristophSchuh6
@kadirnardev I think it would be interesting to train an Echo-like diffusion transformer that ingests SNAC artifact-corrupted audio and outputs high-quality 48 kHz DAC VAE audio without artifacts. Maybe a 400-million-parameter model or something like that could do it, just fixing artifacts. 🙂
Kadir Nar @kadirnardev
@kint0kur What matters is vLLM and SGLang support. If you add those, you'll get good results. I also designed a flow-matching-based model, and its TTFT was around 10 ms.
alp @kint0kur
@kadirnardev Vyvo or Orpheus, it doesn't really matter. What I'm curious about is how many parallel requests it can serve on a single GPU without latency climbing too much. I worked on a flow-matching-based model and it's very hard to scale; autoregressive ones don't work well either, TTFT rises a lot under parallel load.
Kadir Nar @kadirnardev
@kint0kur This is the codec model. What matters is which TTS model you use. For VyvoTTS, TTFT was previously 70-80 ms with 100 concurrent users. With this optimization it will be even faster.
alp @kint0kur
@kadirnardev What do latencies look like under parallel requests? How many parallel requests can it handle on a single H100?
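Questions like this are usually settled empirically, by firing N concurrent requests and recording each one's TTFT. A hedged sketch of that benchmark shape, with an asyncio stand-in for a real vLLM or SGLang streaming endpoint:

```python
# Parallel-TTFT benchmark sketch. fake_stream is a stand-in for a real
# streaming server call; the ~10 ms sleep simulates first-token latency.
import asyncio
import time


async def fake_stream(prompt: str) -> float:
    """Stand-in backend: returns the TTFT in seconds for one request."""
    start = time.perf_counter()
    await asyncio.sleep(0.01)  # pretend the first token takes ~10 ms
    return time.perf_counter() - start


async def bench(n_parallel: int) -> list:
    # Fire n_parallel requests at once and collect each one's TTFT.
    return await asyncio.gather(*(fake_stream(f"req {i}") for i in range(n_parallel)))


ttfts = asyncio.run(bench(8))
print(f"p50 TTFT: {sorted(ttfts)[len(ttfts) // 2] * 1000:.1f} ms")
```

Against a real endpoint the interesting number is how the p50/p99 TTFT curve bends as `n_parallel` grows, which is exactly the single-H100 question above.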
Kadir Nar @kadirnardev
The Qwen team is no longer releasing their models as open source, and this is a big problem for us. We need small models to train many models like TTS, STT, Omni, and others. Previously there was LLaMA, but those aren't being released anymore either. Our only hope is the LFM models. The MiniMax, Kimi, and GLM teams are releasing great open-source models, but none of them release small models. And if these companies also stop releasing open source, it's going to be really bad :(
Kadir Nar @kadirnardev
@foreignsplat Yes, I saw your new models and I'm very happy about it.
Kadir Nar @kadirnardev
These models perform great because they're newly released, but in 3-4 months we'll need better ones. For example, if the LFM team doesn't release new small models, would you be okay using the old Gemma models? New models need to be released constantly. When Gemma first came out, those were great models, but they're no longer up to date.
Kadir Nar @kadirnardev
It says 39 message requests in the chat section, but when I click on it, there are no messages. If I didn't reply to your message, send another message.
Kadir Nar @kadirnardev
@overlordayn They didn't release the TTS model. Why aren't we training multiple LFM-based models?
Kadir Nar @kadirnardev
@billyG881 The Neucodec model quality isn't good. I'm thinking of using a better codec.
billyG88 @billyG881
@kadirnardev Hopefully with the NeuCodec codec, since it's SOTA and has been trained on plenty of multilingual data 🤓🤓🤓
Kadir Nar @kadirnardev
@jesujopi3D If you have an open source dataset, I can train it.
Shorpy🪼 @jesujopi3D
@kadirnardev Are you considering training any TTS model with multilingual datasets or Spanish data? I've been using Qwen3-TTS, but the VRAM consumption I get even with quantized models is a bit high for a 12GB card like mine…
Kadir Nar @kadirnardev
@WaelShaikh I had trained LFM models before; training them again could be good. x.com/kadirnardev/st…
Kadir Nar @kadirnardev

We have released our LFM2-350M based TTS model as open source 🚀 We have also released many different FT models. GPU Platform: @hyperbolic_labs Data: Emilia + Emilia Yodas(EN) LLM Model: LFM2-350M @liquidai Disk and Space: @huggingface I'm very happy to have released this model as open source. Many thanks to @VyvoSmartChain #opensource #speech #tts #huggingface #lfm #gpu

Wael Shaikh @WaelShaikh
@kadirnardev Definitely on the 350M model. Would love to see how it performs. LFM makes some of the fastest LLMs, I wonder if the speedup would even benefit the TTS.
Kadir Nar @kadirnardev
@liquidai Thank you for publishing it as open source❤️
Liquid AI @liquidai
Today, we release LFM2.5-350M. Agentic loops at 350M parameters. A 350M model trained for reliable data extraction and tool use, where models at this scale typically struggle. <500MB when quantized, built for environments where compute, memory, and latency are constrained. 🧵