Kadir Nar

1.2K posts


@kadirnardev

AI Research Engineer 🤖 Building Omni & TTS Models 👨‍🍳 at Vyvo

Remote · Joined January 2025
884 Following · 1.5K Followers
Pinned Tweet
Kadir Nar @kadirnardev
I am developing the VoiceHub library to run popular TTS models in a single library. It currently supports the Orpheus, Vui, and Dia models. I will add other models (LLaSA, Kokoro, StyleTTS, MeloTTS, F5-TTS, ...).
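A multi-model wrapper like VoiceHub is typically built around a backend registry that dispatches synthesis calls to whichever model was requested. Below is a minimal, hypothetical sketch of that pattern; it is not VoiceHub's real API, and every class and method name here is an assumption.

```python
# Hypothetical sketch of a unified TTS dispatcher; names are illustrative,
# not the actual VoiceHub API.
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class TTSResult:
    model: str
    text: str
    sample_rate: int


class TTSHub:
    """Registry that routes synthesis requests to a named backend."""

    def __init__(self) -> None:
        self._backends: Dict[str, Callable[[str], TTSResult]] = {}

    def register(self, name: str, synth: Callable[[str], TTSResult]) -> None:
        self._backends[name] = synth

    def tts(self, name: str, text: str) -> TTSResult:
        if name not in self._backends:
            raise KeyError(f"unknown model {name!r}; available: {sorted(self._backends)}")
        return self._backends[name](text)


hub = TTSHub()
# Stand-in "orpheus" backend; a real backend would return audio samples.
hub.register("orpheus", lambda text: TTSResult("orpheus", text, 24000))
result = hub.tts("orpheus", "hello")
print(result.model, result.sample_rate)
```

The point of the registry is that adding Kokoro or F5-TTS later is one `register` call rather than a new code path.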
Kadir Nar @kadirnardev
The Orpheus and VyvoTTS models now support the SGLang and vLLM libraries. Additionally, the SNAC model now runs much faster. TTFT: vLLM 6 ms, SGLang 10 ms.
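TTFT (time to first token) is measured by timing how long a streaming generation call takes to yield its first item. The sketch below uses a stand-in generator rather than the actual vLLM or SGLang client, but the timing pattern is the same against any streaming API.

```python
# Measuring TTFT against a streaming backend. stream_tokens is a stand-in
# for a real vLLM/SGLang streaming call, not either library's API.
import time
from typing import Iterable, Iterator


def stream_tokens(prompt: str) -> Iterator[str]:
    # Stand-in backend: yields tokens one at a time.
    for tok in prompt.split():
        yield tok


def time_to_first_token(stream: Iterable[str]) -> float:
    """Return seconds elapsed until the first token arrives."""
    start = time.perf_counter()
    next(iter(stream))  # block until the stream produces its first token
    return time.perf_counter() - start


ttft = time_to_first_token(stream_tokens("hello world"))
print(f"TTFT: {ttft * 1000:.2f} ms")
```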
Kadir Nar @kadirnardev
@ChristophSchuh6 Aratako had 30k hours of emotion data in their dataset. There are very few open-source emotion datasets for English, so making a voice design model can be very difficult. Maybe I can do this by producing synthetic data with Echo and Qwen3-TTS, but it might not sound natural.
Kadir Nar @kadirnardev
I made more Triton-based optimizations to the Snac codec model to further speed up the Orpheus-TTS and VyvoTTS models. Don't forget to star my GitHub repo for more optimizations!
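The Triton work itself is GPU-side, but the core idea behind this kind of codec speedup, fusing several elementwise passes over a tensor into one kernel so the data is read and written once, can be sketched on the CPU. The operations below are illustrative stand-ins, not SNAC's actual decode math.

```python
# Kernel-fusion idea in miniature: three separate passes vs. one fused pass.
# In a Triton kernel the fused form becomes a single GPU launch with one
# read and one write per element, which is where the speedup comes from.
import math


def unfused(xs: list) -> list:
    # Three passes: each one traverses the whole buffer again.
    ys = [math.tanh(v) for v in xs]
    ys = [v * 0.5 for v in ys]
    ys = [v + 1.0 for v in ys]
    return ys


def fused(xs: list) -> list:
    # One pass: all three ops applied per element in a single traversal.
    return [math.tanh(v) * 0.5 + 1.0 for v in xs]


xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
same = all(abs(a - b) < 1e-12 for a, b in zip(unfused(xs), fused(xs)))
print("fused matches unfused:", same)
```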
Christoph Schuhmann @ChristophSchuh6
@kadirnardev I think it would be interesting to train an Echo-like diffusion transformer that ingests SNAC artifact-corrupted audio and outputs high-quality 48 kHz DAC VAE audio without artifacts. Maybe a 400-million-parameter model or something like that could do it, just fixing artifacts. 🙂
Kadir Nar @kadirnardev
@kint0kur What matters is vLLM and SGLang support. If you add those, you'll get good results. I also designed a flow-matching-based model, and its TTFT was around 10 ms.
alp @kint0kur
@kadirnardev Vyvo or Orpheus, it doesn't really matter. What I'm curious about is how many parallel requests it can serve on a single GPU without latency climbing too much. I worked on a flow-matching-based model and it's very hard to scale; autoregressive ones don't work well either, TTFT rises a lot under parallel load.
Kadir Nar @kadirnardev
@kint0kur This is the codec model. What matters is which TTS model you use. For VyvoTTS, TTFT was previously 70-80 ms with 100 concurrent users. With this optimization it will be even faster.
alp @kint0kur
@kadirnardev What do latencies look like under parallel requests? How many parallel requests can it handle on a single H100?
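Questions like this are usually settled empirically, by firing N concurrent requests and recording each one's TTFT. A hedged sketch of that benchmark shape, with an asyncio stand-in for a real vLLM or SGLang streaming endpoint:

```python
# Parallel-TTFT benchmark sketch. fake_stream is a stand-in for a real
# streaming server call; the ~10 ms sleep simulates first-token latency.
import asyncio
import time


async def fake_stream(prompt: str) -> float:
    """Stand-in backend: returns the TTFT in seconds for one request."""
    start = time.perf_counter()
    await asyncio.sleep(0.01)  # pretend the first token takes ~10 ms
    return time.perf_counter() - start


async def bench(n_parallel: int) -> list:
    # Fire n_parallel requests at once and collect each one's TTFT.
    return await asyncio.gather(*(fake_stream(f"req {i}") for i in range(n_parallel)))


ttfts = asyncio.run(bench(8))
print(f"p50 TTFT: {sorted(ttfts)[len(ttfts) // 2] * 1000:.1f} ms")
```

Against a real endpoint the interesting number is how the p50/p99 TTFT curve bends as `n_parallel` grows, which is exactly the single-H100 question above.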
Kadir Nar @kadirnardev
The Qwen team is no longer releasing their models as open source, and this is a big problem for us. We need small models to train many models like TTS, STT, Omni, and others. Previously there was LLaMA, but those aren't being released anymore either. Our only hope is the LFM models. The MiniMax, Kimi, and GLM teams are releasing great open-source models, but none of them release small models. And if these companies also stop releasing open source, it's going to be really bad :(
Kadir Nar @kadirnardev
@foreignsplat Yes, I saw your new models and I'm very happy about it.
Kadir Nar @kadirnardev
These models perform great because they're newly released, but in 3-4 months we'll need better ones. For example, if the LFM team doesn't release new small models, would you be okay using the old Gemma models? New models need to be released constantly. When Gemma first came out, those were great models, but they're no longer up to date.
Kadir Nar @kadirnardev
It says 39 message requests in the chat section, but when I click on it, there are no messages. If I didn't reply to your message, send another message.
Kadir Nar @kadirnardev
@overlordayn They didn't release the TTS model. Why aren't we training multiple LFM-based models?
Kadir Nar @kadirnardev
@billyG881 The Neucodec model quality isn't good. I'm thinking of using a better codec.
billyG88 @billyG881
@kadirnardev Hopefully with the NeuCodec codec, since it's SOTA and has been trained on plenty of multilingual data 🤓🤓🤓
Kadir Nar @kadirnardev
@jesujopi3D If you have an open source dataset, I can train it.
Shorpy🪼 @jesujopi3D
@kadirnardev Are you considering training any TTS model with multilingual datasets or Spanish data? I've been using Qwen3-TTS, but the VRAM consumption I get even with quantized models is a bit high for a 12GB card like mine…
Kadir Nar @kadirnardev
@WaelShaikh I had trained LFM models before; training them again could be good. x.com/kadirnardev/st…
Kadir Nar @kadirnardev

We have released our LFM2-350M based TTS model as open source 🚀 We have also released many different FT models. GPU Platform: @hyperbolic_labs Data: Emilia + Emilia Yodas(EN) LLM Model: LFM2-350M @liquidai Disk and Space: @huggingface I'm very happy to have released this model as open source. Many thanks to @VyvoSmartChain #opensource #speech #tts #huggingface #lfm #gpu

Wael Shaikh @WaelShaikh
@kadirnardev Definitely on the 350M model. Would love to see how it performs. LFM makes some of the fastest LLMs, I wonder if the speedup would even benefit the TTS.
Kadir Nar @kadirnardev
@liquidai Thank you for publishing it as open source❤️
Liquid AI @liquidai
Today, we release LFM2.5-350M. Agentic loops at 350M parameters. A 350M model trained for reliable data extraction and tool use, where models at this scale typically struggle. <500MB when quantized, built for environments where compute, memory, and latency are constrained. 🧵