🇳🇵 kala-tts: made in Nepal, the first open-source Nepali VITS voice with its own Devanagari G2P.
No cloud · runs on your own CPU.
pip install kala-tts · 🎧 tts.ampixa.com/kala
Guess what we are working on?
hint: this is a Validation Dashboard built for the Limbu (ᤕᤠᤰᤌᤢᤱ ᤐᤠᤴ) script, also known as Sirijanga.
Bounding boxes are drawn on the image extracted from the PDF. The blocks shown on the right hold the cropped image and its equivalent Unicode in Noto Sans Devanagari.
Well, thanks for the suggestion, but there is a better path forward.
- download all the CC0 videos, like Pratinidhi Sabha sessions and other CC0 videos on YouTube
- run voice activity detection to find where people start speaking, along with diarization models to separate multiple speakers
- run a noise artifact remover like noisereduce, DeepFilterNet, or Demucs to remove background noise
- build an ASR (speech recognition) model over it
- run emotion labelling models like emotion_top
- have humans listen to samples
We are doing that rn. The hardest part is ASR...
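To make the voice activity detection step concrete, here is a minimal sketch of the simplest possible approach: thresholding frame energy. This is an illustration only, not the pipeline described above, which more likely uses a trained VAD model. The function names, the 30 ms frame size, and the 0.02 RMS threshold are all assumptions picked for the example.

```python
import math


def frame_rms(samples, frame_len):
    """RMS energy of each non-overlapping frame (trailing partial frame is dropped)."""
    return [
        math.sqrt(sum(x * x for x in samples[i:i + frame_len]) / frame_len)
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def speech_segments(samples, sample_rate, frame_ms=30, threshold=0.02, min_frames=3):
    """Return (start_sec, end_sec) spans whose frame RMS stays at or above threshold.

    Hypothetical helper: samples are floats in [-1, 1]; the threshold and
    min_frames values are illustrative and would need tuning on real audio.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    if frame_len < 1:
        raise ValueError("frame_ms is too small for this sample rate")
    energies = frame_rms(samples, frame_len)
    segments = []
    start = None
    # A trailing 0.0 sentinel closes a segment that runs to the end of the audio.
    for idx, energy in enumerate(energies + [0.0]):
        if energy >= threshold and start is None:
            start = idx
        elif energy < threshold and start is not None:
            if idx - start >= min_frames:  # ignore very short bursts
                segments.append((start * frame_len / sample_rate,
                                 idx * frame_len / sample_rate))
            start = None
    return segments
```

An energy threshold like this breaks down quickly on noisy parliament recordings, which is one reason the noise removal and diarization steps sit next to it in the list.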
@Communist977 @aabhpsy IndicVoices has 23k hours of speech-text pairs.
Emilia is 46k hours.
MLS is 44k hours.
Nepali doesn't have that kind of speech -> text pairs.
Running ASR to generate them is also not viable: open-source ASRs like Whisper give a CER of around 12 to 18% on Nepali, plus hallucinations.
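For context on the CER figure: character error rate is the character-level edit distance between the reference transcript and the ASR output, divided by the reference length. A minimal sketch, with hypothetical function names, that counts Unicode code points (so a Devanagari vowel sign counts as its own character; other evaluation setups may normalize text differently):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings, counted in code points."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            ))
        prev = cur
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

At a CER of 12 to 18%, roughly one character in six to eight is wrong, so transcripts produced this way would need heavy correction before they could serve as TTS training text.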
@kingofknowwhere sure, please keep looking. We plan to cover all 18 languages. Because the roots of Maithili and Nepali are close, this one will be easy anyway. A G2P still has to be built. If you know a linguist or professor who is fluent in Maithili, please let us know.
@pranayaratnasha Well, can you hold off on the extensive test for a while? There is a better model dropping soon, based on an updated/evolved StyleTTS2 architecture.
@__ampixa__ Great, love the initial demos seen here. Going to give it an extensive test to see how far it will reach. Great going on this. Maybe soon we will have a native-speaking assistant in our phones rather than English-speaking ones. Kudos, looking forward to future updates.
@aabhpsy Because Nepali data is scarce, you will get Hindi prosody on Nepali speech. The best one to start working on so far is dots.tts by the rednote social media team.
Any already available options even on GPU?
Can we continue from what you have done so far and use GPUs to get production-grade cloned audio in natural Nepali, the way we actually speak?
I don't understand all of it, but I can dig in and learn from 0. Since you guys have pioneered this, it would be nice to get insights.
The problem is gathering thousands of hours of TTS data, diarizing it, running background noise remover models... We first plan to have at least a 5,000-hour (silver + gold) Nepali speech + text pair DB, and we are 30% there. We will open-source that too. So maybe within this year we will have a multi-shot voice cloner at least... But the goal is naturalness and real-time inference on CPU, with limited voice
How Kala reads Nepali text:
नेपालमा → /ne.pal.ma/
रामले → /ram.le/ (not /raː.mə.leː/ like eSpeak)
Akshara parse → schwa-deletion rules → IPA. No black-box character embeddings.
Full frontend: github.com/Ampixa/nepa-ne…
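The akshara parse → schwa-deletion → IPA flow above can be sketched as a toy in a few lines. This is not Kala's actual frontend: the tiny consonant and vowel-sign tables, the `ʌ` symbol for the inherent vowel, and the two deletion rules (word-final, and medial between a vowel and a consonant carrying an explicit vowel sign) are all simplifying assumptions, and conjuncts and independent vowels are not handled.

```python
SCHWA = "ʌ"  # assumption: symbol chosen here for the inherent vowel

# Tiny illustrative subsets, not a full Devanagari inventory.
CONSONANTS = {"न": "n", "प": "p", "ल": "l", "म": "m", "र": "r", "क": "k"}
MATRAS = {"ा": "a", "ि": "i", "ु": "u", "े": "e", "ो": "o"}


def parse_aksharas(word):
    """Split a word into [consonant, vowel] units; vowel is None if inherent."""
    aksharas = []
    for ch in word:
        if ch in CONSONANTS:
            aksharas.append([CONSONANTS[ch], None])
        elif ch in MATRAS and aksharas and aksharas[-1][1] is None:
            aksharas[-1][1] = MATRAS[ch]
        else:
            raise ValueError(f"character not covered by this toy table: {ch!r}")
    return aksharas


def delete_schwas(aksharas):
    """Resolve each inherent vowel to SCHWA, or to '' when a rule deletes it."""
    out = []
    n = len(aksharas)
    for i, (cons, vowel) in enumerate(aksharas):
        if vowel is None:
            final = i == n - 1 and n > 1
            # Medial rule: previous unit kept a vowel, next has an explicit sign.
            medial = (0 < i < n - 1
                      and out[-1][1] != ""
                      and aksharas[i + 1][1] is not None)
            vowel = "" if (final or medial) else SCHWA
        out.append((cons, vowel))
    return out


def to_ipa(word):
    """Join syllables with '.', attaching vowel-less consonants as codas."""
    syllables = []
    for cons, vowel in delete_schwas(parse_aksharas(word)):
        if vowel == "" and syllables:
            syllables[-1] += cons
        else:
            syllables.append(cons + vowel)
    return "/" + ".".join(syllables) + "/"


print(to_ipa("नेपालमा"))  # /ne.pal.ma/
print(to_ipa("रामले"))    # /ram.le/
```

Because every step is an explicit rule over aksharas, a wrong pronunciation can be traced to one table entry or one deletion rule, which is the point of avoiding black-box character embeddings.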