Ampixa Labs (@__ampixa__) - Twitter Profile

Pinned Tweet

🇳🇵 kala-tts: made in Nepal, the first open-source Nepali VITS voice with its own Devanagari G2P. No cloud · runs on your own CPU. pip install kala-tts · 🎧 tts.ampixa.com/kala

Ampixa Labs@__ampixa__·2d

Guess what we are working on? Hint: this is a Validation Dashboard built for the Limbu (ᤕᤠᤰᤌᤢᤱ ᤐᤠᤴ) script, also known as Sirijanga. Bounding boxes are drawn on an image extracted from a PDF. The blocks shown on the right hold each cropped image and its equivalent Unicode text in Noto Sans Devanagari.

Ampixa Labs@__ampixa__·3d

@AsimPaudel4 A better model with good prosody is dropping soon. You can literally hear the model breathing :)

Asim Paudel@AsimPaudel4·3d

I needed this so much. Thanks to @__ampixa__

Ampixa Labs@__ampixa__

🇳🇵 kala-tts: made in Nepal, the first open-source Nepali VITS voice with its own Devanagari G2P. No cloud · runs on your own CPU. pip install kala-tts · 🎧 tts.ampixa.com/kala

Ampixa Labs@__ampixa__·3d

Well, thanks for the suggestion, but there is a better path forward:
- Download all the CC0 videos, such as Pratinidhi Sabha sessions and other CC0 videos on YouTube
- Use voice activity detection to identify where people start speaking, along with diarization models to separate multiple speakers
- Run a noise-artifact remover such as noisereduce, DeepFilterNet, or Demucs to remove background noise
- Build an ASR (speech recognition) model over it
- Apply emotion-labelling models like emotion_top
- Have humans listen to samples
We are doing that right now. The hardest part is ASR...
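The voice activity detection step mentioned above can be illustrated with a minimal energy-threshold sketch. This is not Ampixa's pipeline: the function name `speech_segments`, the frame length, and the threshold are assumptions made up for illustration, and real systems usually rely on trained VAD and diarization models instead.

```python
def speech_segments(samples, frame_len, threshold):
    """Return (start, end) sample-index pairs for runs of frames whose
    mean energy exceeds `threshold`. Hypothetical helper, illustration only."""
    segments = []
    start = None  # start index of the speech run currently being tracked
    for begin in range(0, len(samples), frame_len):
        frame = samples[begin:begin + frame_len]
        energy = sum(x * x for x in frame) / len(frame)
        if energy > threshold:
            if start is None:
                start = begin  # a new speech run begins here
        elif start is not None:
            segments.append((start, begin))  # the run ended at this frame
            start = None
    if start is not None:
        segments.append((start, len(samples)))  # run lasts to the end
    return segments


# Synthetic signal: silence, a loud burst, silence, a short loud burst.
signal = [0.0] * 4 + [0.5] * 4 + [0.0] * 4 + [0.8] * 2
print(speech_segments(signal, frame_len=2, threshold=0.01))
```

The returned index pairs could then be used to cut the audio into clips before noise removal and transcription.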

🔻☭★@Communist977·3d

@__ampixa__ @aabhpsy You can hire people to record voice and train with them. I'm sure Nepalese will volunteer to do it for free.

Ampixa Labs@__ampixa__·3d

🇳🇵 kala-tts: made in Nepal, the first open-source Nepali VITS voice with its own Devanagari G2P. No cloud · runs on your own CPU. pip install kala-tts · 🎧 tts.ampixa.com/kala

Ampixa Labs@__ampixa__·3d

@razaanstha Please follow. Next week there will be another natural TTS, based on StyleTTS2.

RaZaan@razaanstha·3d

@__ampixa__ Looks good 👀

Ampixa Labs@__ampixa__·3d

@Communist977 @aabhpsy IndicVoices has 23k hours of speech-text pairs. Emilia has 46k hours. MLS has 44k hours. Nepali doesn't have that volume of speech-text pairs. Running ASR is also not viable: open-source ASR models like Whisper give a CER of around 12 to 18% on Nepali, plus hallucinations.
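For readers unfamiliar with the metric, CER (character error rate) is the character-level edit distance between a reference transcript and the ASR output, divided by the reference length. The sketch below is a generic illustration, not the evaluation code behind the numbers quoted above; the function names are assumptions, and it counts Unicode code points rather than grapheme clusters, which matters for Devanagari.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings, one row at a time."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# A hypothesis that drops the final vowel sign of a 5-code-point word.
print(cer("रामले", "रामल"))
```

At a CER of 12 to 18%, roughly one character in six to eight is wrong, which is why such transcripts are hard to use directly as TTS training text.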

🔻☭★@Communist977·3d

@__ampixa__ @aabhpsy What do you mean scarcity of Nepali data?

Ampixa Labs@__ampixa__·3d

@kingofknowwhere Sure, please keep looking. We plan to cover all 18 languages. The shared roots of Maithili and Nepali should make this one easier, but a G2P still has to be built. If you know a linguist or professor who is fluent in Maithili, please let us know.

Ankit Jxa@kingofknowwhere·3d

@__ampixa__ Do Maithili too. Happy to help. :)

Ampixa Labs@__ampixa__·3d

@pranayaratnasha Well, could you hold off on the extensive test for a while? A better model based on an updated/evolved StyleTTS2 architecture is dropping soon.

Pranaya Ratna Shakya@pranayaratnasha·3d

@__ampixa__ Great, love the initial demos seen here. Going to give it an extensive test to see how far it will reach. Great going on this. Maybe soon we will have a native-speaking assistant on our phones rather than English-speaking ones. Kudos, looking forward to future updates.

Ampixa Labs@__ampixa__·3d

@kingofknowwhere Thank you. We will drop a new model soon that sounds more natural than this one.

Ankit Jxa@kingofknowwhere·3d

@__ampixa__ Good work

Ampixa Labs@__ampixa__·3d

@aabhpsy Because of the scarcity of Nepali data, you will get Hindi prosody in Nepali speech. The best option so far to start working from is dots.tts by the rednote social media team.

Aabhash Ghimire@aabhpsy·3d

Are there any already available options, even on GPU? Can we continue from what you have done so far and use GPUs to get production-grade cloned audio in natural Nepali, the way we speak? I don't understand all of it, but I can dig in and learn from zero. Since you guys have pioneered this, it would be nice to get insights.

Ampixa Labs@__ampixa__·3d

The problem is gathering thousands of hours of TTS data, diarizing it, and running background-noise removal models... We first plan to have at least a 5000-hour (silver + gold) Nepali speech + text pair database, and we are 30% of the way there. We will open-source that too. So maybe within this year we will have at least a multi-shot voice cloner... But the goal is naturalness and real-time inference on CPU, with limited voice

Ampixa Labs@__ampixa__·3d

@sumfreelancer More projects like this are in the pipeline. Please follow.

Suman Shrestha 🇳🇵@sumfreelancer·3d

@__ampixa__ Awesome

Ampixa Labs@__ampixa__·3d

@DemonXnomeD The G2P part is the new architecture. We are also working on an upgraded StyleTTS2 architecture for more naturalness.

DemonX@DemonXnomeD·3d

@__ampixa__ Any new architecture? Or just training some speech model?

Ampixa Labs@__ampixa__·3d

The full recipe: Data → G2P → Train → Ship
pip install kala-tts
Training recipe: github.com/Ampixa/nepal-t…

Ampixa Labs@__ampixa__·3d

How Kala reads Nepali text:
नेपालमा → /ne.pal.ma/
रामले → /ram.le/ (not /raː.mə.leː/ like eSpeak)
Akshara parse → schwa-deletion rules → IPA. No black-box character embeddings.
Full frontend: github.com/Ampixa/nepa-ne…
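The akshara parse → schwa-deletion → IPA flow can be sketched with a toy example. This is not Kala's actual frontend. The tiny character tables, the `ə` symbol for the inherent vowel, and the single deletion rule (drop the inherent vowel word-finally, or between a vowel-bearing akshara and one with an explicit vowel sign) are all assumptions chosen only so the two words in the tweet come out as shown. Conjuncts, halant, and independent vowels are not handled.

```python
# Toy tables covering only a handful of letters (illustrative subset).
CONSONANTS = {"न": "n", "प": "p", "ल": "l", "म": "m", "र": "r", "क": "k"}
MATRAS = {"ा": "a", "ि": "i", "ु": "u", "े": "e", "ो": "o"}
SCHWA = "ə"  # assumed symbol for the inherent vowel


def parse_aksharas(word):
    """Split a word into (onset, vowel, has_explicit_vowel_sign) units."""
    units, i = [], 0
    while i < len(word):
        if word[i] not in CONSONANTS:
            raise ValueError(f"unsupported character: {word[i]!r}")
        onset = CONSONANTS[word[i]]
        i += 1
        if i < len(word) and word[i] in MATRAS:
            units.append((onset, MATRAS[word[i]], True))
            i += 1
        else:
            units.append((onset, SCHWA, False))  # inherent vowel
    return units


def delete_schwas(units):
    """Apply the single toy schwa-deletion rule described above."""
    out = []
    for idx, (onset, vowel, explicit) in enumerate(units):
        if not explicit and idx > 0 and out[-1][1] is not None:
            is_final = idx == len(units) - 1
            if is_final or units[idx + 1][2]:
                vowel = None  # consonant becomes a coda
        out.append((onset, vowel))
    return out


def to_ipa(word):
    """Join syllables with dots; vowel-less consonants attach as codas."""
    syllables = []
    for onset, vowel in delete_schwas(parse_aksharas(word)):
        if vowel is None:
            syllables[-1] += onset
        else:
            syllables.append(onset + vowel)
    return ".".join(syllables)


print(to_ipa("नेपालमा"), to_ipa("रामले"))
```

Because every step is an explicit rule over parsed units, each output can be traced back to the rule that produced it, which is the contrast with learned character embeddings that the tweet draws.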