asitis

1.6K posts

@rifeash

Updating the world view one posterior at a time. For anonymous feedback: https://t.co/VcW2ZrZALN

Oblivious RAM · Joined December 2021
631 Following · 250 Followers
asitis @rifeash
@aabhpsy Pretty soon, brother. But first, OCR... thanks for the product idea.
Psudo @cold_daal
@__ampixa__ Congrats on the launch. I remember testing out the TTS voice-ranking site you guys posted on Reddit.
Ampixa Labs @__ampixa__
🇳🇵 kala-tts: made in Nepal, the first open-source Nepali VITS voice with its own Devanagari G2P. No cloud · runs on your own CPU. pip install kala-tts · 🎧 tts.ampixa.com/kala
asitis @rifeash
@shreemaan_abhi @__ampixa__ Thanks... more interesting things are coming. It's been a long time working in stealth. From now on I'll probably keep sharing the significant things like this.
Shreemaan Abhishek @shreemaan_abhi
@__ampixa__ Great work, brother, and team. Now I can connect the dots backwards to when you shared that whiteboard diagram where you drew the TTS graph several months ago. Rooting for you!
Aabhash Ghimire @aabhpsy
Are there any options already available, even on GPU? Can we continue from what you have done so far and use GPUs to get production-grade cloned audio in natural Nepali, the way we speak? I don't understand all of it, but I can dig in and learn from zero. Since you have pioneered this, it would be nice to get some insights.
asitis @rifeash
@bbk_dkl Do you mean a full conversational voice agent? For that we would need speech language models. We might be able to chain ASR + language model + TTS, and this model could handle the text-to-speech part, but I don't think that would be a good use case for it.
asitis @rifeash
Any finite pattern of numbers can be described as a sum of cosine waves.
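This is the idea behind the discrete cosine transform (DCT): a list of N numbers can be rewritten exactly as a weighted sum of N cosines of increasing frequency, and summing those weighted cosines gives the original list back. A minimal pure-Python sketch, assuming the orthonormal DCT-II as the forward transform (one of several DCT variants) and its inverse, the DCT-III; the function names and the sample pattern are illustrative.

```python
import math

def _scale(k, n_total):
    # Orthonormal weights: the k = 0 (constant) cosine is scaled differently.
    return math.sqrt(1 / n_total) if k == 0 else math.sqrt(2 / n_total)

def dct(x):
    """Orthonormal DCT-II: c[k] measures how much of the cosine wave
    cos(pi * (n + 0.5) * k / N) is present in x."""
    N = len(x)
    return [
        _scale(k, N) * sum(x[n] * math.cos(math.pi * (n + 0.5) * k / N) for n in range(N))
        for k in range(N)
    ]

def idct(c):
    """Inverse (orthonormal DCT-III): rebuild x as a weighted sum of N cosines."""
    N = len(c)
    return [
        sum(_scale(k, N) * c[k] * math.cos(math.pi * (n + 0.5) * k / N) for k in range(N))
        for n in range(N)
    ]

pattern = [3, 1, 4, 1, 5, 9, 2, 6]
coeffs = dct(pattern)
print([round(v, 6) for v in idct(coeffs)])
# → [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
```

This direct version costs O(N²); `scipy.fft.dct(x, type=2, norm="ortho")` computes the same orthonormal transform with a fast algorithm.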
asitis @rifeash
@bijaysenihang I understand you no longer have access to fable as a Nepali citizen 😆
Bijay Limbu Senihang 🛡️
I am done staying in Nepal and creating hope, only for the pathetic Nepal government to break you from every side. Going forward, I will no longer provide my knowledge, time, or cybersecurity expertise to the Government of Nepal.
asitis @rifeash
The world I grew up in no longer exists.
asitis @rifeash
Design: the model translates Limbu to Nepali, then, in a completely fresh context, translates its own Nepali back into Limbu. The result is scored against the original human Limbu with chrF. An "echo" metric catches models that fake it by copying the input through both legs, which disqualified our Chinese brethren.
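The round-trip design above can be sketched in a few lines. This is an illustration under stated assumptions, not the author's harness: chrF here is simplified (character 1- to 6-grams, spaces removed, precision and recall averaged over n-gram orders, combined with β = 2, roughly following sacreBLEU's defaults but not a drop-in replacement for it), and the thread does not define the "echo" metric, so it is assumed here to be chrF between the intermediate Nepali output and the Limbu source, flagged above a hypothetical threshold of 60. `to_nepali` and `to_limbu` stand in for two independent model calls.

```python
from collections import Counter

def char_ngrams(text, n):
    # Simplified chrF tokenization: drop spaces, then count character n-grams.
    text = text.replace(" ", "")
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def chrf(hypothesis, reference, max_n=6, beta=2.0):
    """Simplified chrF on a 0-100 scale: average char n-gram precision and
    recall over n = 1..max_n, then combine them with an F-beta score."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # a string is too short for this n-gram order
        overlap = sum((hyp & ref).values())  # clipped matches (min of counts)
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return 100 * (1 + beta**2) * p * r / (beta**2 * p + r)

def round_trip_eval(limbu_src, to_nepali, to_limbu, echo_threshold=60.0):
    # Leg 1: Limbu -> Nepali. Leg 2: a separate, fresh call translates the
    # model's own Nepali back into Limbu (no shared conversation state).
    nepali = to_nepali(limbu_src)
    limbu_back = to_limbu(nepali)
    score = chrf(limbu_back, limbu_src)  # round-trip quality vs. the human Limbu
    # Hypothetical "echo" check: if the Nepali leg is mostly the Limbu input
    # copied through, its chrF against the source is suspiciously high.
    echo = chrf(nepali, limbu_src)
    return {"chrf": score, "echo": echo, "flagged": echo >= echo_threshold}

# A "model" that just copies its input scores a perfect round trip but is caught.
def copycat(text):
    return text

print(round_trip_eval("placeholder limbu phrase", copycat, copycat))
# → {'chrf': 100.0, 'echo': 100.0, 'flagged': True}
```

Scoring only the final Limbu output would reward the copycat; checking the intermediate leg against the source is what exposes it.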
asitis @rifeash
Can frontier LLMs actually translate Limbu, a Kiranti language of eastern Nepal? I round-trip-tested them on 100 human-reviewed phrases from Nepali school-curriculum materials. What I used: a Grade-1 math textbook, translated into Limbu by humans.
asitis @rifeash
If this post gets 5 likes, I will open-source it.
Ampixa Labs @__ampixa__
Right, @RabindraMishra ji? Context: this was made with a Text-to-Speech system built in Nepal, for Nepal.
asitis retweeted
Tulip King 🌷 @tulipking
i look forward to our chinese brothers liberating the knowledge from within fable-5 and selling it to me at 5% the cost & 2x the speed
asitis @rifeash
She said "I'm fine", but my speech language model didn't understand her, because it doesn't catch tone, emotion, or stress. Here is how to solve it.
Take an ASR model like Whisper and its 16th decoder layer (P_16). Then create a reconstructor model, trained in three passes:
First pass: audio, text → Whisper → P_16 + text
Second pass: P_16 + text → reconstruct the mel spectrogram
Third pass: compare with the original mel spectrogram
Train until the reconstruction is perfect. Now you can replace that P_16 in whisper_v3 and get a new model. Call it Whisper Pro.
Now use Whisper Pro + (texts, audio, emotion metadata) from emotion datasets like IEMOCAP and CREMA-D to create an SLM (speech language model):
input = [P₁₆ prosody vector] + [text token embeddings]
Congrats, you've got a better SLM. arxiv.org/abs/2605.05927
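A toy, pure-Python sketch of the reconstruction objective (second pass: P_16 + text → mel frame; third pass: compare with the original mel) and of the SLM input layout. Everything here is an assumption for illustration, not the method from the linked paper: Whisper is replaced by synthetic data generated from a hidden linear map, the reconstructor is a single linear layer trained with SGD on mean-squared error (a real one would be a neural decoder over full mel spectrograms), `PROJ` is a hypothetical projection that gives prosody vectors the text-embedding width, and the "+" in the input formula is read as sequence concatenation.

```python
import random

random.seed(0)
P_DIM, TXT_DIM, MEL_DIM = 3, 2, 4  # toy widths; real Whisper states and mels are far larger

def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]

# Hypothetical stand-in for the first pass (audio, text -> Whisper -> P_16 + text):
# synthetic (P_16 frame, text embedding, mel frame) triples from a hidden linear map.
HIDDEN_W = [[random.uniform(-1, 1) for _ in range(P_DIM + TXT_DIM)] for _ in range(MEL_DIM)]

def make_example():
    p16 = [random.gauss(0, 1) for _ in range(P_DIM)]
    txt = [random.gauss(0, 1) for _ in range(TXT_DIM)]
    return p16, txt, matvec(HIDDEN_W, p16 + txt)

data = [make_example() for _ in range(64)]

# Reconstructor: one linear layer mapping [P_16 ; text] to a mel frame.
W = [[0.0] * (P_DIM + TXT_DIM) for _ in range(MEL_DIM)]

def mse(pred, target):
    return sum((a - b) ** 2 for a, b in zip(pred, target)) / len(target)

def train_epoch(lr=0.05):
    total = 0.0
    for p16, txt, mel in data:
        x = p16 + txt
        pred = matvec(W, x)      # second pass: reconstruct the mel frame
        total += mse(pred, mel)  # third pass: compare with the original mel frame
        for i in range(MEL_DIM):  # SGD step on the MSE gradient
            grad_out = 2 * (pred[i] - mel[i]) / MEL_DIM
            for j, xj in enumerate(x):
                W[i][j] -= lr * grad_out * xj
    return total / len(data)

losses = [train_epoch() for _ in range(200)]
print(f"epoch 1 loss {losses[0]:.4f} -> epoch 200 loss {losses[-1]:.2e}")

# SLM input = [P_16 prosody vectors] + [text token embeddings], with "+" read as
# sequence concatenation. PROJ is a hypothetical projection to the text width.
PROJ = [[random.uniform(-1, 1) for _ in range(P_DIM)] for _ in range(TXT_DIM)]

def build_slm_input(prosody_frames, text_embeddings):
    projected = [matvec(PROJ, p) for p in prosody_frames]
    return projected + text_embeddings  # prosody positions first, then text tokens

seq = build_slm_input([d[0] for d in data[:3]], [d[1] for d in data[:5]])
print(len(seq), "positions of width", len(seq[0]))
```

The point of the reconstruction loss is that P_16 only scores well if it keeps what the mel spectrogram needs beyond the words (pitch, energy, timing); in this linear toy the data is noise-free, so the loss can go to essentially zero, which a real model would not reach.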