VioP

2.9K posts

@AcousimHss

the more you laugh the more you cry the more you cry the more you laugh

Joined January 2025
966 Following · 143 Followers
VioP @AcousimHss
@HessianFree Such an aura title, insane! Congrats!
Omead Pooladzandi @HessianFree
your spotify cache is bigger than our largest AI model. Bonsai: 1-bit weights. 1.7B to 8B params. 14x compression vs bf16. 8x faster on edge. 256 MB to 1.2 GB. Based on Qwen 3. we just came out of stealth. intelligence belongs at the edge and we're going to put it there. Apache 2.0. we compressed intelligence. more coming. @PrismML
PrismML @PrismML

Today, we are emerging from stealth and launching PrismML, an AI lab with Caltech origins that is centered on building the most concentrated form of intelligence. At PrismML, we believe that the next major leaps in AI will be driven by order-of-magnitude improvements in intelligence density, not just sheer parameter count. Our first proof point is the 1-bit Bonsai 8B, a 1-bit weight model that fits into 1.15 GB of memory and delivers over 10x the intelligence density of its full-precision counterparts. It is 14x smaller, 8x faster, and 5x more energy efficient on edge hardware while remaining competitive with other models in its parameter class. We are open-sourcing the model under the Apache 2.0 license, along with the Bonsai 4B and 1.7B models. When advanced models become small, fast, and efficient enough to run locally, the design space for AI changes immediately. We believe in a future of on-device agents, real-time robotics, offline intelligence, and entirely new products that were previously impossible. We are excited to share our vision with you and will keep working to push the frontier of intelligence to the edge.

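The size and compression figures in the two posts above can be sanity-checked with some back-of-the-envelope arithmetic. This is a minimal sketch assuming a round 8e9 parameters and decimal gigabytes (1e9 bytes); neither assumption comes from the posts. At 16 bits per weight, bf16 needs 16 GB and a pure 1-bit encoding would need 1 GB, so the reported 1.15 GB works out to roughly 1.15 bits per weight and about 14x compression. What fills the extra 0.15 bits (for example scale factors, or some layers kept at higher precision) is not stated in the posts, so treat that part as a guess.

```python
# Back-of-the-envelope memory math for a 1-bit model vs. its bf16 baseline.
# Assumptions (not from the posts): exactly 8e9 parameters, decimal GB (1e9 bytes).

def weight_bytes(num_params: float, bits_per_weight: float) -> float:
    """Bytes needed to store num_params weights at bits_per_weight bits each."""
    return num_params * bits_per_weight / 8


def effective_bits(size_bytes: float, num_params: float) -> float:
    """Average bits per parameter implied by a reported model size."""
    return size_bytes * 8 / num_params


PARAMS = 8e9
GB = 1e9

bf16_size = weight_bytes(PARAMS, 16)  # 16 GB at 2 bytes per weight
ideal_1bit = weight_bytes(PARAMS, 1)  # 1 GB with zero overhead
reported = 1.15 * GB                  # size quoted in the PrismML post

print(f"bf16: {bf16_size / GB:.2f} GB")
print(f"ideal 1-bit: {ideal_1bit / GB:.2f} GB")
print(f"compression vs bf16: {bf16_size / reported:.1f}x")
print(f"effective bits/weight: {effective_bits(reported, PARAMS):.2f}")
```

The gap between the ideal 16x and the quoted 14x is exactly the overhead above 1 bit per weight, which is why the effective-bits figure is the more informative number to compare across low-bit models.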
🥭 @MangoSweet78
On this day, I've managed to groom a guy into being an even more deranged, horny mess than me.
VioP @AcousimHss
I need to get my priorities straight
VioP @AcousimHss
These are better evals than the metrics, btw. We run these ourselves too, but our situation has more nuance because of code-switching and multilinguality: you have to think about what isn't common when the voice is given a text example. Take two abbreviations, "lol" and "lmao": some people say lol as "L O L", while some people say lmao as just "lmao". Another example is "bc", which can be used in two different ways, as a curse and as an abbreviation for "because", but in neither case do you pronounce it literally as "b c". I've written about these issues in a bit more detail in the data section of our blog! rumik.ai/research/silk
Trelis Research @TrelisResearch

Top Text-to-Speech (TTS) Models in 2026 -- There are a ton of text to speech models out there and it's hard to know what to choose. I created some tricky text samples for 10 different models (some proprietary, some open source) to synthesize. And then I compare them for accuracy and for realism. Proprietary models are ahead here for sure, although on realism, most models today are excellent.
Timestamps:
0:00 Introduction to TTS evaluation with four-row dataset and three metrics
1:08 Tricky TTS dataset: symbols, abbreviations, nouns, and prosody challenges
2:13 Prosody examples: snoring sounds, hissing, and paralinguistic elements
4:17 Roundtrip CER methodology: TTS output transcribed back with ASR
6:12 Two evaluation metrics: roundtrip CER and mean opinion score (MOS)
7:12 Results: proprietary models (Gemini, GPT-4o, ElevenLabs) achieve 4.2-4.3 MOS
10:40 Gemini demo: handles symbols and prosody but produces unexpected Irish accent
13:39 GPT-4o mini paralinguistics test: snoring example and symbol errors
15:41 ElevenLabs struggles with technical content and Irish pronunciation
17:11 Kokoro performs well but mispronounces "WV" with incorrect pauses
19:28 Orpheus model tested on unfamiliar words, Irish, and technical citations
21:30 Piper TTS quality issues: airy and choppy delivery, CPU vs GPU tradeoffs
23:12 Voxtral autoregressive model stops early with premature end token
25:15 Chatterbox produces garbled output with high CER (0.86) but realistic sound
26:15 Recommendations: Kokoro best open-source option, normalization needed for technical text
27:04 Dataset and evaluation tools available on Trelis platform

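VioP's point about abbreviations also affects how such evals are scored: if the reference for "lol" allows only one spoken form, a model that says "L O L" gets penalized for a perfectly valid reading. A minimal sketch, assuming a hand-written lexicon (`SPOKEN_VARIANTS` is hypothetical, not the lexicon from the Silk blog post), expands a written prompt into every accepted spoken reference and checks an ASR transcript against them. Only the "because" sense of "bc" is covered; the curse sense would need context to disambiguate and is deliberately left out.

```python
import re
from itertools import product

# Hypothetical lexicon: each informal token maps to every spoken form we accept.
# Entries are illustrative only; "bc" covers just the "because" sense here,
# since the curse sense needs sentence context to tell apart.
SPOKEN_VARIANTS: dict[str, list[str]] = {
    "lol": ["l o l", "lol"],    # spelled out, or said as a word
    "lmao": ["lmao", "l m a o"],
    "bc": ["because"],           # never read literally as "b c"
}


def tokenize(text: str) -> list[str]:
    """Lowercase and keep letters/apostrophes only (a deliberate simplification)."""
    return re.findall(r"[a-z']+", text.lower())


def reference_variants(text: str) -> set[str]:
    """Expand a written prompt into every accepted spoken reference."""
    options = [SPOKEN_VARIANTS.get(tok, [tok]) for tok in tokenize(text)]
    return {" ".join(choice) for choice in product(*options)}


def matches_any(transcript: str, text: str) -> bool:
    """True if an ASR transcript equals any accepted spoken form of the prompt."""
    spoken = " ".join(tokenize(transcript))
    return spoken in reference_variants(text)


print(matches_any("L O L that was fun", "lol that was fun"))  # True
print(matches_any("b c I was late", "bc I was late"))          # False
```

Expanding every combination with `itertools.product` grows exponentially with the number of ambiguous tokens, which is fine for short prompts; longer inputs would call for aligning the transcript token by token instead.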
VioP @AcousimHss
For the 5 or 6 fans of the TTS area, check this out!
Trelis Research @TrelisResearch

Top Text-to-Speech (TTS) Models in 2026 [same post as quoted above]

VioP @AcousimHss
@TrelisResearch Absolutely love your content!! You should have tried Echo TTS and Irodori TTS v2 too!! Probably the best, but not popular.
Trelis Research @TrelisResearch
Top Text-to-Speech (TTS) Models in 2026 [same post as quoted above]
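The roundtrip CER methodology in the Trelis Research post (4:17) synthesizes the input text, transcribes the audio back with ASR, and compares the transcript with the original. A minimal sketch of the scoring step follows, assuming the transcript is already available; the normalization (lowercasing and collapsing whitespace) is an assumed choice, not necessarily the one Trelis used. CER is the character-level edit distance divided by the reference length, so it can exceed 1.0 when a model inserts a lot of garbled output.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character substitutions, deletions and insertions."""
    prev = list(range(len(hyp) + 1))  # distances from "" to each prefix of hyp
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete r from the reference
                curr[j - 1] + 1,         # insert h from the hypothesis
                prev[j - 1] + (r != h),  # substitute (free when chars match)
            ))
        prev = curr
    return prev[-1]


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace before scoring (an assumed choice)."""
    return " ".join(text.lower().split())


def roundtrip_cer(input_text: str, asr_transcript: str) -> float:
    """Character error rate of an ASR transcript against the TTS input text."""
    ref, hyp = normalize(input_text), normalize(asr_transcript)
    if not ref:
        raise ValueError("reference text is empty")
    return levenshtein(ref, hyp) / len(ref)


print(roundtrip_cer("The WV citation", "the w v citation"))
```

In a full pipeline the transcript would come from something like `transcribe(synthesize(text))`, where both functions are hypothetical stand-ins for whichever TTS and ASR models are being compared. Note that ASR errors are folded into the score too, so a strong ASR model is needed for the metric to reflect TTS quality.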
VioP @AcousimHss
I need to make my own CUDA images from now on. Fun and pain at the same time, ngl.
VioP @AcousimHss
Tonight we scrape the shit out of torchtitan and PyTorch optimizations.
VioP @AcousimHss
@257gon_ We should q Girl failures 4ever
Gowthami @gowthami_s
Excited to share I’ve joined @theworldlabs! Generating pixels and frames was just the prologue. Now it's time to build frontier models that actually understand physics and power living, breathing simulations. Onwards to new worlds. 🌎🚀
VioP @AcousimHss
@m_sirovatka @robertshaw21 Would you mind sharing your progress? I think your learning path could help a lot of people, because you once mentioned that you didn't have any practical experience either, and now you say you had a rough start but now do some of the hardest stuff. So it would be very nice to know :)
VioP @AcousimHss
Can someone end this post-training era? I don't like this phase.