VioP (@AcousimHss) - Twitter Profile

VioP@AcousimHss·24m

Double the limits @thsottiaux regarding this auspicious event Les go

Today, we closed our latest funding round with $122 billion in committed capital at an $852B post-money valuation. The fastest way to expand AI’s benefits is to put useful intelligence in people’s hands early and let access compound globally. This funding gives us resources to lead at scale. openai.com/index/accelera…

VioP@AcousimHss·1h

@rosewaterjelly You really love them don't you

:p@rosewaterjelly·4h

apollo and graves followup

:p@rosewaterjelly

apollo and graves get burgers

VioP@AcousimHss·6h

@sulka_art SILVERRRRRR

sulka@sulka_art·7h

what other deadlock hero should i draw? :)

sulka@sulka_art

#deadlock 🦄

VioP@AcousimHss·7h

@HessianFree Such an aura title insane Congrats!

Omead Pooladzandi@HessianFree·9h

your spotify cache is bigger than our largest AI model. Bonsai: 1-bit weights. 1.7B to 8B params. 14x compression vs bf16. 8x faster on edge. 256 MB to 1.2GB. Based on Qwen 3. we just came out of stealth. intelligence belongs at the edge and we're going to put it there. Apache 2.0. we compressed intelligence. more coming. @PrismML

PrismML@PrismML

Today, we are emerging from stealth and launching PrismML, an AI lab with Caltech origins that is centered on building the most concentrated form of intelligence. At PrismML, we believe that the next major leaps in AI will be driven by order-of-magnitude improvements in intelligence density, not just sheer parameter count. Our first proof point is the 1-bit Bonsai 8B, a 1-bit weight model that fits into 1.15 GBs of memory and delivers over 10x the intelligence density of its full-precision counterparts. It is 14x smaller, 8x faster, and 5x more energy efficient on edge hardware while remaining competitive with other models in its parameter-class. We are open-sourcing the model under Apache 2.0 license, along with Bonsai 4B and 1.7B models. When advanced models become small, fast, and efficient enough to run locally, the design space for AI changes immediately. We believe in a future of on-device agents, real-time robotics, offline intelligence and entirely new products that were previously impossible. We are excited to share our vision with you and keep working in the future to push the frontier of intelligence to the edge.
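The size figures quoted above can be sanity-checked with a little arithmetic. This is only a rough sketch: it assumes a nominal 8 billion parameters and decimal gigabytes (1 GB = 10^9 bytes), neither of which is stated in the posts.

```python
# Assumptions (not from the posts): nominal parameter count and decimal GB.
PARAMS = 8e9          # nominal "8B" parameter count
BF16_BITS = 16        # bits per weight in bf16
REPORTED_GB = 1.15    # memory footprint quoted for 1-bit Bonsai 8B

# Size of the same weights stored in bf16, in GB.
bf16_gb = PARAMS * BF16_BITS / 8 / 1e9

# Compression ratio implied by the quoted footprint.
compression = bf16_gb / REPORTED_GB

# Effective bits per weight implied by the quoted footprint.
bits_per_weight = REPORTED_GB * 1e9 * 8 / PARAMS

print(f"bf16 size: {bf16_gb:.1f} GB")
print(f"implied compression: {compression:.1f}x")
print(f"implied bits per weight: {bits_per_weight:.2f}")
```

Under these assumptions the implied ratio comes out just under 14x, in line with the quoted "14x smaller", and the footprint works out to slightly more than 1 bit per weight.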

VioP@AcousimHss·7h

Pretty insane claims ngl Moar small models pls

PrismML@PrismML

This scatter plot shows the Pareto frontier of intelligence vs. size, defined by models like Qwen3 0.6B, 1.7B, 4B, 8B, and Ministral3 3B. The 1-bit Bonsai family shifts that frontier dramatically to the left. This changes the tradeoff itself: models no longer have to be large to be capable.

VioP@AcousimHss·8h

@MangoSweet78 Achievements

🥭@MangoSweet78·9h

On this day, I've managed to groom a guy into being an even more of a deranged horny mess than me.

VioP@AcousimHss·10h

I need to get priorities straight

VioP@AcousimHss·11h

these are better evals btw than the metrics we do ourselves, but our situation has more nuances because of code switching and multilinguality. u have to think about what isn't common for the voice when given text. example: take 2 abbreviations, lol and lmao. some people say lol as "L O L" and some people say lmao as just lmao. other example: take "bc". it can be used in 2 different ways, bc as a curse and bc as an abbreviation for because, but in both cases u dont pronounce it literally "b c". i have actually written about these issues in a bit of detail in the data part of our blog! rumik.ai/research/silk
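The pronunciation issue described in this post can be illustrated with a tiny text normalizer that rewrites abbreviations into their spoken form before synthesis. This is a hypothetical sketch, not the pipeline from the linked blog: the table `DEFAULT_SPOKEN` and the function `normalize_for_tts` are made-up names, and a fixed lookup cannot tell the two senses of "bc" apart, which would need context.

```python
import re

# Hypothetical spoken-form table. "lmao" is deliberately absent because the
# post says it is read as a word; "bc" covers only the "because" sense.
DEFAULT_SPOKEN = {
    "lol": "L O L",
    "bc": "because",
}

def normalize_for_tts(text, spoken=None):
    """Replace known abbreviations with the form a speaker would say aloud."""
    table = DEFAULT_SPOKEN if spoken is None else spoken

    def replace_word(match):
        word = match.group(0)
        # Look up case-insensitively; leave unknown words untouched.
        return table.get(word.lower(), word)

    return re.sub(r"[A-Za-z]+", replace_word, text)

print(normalize_for_tts("lol i left bc it rained lmao"))
```

Passing a different `spoken` table per speaker or per language is one simple way to model the "some people say it this way" variation the post mentions.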

Trelis Research@TrelisResearch

Top Text-to-Speech (TTS) Models in 2026 -- There are a ton of text to speech models out there and it's hard to know what to choose. I created some tricky text samples for 10 different models (some proprietary, some open source) to synthesize. And then I compare them for accuracy and for realism. Proprietary models are ahead here for sure, although on realism, most models today are excellent.

Timestamps:
0:00 Introduction to TTS evaluation with four-row dataset and three metrics
1:08 Tricky TTS dataset: symbols, abbreviations, nouns, and prosody challenges
2:13 Prosody examples: snoring sounds, hissing, and paralinguistic elements
4:17 Roundtrip CER methodology: TTS output transcribed back with ASR
6:12 Two evaluation metrics: roundtrip CER and mean opinion score (MOS)
7:12 Results: proprietary models (Gemini, GPT-4o, ElevenLabs) achieve 4.2-4.3 MOS
10:40 Gemini demo: handles symbols and prosody but produces unexpected Irish accent
13:39 GPT-4o mini paralinguistics test: snoring example and symbol errors
15:41 ElevenLabs struggles with technical content and Irish pronunciation
17:11 Kokoro performs well but mispronounces "WV" with incorrect pauses
19:28 Orpheus model tested on unfamiliar words, Irish, and technical citations
21:30 Piper TTS quality issues: airy and choppy delivery, CPU vs GPU tradeoffs
23:12 Voxtral autoregressive model stops early with premature end token
25:15 Chatterbox produces garbled output with high CER (0.86) but realistic sound
26:15 Recommendations: Kokoro best open-source option, normalization needed for technical text
27:04 Dataset and evaluation tools available on Trelis platform
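The roundtrip CER metric mentioned in the timestamps compares the text given to the TTS model with the transcript an ASR model produces from the synthesized audio. A minimal sketch of the scoring step, assuming the standard definition of character error rate (character-level edit distance divided by reference length) and leaving the TTS and ASR calls out entirely; the function names are illustrative, not from the video:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]

def cer(reference, hypothesis):
    """Character error rate of an ASR transcript against the input text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)

# One substituted character out of five.
print(cer("hello", "hallo"))
```

Because the score is normalized by reference length, heavily garbled output can push it toward or past 1.0. In practice both strings are usually lowercased and stripped of punctuation first, which this sketch does not do.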

VioP@AcousimHss·11h

for the 5 6 fans of tts area check this !

Trelis Research@TrelisResearch


VioP@AcousimHss·11h

@TrelisResearch absolutely love ur content!! u should have tried echo tts , irodori tts v2 too!! prolly the best but not popular

Trelis Research@TrelisResearch·12h


VioP@AcousimHss·12h

i need to make my own cuda images from now on, fun and pain at same time ngl

VioP@AcousimHss·13h

tonight we scrape the shit out of torchtitan and pytorch optimizations

VioP@AcousimHss·1d

@Scarymoans Delete this

Vindicta@Scarymoans·1d

When you silence and disarm Silver during her ult

Your Typical Local Man@LocalBateman

VioP@AcousimHss·1d

@257gon_ We should q Girl failures 4ever

Fortune 🎸🌭@257gon_·1d

i actually laughed

VioP@AcousimHss·1d

If cfg was producing artifacts, why did no one ever report it? Was there any fault here? I mean, if all previous models which adopted cfg were producing artifacts and we still trained with it, we don't even know how the models would be if they were trained with apg. This is so bizarre

Meituan LongCat@Meituan_LongCat

🚀 LongCat-AudioDiT is here: SOTA diffusion TTS built directly in waveform latent space.

Key Highlights:
✦ Non-autoregressive diffusion-based TTS that directly operates in waveform latent space to reduce compounding errors.
✦ Simple but powerful Wav-VAE + Diffusion pipeline that achieves SOTA voice cloning performance (0.818/0.797 SIM score on Seed-ZH/Seed-Hard) and competitive intelligibility among all models (open and closed-source).
✦ Available in 1B / 3.5B, supporting high-fidelity, multilingual (ZH/EN) audio generation.

✨ Technical Innovations:
✦ By introducing the APG algorithm to replace CFG, AudioDiT improves the perceived naturalness and acoustic quality of synthesized audio.
✦ Resolves the long-standing and previously unsolved training-inference mismatch in diffusion-based TTS models.
✦ Systematic experiments reveal a non-intuitive relationship between latent space reconstruction quality and overall TTS performance: better VAE ≠ better TTS.

📄 Tech Report: github.com/meituan-longca…
🐙 GitHub: github.com/meituan-longca…
🤗 Hugging Face: huggingface.co/meituan-longca…
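For readers unfamiliar with the two guidance schemes, here is a rough sketch of the difference. It assumes "APG" refers to adaptive projected guidance, where the guidance update is split into components parallel and orthogonal to the conditional prediction and the parallel part is down-weighted. The announcement does not spell out the algorithm, so the function names, the `eta` parameter, and the exact form below are assumptions. The sketch also omits any rescaling or momentum terms such a method may use.

```python
def cfg(cond, uncond, scale):
    """Classifier-free guidance: uncond + scale * (cond - uncond)."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]

def projected_guidance(cond, uncond, scale, eta=0.0):
    """Guidance with the update's component parallel to cond scaled by eta."""
    diff = [c - u for c, u in zip(cond, uncond)]
    norm_sq = sum(c * c for c in cond)
    # Projection coefficient of diff onto cond (zero if cond is the zero vector).
    coeff = sum(d * c for d, c in zip(diff, cond)) / norm_sq if norm_sq else 0.0
    parallel = [coeff * c for c in cond]
    orthogonal = [d - p for d, p in zip(diff, parallel)]
    update = [o + eta * p for o, p in zip(orthogonal, parallel)]
    return [c + (scale - 1.0) * x for c, x in zip(cond, update)]

cond, uncond = [1.0, 0.0], [0.0, 1.0]
print(cfg(cond, uncond, 3.0))                          # plain CFG
print(projected_guidance(cond, uncond, 3.0, eta=1.0))  # eta=1 recovers CFG
print(projected_guidance(cond, uncond, 3.0, eta=0.0))  # parallel part removed
```

With `eta=1.0` the projected form reduces to ordinary CFG, which makes it easy to see that the only change is how much of the parallel component is kept.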

VioP@AcousimHss·1d

@feiyuu__ U da real art

Vincent | COMMS open@feiyuu__·1d

the artist vs the art ❤️‍🩹

velv (comms closed 6/6, waitlist open)@velviannee

The artist vs the art

VioP@AcousimHss·1d

@gowthami_s @theworldlabs Holy congrats

Gowthami@gowthami_s·1d

Excited to share I’ve joined @theworldlabs! Generating pixels and frames was just the prologue. Now it's time to build frontier models that actually understand physics and power living, breathing simulations. Onwards to new worlds. 🌎🚀

VioP@AcousimHss·1d

Reason why the perfect vae, say like dacvae, is not recommended for llm tts: it's just hard for the model to learn it. Same Old Dit TTS but nice change with APG

Meituan LongCat@Meituan_LongCat


VioP@AcousimHss·1d

@m_sirovatka @robertshaw21 U mind sharing ur progress I think ur learning path can help a lot of people because u have mentioned once u didn't have any practical experience too and now u say u had a rough start but now u do one of the hardest shi So it would be very nice to know : )

Matej Sirovatka@m_sirovatka·1d

I became an inference engineer for a month (a very bad one), more to come, really appreciate the help the whole VLLM team (mainly @robertshaw21 ) were giving in answering my dumb questions 🫡

samsja@samsja19

Today we’re releasing prime-rl v0.5.0. This is a major release, with 200+ commits from 22 contributors since v0.4.0. on the menu: * PD-DisAgg inference to boost agentic RL training * support for GLM-5, Qwen3.5, and Nemotron * a complete revamp of environment execution for better performance * first-class multi nodes slurm support directly from the config * quack kernel, selective AC, and more we also added several new guides to the docs, including large MoE agentic training guides. and that’s alongside many more bug fixes and improvements

VioP@AcousimHss·1d

Can someone end this post training era I don't like this phase
