KugelAudio (P26)

10 posts

@KugelAudio

Kugel is an enterprise-ready TTS model, available on-prem, with a focus on 25+ languages and low latency.

Berlin · Joined February 2026
6 Following · 63 Followers
Y Combinator · @ycombinator
.@KugelAudio builds multilingual voice AI you can run in your own Kubernetes cluster. It handles 30+ languages and dialects naturally. Even phone numbers, emails, and mixed-language text — fully on-prem. ycombinator.com/launches/QXA-k…
11
6
81
11.8K
KugelAudio (P26) · @KugelAudio
KugelAudio is launching today! Excited to share our first frontier voice model with the world: real‑time multilingual TTS. Super low latency and over 30 languages. And you can even clone your voice. Try it yourself 👇
6
10
78
5.9K
Y Combinator · @ycombinator
Hyper (@heyhyperai) is building the self-driving brain for companies. Hyper's agents synthesize millions of emails, docs, Slacks, and make everyone's AI tools instantly smarter without anyone doing anything. Congrats on the launch, @kanyesthaker and @_shalinshah_! heyhyper.ai
40
17
240
1.7M
Inth (YC P26) · @Inth
Launch Week Day 1. Today, Consent.io becomes inth.com. We've evolved from just a cookie consent banner. We're Inth, consent and privacy infrastructure. Built for performance, full developer control. We're not doing this alone. We've raised a $1.2m pre-seed and joined @ycombinator's YC P26. Over the last year, @c15tdev has reached 1.7k GitHub stars and 1.6M+ downloads. This week, we're announcing the first steps of what Inth is building.
31
13
103
30.1K
Linoy Tsaban · @linoy_tsaban
ICYMI: KugelAudio is an open-source TTS model that should get way more attention
> fine-tuned from VibeVoice 7B
> trained on 200K hours of 23 languages
> state-of-the-art performance 🔥
5
6
81
4.2K
Pasha S · @psk90_ai
We finally have a 7B parameter Transformer for Text-to-Speech. 📉

**KugelAudio-0-Open** just dropped, and the architecture is fascinating. Most modern TTS systems (like F5-TTS) are purely diffusion-based. KugelAudio takes a "Hybrid" approach that leans heavily on LLM reasoning.

**The Engineering Stack:**

1. **The Brain (Qwen2.5-7B):** 🧠 Instead of a tiny text encoder, it uses a full 7B LLM (Qwen2.5) to process the input. *Why it matters:* It understands that "The wind needs to *wind* down" uses two different pronunciations of "wind" based on semantic context.
2. **The Voice (VibeVoice Base):** It builds on Microsoft's VibeVoice architecture (AR + Diffusion). It predicts semantic latents first (what to say), then uses diffusion to refine the acoustic details (how to say it).
3. **Voice Cloning (Zero-Shot):** You can feed it a 10-second reference clip (e.g., "Angry Captain"), and because of the Semantic Encoder, it clones not just the timbre, but the *vibe*.
4. **The Cost:** It needs **~19GB VRAM** (FP16). This is strictly for the RTX 3090/4090 crowd or A100 server deployments. It is not a "run on your laptop" model (yet).

**Benchmarks:** It claims state-of-the-art performance on European languages (German, French, Spanish, Polish), specifically outperforming commercial APIs in blind preference tests.

**GitHub:** lnkd.in/gM5-TPUf
**Weights:** lnkd.in/gtRPDVNR

---

🚀 **Need Custom Training?** We specialize in adapting these massive models for enterprise deployment. If you need **Custom Fine-Tuning for Voice Models** (cloning, accents, or low-latency optimization), **DM me**. 📩

♻️ **Repost** if you have the VRAM to run this!
➕ **Follow me Pasha S** for more Engineering Deep Dives.
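
A rough sanity check on the "~19GB VRAM (FP16)" figure quoted in the post: the sketch below only does the arithmetic for the weights of a 7B-parameter backbone at 2 bytes per parameter and compares it with the quoted total. The parameter count is taken as exactly 7e9 for illustration, gigabytes are treated as decimal (1 GB = 1e9 bytes), and the function name is hypothetical. How the remaining memory is actually split between other model components, activations, and caches is not stated in the post.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Memory needed for model weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Assumption: "7B" is treated as exactly 7e9 parameters for the LLM backbone.
LLM_PARAMS = 7e9
# FP16 stores each parameter in 2 bytes.
FP16_BYTES = 2

llm_weights_gb = weight_memory_gb(LLM_PARAMS, FP16_BYTES)

# The "~19GB VRAM" figure quoted in the post (approximate, as reported there).
reported_total_gb = 19.0
remainder_gb = reported_total_gb - llm_weights_gb

print(f"LLM backbone weights in FP16: {llm_weights_gb:.1f} GB")
print(f"Remainder of the quoted total: {remainder_gb:.1f} GB")
```

Under these assumptions the backbone weights alone account for about 14 GB, which is consistent with the post's point that the model targets 24 GB-class GPUs rather than laptops.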
1
0
4
123