KugelAudio (P26)

10 posts

@KugelAudio

Kugel is an enterprise-ready TTS model, available on-prem, with a focus on 25+ languages and low latency.

Berlin · Joined February 2026

6 Following · 63 Followers

KugelAudio (P26)@KugelAudio·5d

@flowgaiaio @ycombinator You can absolutely use it for this purpose.

FlowGaia@flowgaiaio·6d

@ycombinator @KugelAudio What about outbound calls?

Y Combinator@ycombinator·28 May

.@KugelAudio builds multilingual voice AI you can run in your own Kubernetes cluster. It handles 30+ languages and dialects naturally. Even phone numbers, emails, and mixed-language text — fully on-prem. ycombinator.com/launches/QXA-k…

11.8K

KugelAudio (P26)@KugelAudio·29 May

kugelaudio.com

480

KugelAudio (P26)@KugelAudio·29 May

KugelAudio is launching today! Excited to share our first frontier voice model with the world: real‑time multilingual TTS. Super low latency and over 30 languages. And you can even clone your voice. Try it yourself 👇

5.9K

KugelAudio (P26)@KugelAudio·29 May

@AsadAliKhan1981 @ycombinator thanks Asad 🔮 kugel in all caves!

Asad@AsadAliKhan1981·28 May

@ycombinator @KugelAudio Love the ad 😜

KugelAudio (P26)@KugelAudio·14 May

@ycombinator @heyhyperai @KanyesThaker @_shalinshah_ congrats

Y Combinator@ycombinator·11 May

Hyper (@heyhyperai) is building the self-driving brain for companies. Hyper's agents synthesize millions of emails, docs, Slacks, and make everyone's AI tools instantly smarter without anyone doing anything. Congrats on the launch, @kanyesthaker and @_shalinshah_! heyhyper.ai

240

1.7M

KugelAudio (P26)@KugelAudio·14 May

@ycombinator @InstaAgentAI @klwongkyle @tseungcolin Congrats on the launch

Y Combinator@ycombinator·12 May

InstaAgent (@InstaAgentAI) helps B2C companies scale social media marketing across hundreds of personas. They’ve already reached $1M ARR in just 10 months. Congrats on the launch, @klwongkyle & @tseungcolin! ycombinator.com/launches/QKB-i…

451

46K

KugelAudio (P26)@KugelAudio·11 May

@ycombinator @oddpool_alerts @c0delemons @RiteshMalpani Congrats!

Y Combinator@ycombinator·11 May

Oddpool (@oddpool_alerts) is the institutional data layer for prediction markets. It captures every trade and order book update across every platform and normalizes them under universal tickers. Congrats on the launch, @c0delemons & @RiteshMalpani! ycombinator.com/launches/QKF-o…

208

28.8K

KugelAudio (P26)@KugelAudio·13 Apr

@Inth Happy launch!

141

Inth (YC P26)@Inth·13 Apr

Launch Week Day 1. Today, Consent.io becomes inth.com. We've evolved from just a cookie consent banner. We're Inth, consent and privacy infrastructure. Built for performance, full developer control. We're not doing this alone. We've raised a $1.2m pre-seed and joined @ycombinator's YC P26. Over the last year, @c15tdev has reached 1.7k GitHub stars and 1.6M+ downloads. This week, we're announcing the first steps of what Inth is building.

103

30.1K

KugelAudio (P26)@KugelAudio·12 Feb

@linoy_tsaban Thanks!

Linoy Tsaban@linoy_tsaban·9 Feb

ICYMI: KugelAudio is an open source TTS model that should get way more attention
> fine-tuned from Vibe-Voice 7B
> trained on 200K hours of 23 Languages
> state-of-the-art performance 🔥

4.2K

KugelAudio (P26)@KugelAudio·6 Feb

@psk90_ai We are trying everything to become the best voice model!

Pasha S@psk90_ai·4 Feb

We finally have a 7B parameter Transformer for Text-to-Speech. 📉

**KugelAudio-0-Open** just dropped, and the architecture is fascinating. Most modern TTS systems (like F5-TTS) are purely diffusion-based. KugelAudio takes a "Hybrid" approach that leans heavily on LLM reasoning.

**The Engineering Stack:**

1. **The Brain (Qwen2.5-7B):** 🧠 Instead of a tiny text encoder, it uses a full 7B LLM (Qwen2.5) to process the input. *Why it matters:* It understands that "The wind needs to *wind* down" uses two different pronunciations of "wind" based on semantic context.
2. **The Voice (VibeVoice Base):** It builds on Microsoft's VibeVoice architecture (AR + Diffusion). It predicts semantic latents first (what to say), then uses diffusion to refine the acoustic details (how to say it).
3. **Voice Cloning (Zero-Shot):** You can feed it a 10-second reference clip (e.g., "Angry Captain"), and because of the Semantic Encoder, it clones not just the timbre, but the *vibe*.
4. **The Cost:** It needs **~19GB VRAM** (FP16). This is strictly for the RTX 3090/4090 crowd or A100 server deployments. It is not a "run on your laptop" model (yet).

**Benchmarks:** It claims state-of-the-art performance on European languages (German, French, Spanish, Polish), specifically outperforming commercial APIs in blind preference tests.

**GitHub:** lnkd.in/gM5-TPUf
**Weights:** lnkd.in/gtRPDVNR

---

🚀 **Need Custom Training?** We specialize in adapting these massive models for enterprise deployment. If you need **Custom Fine-Tuning for Voice Models** (cloning, accents, or low-latency optimization), **DM me**. 📩

♻️ **Repost** if you have the VRAM to run this!

➕ **Follow me Pasha S** for more Engineering Deep Dives.
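
The VRAM figure in the post above can be sanity-checked with simple arithmetic. The sketch below is only a back-of-the-envelope estimate, not KugelAudio's own sizing: it assumes exactly 7e9 backbone parameters at 2 bytes each (FP16) and treats the quoted ~19 GB as a total. The helper names `weight_memory_gb` and `fits_on_gpu` are hypothetical, and how the remaining memory splits across the speech components, activations, and caches is not stated in the post.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Assumptions for illustration, taken loosely from the post:
BACKBONE_PARAMS = 7e9    # "7B" LLM backbone; the exact count is assumed
FP16_BYTES = 2           # FP16 stores each parameter in 2 bytes
QUOTED_TOTAL_GB = 19.0   # the "~19GB VRAM" figure quoted in the post

backbone_gb = weight_memory_gb(BACKBONE_PARAMS, FP16_BYTES)
# Whatever is left over would have to cover everything that is not
# backbone weights; the post does not break this down.
remainder_gb = QUOTED_TOTAL_GB - backbone_gb


def fits_on_gpu(vram_gb: float, required_gb: float = QUOTED_TOTAL_GB) -> bool:
    """True if a card with vram_gb of memory meets the quoted requirement."""
    return vram_gb >= required_gb


if __name__ == "__main__":
    print(f"Backbone weights (FP16): {backbone_gb:.1f} GB")
    print(f"Remainder of quoted total: {remainder_gb:.1f} GB")
    print(f"24 GB card (RTX 3090/4090 class): {fits_on_gpu(24)}")
    print(f"16 GB card: {fits_on_gpu(16)}")
```

Under these assumptions the backbone weights alone account for about 14 GB, which is consistent with the post's point that a 24 GB card is the practical minimum and a 16 GB card is not enough.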

123
