Inworld AI

489 posts

@inworld_ai

Realtime Voice AI: APIs for STT, LLM Router, TTS, and the full conversational pipeline. We build the engine. You ship what matters.

Mountain View, CA · Joined August 2021
203 Following · 10.5K Followers
Pinned Tweet
Inworld AI @inworld_ai
Introducing Realtime TTS-2, a new generation of voice model built for realtime conversation. It is the first voice model that hears the conversation, takes natural-language voice direction, holds one voice identity across over 100 languages, and speaks like a person who is paying attention. The result is voice AI that feels as good as it sounds.
Try it out: tinyurl.com/RealtimeAI
Learn more: tinyurl.com/TTS-2Blog
106 replies · 163 reposts · 782 likes · 321.6K views

Inworld AI @inworld_ai
90% of bad voice agent experiences come down to three fixable problems. Here's what our team tells every developer before they rip anything out.
1 reply · 1 repost · 10 likes · 461 views

Inworld AI @inworld_ai
Inworld Realtime TTS-2 just hit the @ArtificialAnlys leaderboard, claiming its spot as the top realtime TTS model in the world, and it's still in research preview. 6x faster to first audio than the next closest model. Voice direction. Text-based voice design. 100+ languages. Conversational awareness. And we're just getting started. Link in bio.
2 replies · 6 reposts · 30 likes · 1.5K views
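
Time to first audio is the delay between sending a synthesis request and receiving the first playable audio. The post does not say how the leaderboard measures it, so the Python sketch below is only a generic client-side timer. It assumes a hypothetical streaming client that yields raw audio as an iterator of `bytes` chunks; `fake_stream` stands in for a real request and is not part of any Inworld SDK.

```python
import time
from typing import Iterable, Iterator, Optional, Tuple


def time_to_first_audio(chunks: Iterable[bytes]) -> Tuple[Optional[float], bytes]:
    """Return (seconds until the first non-empty chunk, that chunk).

    The clock starts when this function is called. If `chunks` is a lazy
    generator, the request itself only runs on the first iteration, so
    request latency is included in the measurement.
    """
    start = time.perf_counter()
    for chunk in chunks:
        if chunk:  # ignore empty keep-alive chunks
            return time.perf_counter() - start, chunk
    return None, b""  # the stream ended without any audio


def fake_stream(delay_s: float) -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming TTS response."""
    time.sleep(delay_s)  # simulated network + model latency
    yield b""            # an empty chunk that should not count as audio
    yield b"\x00\x01"    # first real audio bytes


if __name__ == "__main__":
    ttfa, first = time_to_first_audio(fake_stream(0.05))
    print(f"time to first audio: {ttfa * 1000:.0f} ms")
```

For a fair comparison between models, keep the text, network path, and audio format the same, run many requests, and compare medians rather than single timings.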

Inworld AI @inworld_ai
TTS-2 works out of the box with 25+ platforms and voice agent frameworks. You don't need a complex custom integration; just plug it into whatever you're building and ship, ship, ship. If your platform isn't on the list and you want to build an integration, contact our partnerships team here: bit.ly/4wZSAka
5 replies · 2 reposts · 35 likes · 3.7K views

Inworld AI @inworld_ai
You don't talk to your boss the way you talk to your best friend. You don't read a bedtime story the way you give a presentation. You naturally shift your tone, your pace, and your energy without even thinking about it. Voice AI should work like that, but with most TTS, getting a different mood or tone means swapping to a whole different voice, which breaks the experience. With TTS-2, you direct the voice the same way you'd direct an actor: in natural language. Just write something like [say warmly] or [speak with urgency through gritted teeth] before your text, and the voice actually performs it. Pitch, pacing, volume, and emotion all come from plain English instructions, with no parameter tuning or voice swapping. Try it out: bit.ly/4f0G9xY
13 replies · 5 reposts · 46 likes · 2.7K views
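
The post describes voice direction as a bracketed natural-language instruction placed before the text, such as [say warmly]. A minimal Python sketch of building such strings, assuming the direction simply prefixes the text field of a TTS request. The helpers `direct` and `direct_script` are hypothetical, and the exact request format should be checked against Inworld's TTS-2 documentation.

```python
from typing import List, Tuple


def direct(instruction: str, text: str) -> str:
    """Prefix `text` with a bracketed voice direction such as [say warmly]."""
    instruction = instruction.strip()
    if not instruction:
        return text  # no direction: send the text unchanged
    if "[" in instruction or "]" in instruction:
        raise ValueError("a direction must not contain square brackets")
    return f"[{instruction}] {text}"


def direct_script(lines: List[Tuple[str, str]]) -> List[str]:
    """Direct each (instruction, text) pair separately, e.g. one per TTS request."""
    return [direct(instruction, text) for instruction, text in lines]


if __name__ == "__main__":
    for line in direct_script([
        ("say warmly", "Time for bed, little one."),
        ("speak with urgency through gritted teeth", "We need to leave. Now."),
    ]):
        print(line)
```

Only the bracketed direction changes between the two lines; the voice selection (not shown here) would stay the same.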

Inworld AI reposted
Lore_Machine @lore_machine
Fun story: Lore Machine's Korean-speaking user base exploded overnight. We didn't have the time to regionalize the entire app. With the help of @inworld_ai's Realtime TTS-2 model, tens of thousands of Korean users can now play their LOREs with Korean voice-over!
2 replies · 1 repost · 8 likes · 980 views

Inworld AI @inworld_ai
Congrats to @Thobey_Campion and the @lore_machine team on blowing up in South Korea! Lore Machine is a US-based interactive storytelling platform that started seeing massive organic traction in South Korea. They needed to move fast to capture the momentum, but without spinning up a full localization effort. By integrating Inworld's Realtime TTS-2, they brought Korean voice-over support to tens of thousands of users almost immediately. Can't wait to see where this goes next!
2 replies · 2 reposts · 16 likes · 805 views

Inworld AI @inworld_ai
@faionur @AbstractVC @generalcatalyst @usv Congrats Fai and the Wishroll team! Watching you go from launch to 1M users to redefining what AI entertainment looks like has been incredible. Proud to power the AI models and infra behind Status. This is just the beginning!
0 replies · 0 reposts · 3 likes · 223 views

Inworld AI reposted
fai nur @faionur
Status has raised $17M in seed and Series A funding led by @AbstractVC, @generalcatalyst and @usv to let anyone step inside their favorite stories, become famous, and live a million different lives.

We quietly launched Status last year and grew to over 1 million users in 19 days, making us the fastest-growing AI app since ChatGPT. But we hit a (predictable) snag: the app was incredibly expensive to run. How do you serve millions of users without degrading the product with a cheap LLM?

So the team locked in: we rebuilt the whole experience, and our technical bets paid off. Our users now spend 35 minutes a day on average (90 minutes a day for power users!), and millions of characters and worlds have been created, all by our users.

The next frontier of entertainment is mobile-first and deeply personal. Traditional mobile games take years to build and rarely stick. TV shows are fleeting in the age of streaming. Status is different. It is not a game you finish; it is a world you can live in. Status is a new category entirely: Immersive Social Entertainment. We believe strongly that it is the next great entertainment paradigm.

We've 10x'd to millions in annual revenue in Q1 2026, and we're just getting started. On Status, you can be anyone.
178 replies · 67 reposts · 765 likes · 247.8K views

Inworld AI @inworld_ai
"We built four different metrics for 'conversational' before realizing nobody had agreed on what that word meant." That's the whole problem with TTS evaluation right now. There's no single score that tells you whether a voice model is good. It depends on the use case, the language, the latency requirements, and the domain. @altsoph, our Head of Evaluations, wrote up why the industry is grading these models wrong, and why "best for what exactly?" is the only honest question worth asking. Learn more: bit.ly/4dQv0P1
5 replies · 6 reposts · 31 likes · 1.7K views
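
One way to make "best for what exactly?" concrete is to weight the same metrics differently per use case. The Python sketch below ranks models by a weighted average of normalized scores. It illustrates the idea only and is not the method from the write-up: the metric names, the weighted-average aggregation, and all numbers are made-up placeholders, not measurements of any real model.

```python
from typing import Dict

# metric name -> normalized score in [0, 1], higher is better
# (for latency, a higher score means a faster model)
Scores = Dict[str, float]


def weighted_score(metrics: Scores, weights: Dict[str, float]) -> float:
    """Weighted average of the metrics a use case cares about (missing metrics count as 0)."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w * metrics.get(name, 0.0) for name, w in weights.items()) / total


def best_for(models: Dict[str, Scores], weights: Dict[str, float]) -> str:
    """Name of the model with the highest weighted score for one use case."""
    return max(models, key=lambda name: weighted_score(models[name], weights))


# Placeholder numbers for two hypothetical models, for illustration only.
MODELS = {
    "model_a": {"naturalness": 0.9, "latency": 0.4},
    "model_b": {"naturalness": 0.7, "latency": 0.9},
}
USE_CASES = {
    "audiobook": {"naturalness": 0.9, "latency": 0.1},
    "voice_agent": {"naturalness": 0.3, "latency": 0.7},
}

if __name__ == "__main__":
    for use_case, weights in USE_CASES.items():
        print(use_case, "->", best_for(MODELS, weights))
```

With these placeholder numbers, the audiobook weights pick `model_a` and the voice-agent weights pick `model_b`: the same scores produce different winners, so a single number cannot answer the question on its own.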

Inworld AI reposted
Vision Agents @visionagents_ai
Don't start the weekend crashing out alone. We teamed up with @inworld_ai and @Anam__ai to build a hyper-interactive agent that watches your face and observes your tone. When you go quiet, it notices. When you look like you're about to lose it, it softens. Try it (you're welcome): visionagents.ai
5 replies · 9 reposts · 26 likes · 2K views

Inworld AI @inworld_ai
We just shipped something nobody else has. Our STT API doesn't just transcribe; it returns a full voice profile in the same response: emotion, accent, age, pitch, and style, each with a confidence score, all in one call. When you feed that into our Realtime API, the LLM decides not just what to say but how to say it. It adds steering tags that our TTS-2 model picks up and uses to guide how it speaks. So if a user sounds frustrated, the voice responds with empathy; if they're excited, it matches that energy. Here's our very own @ClintMcLean_ walking you through how the full pipeline works:
6 replies · 10 reposts · 88 likes · 6.4K views
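
The post describes a three-stage loop: STT returns a voice profile with per-attribute confidence scores, an LLM picks a steering tag, and TTS-2 speaks the reply in that style. The Python sketch below shows only the middle step and swaps the LLM for a fixed rule table so the logic is visible. The `Attribute` type, the profile keys, the confidence threshold, and the tag wording are all hypothetical, not Inworld's response schema; the bracketed tag format is borrowed from the voice-direction examples earlier in the feed.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Attribute:
    """One voice-profile attribute as an STT step might report it (hypothetical shape)."""
    value: str
    confidence: float  # 0.0 to 1.0


# Hypothetical rules standing in for the LLM's choice of steering tag.
EMOTION_TO_DIRECTION: Dict[str, str] = {
    "frustrated": "respond calmly, with empathy",
    "excited": "match their excited energy",
}


def steering_tag(profile: Dict[str, Attribute], min_confidence: float = 0.6) -> Optional[str]:
    """Pick a bracketed direction from the detected emotion, or None if unsure."""
    emotion = profile.get("emotion")
    if emotion is None or emotion.confidence < min_confidence:
        return None  # low confidence: leave the delivery undirected
    direction = EMOTION_TO_DIRECTION.get(emotion.value)
    return f"[{direction}]" if direction else None


def tag_reply(profile: Dict[str, Attribute], reply: str, min_confidence: float = 0.6) -> str:
    """Prefix the reply text with a steering tag before it goes to TTS."""
    tag = steering_tag(profile, min_confidence)
    return f"{tag} {reply}" if tag else reply


if __name__ == "__main__":
    profile = {
        "emotion": Attribute("frustrated", 0.82),
        "accent": Attribute("Scottish", 0.71),
    }
    print(tag_reply(profile, "Sorry about that. Let's get it fixed."))
```

Gating on the confidence score is one reasonable design choice here: when the emotion reading is uncertain, the sketch falls back to an undirected reply instead of acting on a possible misread.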