Soniox

279 posts

@soniox_ai

Low-latency real-time speech-to-text, text-to-speech and translation APIs. X video bot: "@soniox_ai transcribe this" "@soniox_ai translate this to /language/"

United States · Joined March 2022
1 Following · 677 Followers
Pinned Tweet
Soniox
Soniox@soniox_ai·
Big moment for Soniox: Today we’re launching Soniox Text-to-Speech.

This is a major step forward for us. Soniox started with speech-to-text. Now, with both STT and TTS, we are becoming the voice platform for every language.

Soniox TTS is built for the hardest parts of speech generation:
- Native-speaker-quality speech in 60+ languages
- Hallucination-free speech generation
- Alphanumerics spoken correctly like numbers, IDs, addresses
- Correct pronunciation for names and foreign words
- Ultra-low-latency streaming for real-time voice applications

And the pricing is simple: $0.70 per hour of generated speech.

What excites us most is the bigger picture: Developers and companies can now work with one provider for the core voice stack: speech-to-text, text-to-speech, multilingual voice, real-time infrastructure, regional deployments, and compliance. This is a big step in our transition from an STT provider to the voice platform for every language.

Voice is becoming a core interface for software. But to work globally, it has to be fast, accurate, robust, and affordable across every language. That is what we are building at Soniox.

Read the blog post: soniox.com/blog/soniox-te…
English
79
167
2K
32.8M
Soniox
Soniox@soniox_ai·
Soniox speech AI transcribing + translating a live show in a bar, with piano and crowd chatter background. P.s.: Trying a new demo format. Feedback welcome.
English
0
0
0
22
Soniox
Soniox@soniox_ai·
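Here is the video transcript you requested:
Speaker 1: [Chinese] When it's dancing, can it get back up?
Speaker 2: [Chinese] It can.
Speaker 1: [Chinese] Then why can people do it but robots can't? React to a fall like this, I mean.
Speaker 2: [Chinese] Self-awareness.
Speaker 1: [Chinese] The ability to learn on its own, right? So today we are going to learn about the algorithm that lets it learn on its own: neural networks. OK, next let's first take a look.
Speaker 3: [Korean] Through the classes, students naturally become curious about technology.
Speaker 1: [Korean] You all, this is.
Speaker 3: [Korean] As a result, many children dream of becoming scientists.
Speaker 4: [Chinese] Because we have a robot at home. My mom can write a program and load it into the robot, and each time she shows me something the robot does, like a dance she programmed, and that's when I became really interested.
Speaker 2: [Chinese] I want to be a scientist, so I can use AI to make things that can do some of what people do, like cooking, and if it also learns some of a doctor's skills it can help a lot of people.
Speaker 5: [Chinese] We provide general AI education for students so that they encounter these cutting-edge applications as early as primary school and are no strangers to AI. You could call them natives of the AI era.
soniox.com/X/205909996399…
Chinese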
0
0
0
18
H
H@hmmmmmm1458·
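Why China's technology is advancing so fast: elementary school students taking AI neural network classes… I think Korea needs to change its education system too.
Korean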
59
92
474
244K
Soniox
Soniox@soniox_ai·
Here is the video transcript you requested: Speaker 1: [Portuguese] There's nothing to hide, which is why I keep insisting here all the time: let's set up the CPMI on Banco Master. I challenge the Lula government to get its base to pressure the president of Congress so that it is set up. That isn't happening because he still has a lot to explain; he is the one who has to explain what he was doing in more than 7 or 8 off-schedule meetings, not only with Gorkari, but with Augusto Lima, from Bahia. A lot of people are forgetting to mention that name; that's where it all began, Credicesta there in Bahia, which is in fact how Banco Master started to leverage itself. What did President Lula mean when he told the banker that he should wait a while longer to replace the presiden... Read full transcript here: soniox.com/X/205955568395…
Portuguese
0
0
0
18
Edvaldo RReis
Edvaldo RReis@edvaldorreis·
FUTURE PRESIDENT OF BRAZIL ANSWERS A PRETENTIOUS QUESTION FROM A HACK REPORTER AFTER VISITING DONALD TRUMP AT THE WHITE HOUSE, WATCH:
Portuguese
1
0
0
13
Soniox
Soniox@soniox_ai·
Here is the video transcript you requested: Speaker 1: Because the problem is that every time you push the box down the road, the box gets heavier. So President Trump decided, for the first time since 1979, that he would not be the same kind of president who pushes the box down the road like his predecessors. And it took enormous courage for that to happen. Uh, you know, I think I always hear, you know, people describe President Trump as a taco, "Trump always gets scared." When it comes to Iran, that is completely wrong. soniox.com/X/205955552711…
Italian
0
0
0
23
Soniox
Soniox@soniox_ai·
@himuK0105 Feedback from real users that touched your production deployment is what matters. It's the thing we bet on.
English
0
0
0
26
Himu ¹⁰
Himu ¹⁰@himuK0105·
Most voice AI models look impressive on benchmarks. But real-world speech is far more complicated than benchmark datasets make it seem.

Different accents. Regional dialects. Background noise. Code-switching. Low-resource languages. That’s where many voice models still struggle and where evaluation actually matters most.

What I find interesting about SONAR from @psdnai is that it doesn’t just focus on one polished final score. Instead, it seems designed to identify the weak spots that real-world voice AI systems still face today. Because in practice, a model that performs well only under “perfect conditions” is not enough for global-scale AI systems.

Voice AI needs evaluation systems that reflect how people actually speak:
→ diverse languages
→ imperfect environments
→ natural conversations
→ real-world variability

And honestly, that feels like the direction the industry needs right now. Better benchmarks create better models.
Story@StoryProtocol

One of the biggest gaps in voice AI today is evaluation. A model can score well on standard benchmarks but fail in real-world environments, dialects, and low-resource languages. SONAR from @psdnai is built to surface those failure modes, not hide them behind aggregate scores.

English
2
0
2
90
Fabian K
Fabian K@techfabk·
@theresanaiforit @Speechmatics Speed without accuracy creates expensive overhead. For me, the future of work requires tools that operate across 55+ languages without manual review. Sub-second latency is only valuable when the output is reliable.
English
1
0
2
21
There's An AI For That
There's An AI For That@theresanaiforit·
Fast and wrong is still wrong. Most voice AI benchmarks measure speed. Not accuracy.
@Speechmatics tops the board on both:
→ 25% higher accuracy than many others
→ Sub-second latency across 55+ languages
Watch the breakdown of what the benchmarks mean for production.
English
6
17
36
9.4K
Yaroslav Bulatov
Yaroslav Bulatov@yaroslavvb·
Flash 3.5 is so fast. Last weekend, my hard-of-hearing friend complained about Apple's FaceTime captions, so I told her she could build her own. Took her just a couple of hours in Antigravity using @soniox_ai for streaming recognition.
Natalia@n_ta_a

Apple captions were hard to read and lacked accuracy, so I ended up building a much better version from scratch using Antigravity + the Soniox API. Super helpful for folks who are deaf + multilingual. github.com/NAntonova/floa…

English
2
1
14
1.6K
Soniox
Soniox@soniox_ai·
@n_ta_a Glad to see Soniox is helping you! By the way, we also provide an MCP that you can plug into Antigravity; it should make using our API more efficient. soniox.com/docs/ai-engine…
English
0
0
0
48
Natalia
Natalia@n_ta_a·
Apple captions were hard to read and lacked accuracy, so I ended up building a much better version from scratch using Antigravity + the Soniox API. Super helpful for folks who are deaf + multilingual. github.com/NAntonova/floa…
English
0
0
3
1.8K
Soniox
Soniox@soniox_ai·
A quick note on how speech translation usually works, and why it lags.

The common setup is a pipeline: speech-to-text first, then the finished text goes to a separate translation step. That means two systems and two round-trips, and nothing can be rendered until the first system commits a full sentence. The gap between the two is the latency people feel in live captions.

Soniox collapses this into a single token stream. Transcription and translation arrive together over one WebSocket, interleaved, with each token labeled so you know whether it's original speech or translated output. Translation begins partway through a sentence rather than after it.

For conversations, two-way mode handles both directions at once: each speaker reads the other in their own language without your app managing whose turn it is.

It's a breakthrough in real-time speech translation that has been built into the Soniox API for some time now.
English
0
0
4
182
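
The single interleaved token stream described in the post above can be illustrated with a minimal sketch. It assumes each token is a dict with a `text` field and a `translation_status` label; both field names, the label values, and the sample tokens are assumptions made for illustration and are not confirmed by the post. No network connection is made; the list stands in for messages received over one WebSocket.

```python
from typing import Iterable


def split_token_stream(tokens: Iterable[dict]) -> tuple[str, str]:
    """Separate one interleaved token stream into original and translated text."""
    original, translated = [], []
    for token in tokens:
        # "translation_status" is an assumed label name (hypothetical).
        if token.get("translation_status") == "translation":
            translated.append(token["text"])
        else:
            original.append(token["text"])
    return "".join(original), "".join(translated)


# Hypothetical tokens, interleaved the way the post describes.
stream = [
    {"text": "Hola", "translation_status": "original"},
    {"text": "Hello", "translation_status": "translation"},
    {"text": " amigo", "translation_status": "original"},
    {"text": " friend", "translation_status": "translation"},
]

original_text, translated_text = split_token_stream(stream)
print(original_text)    # Hola amigo
print(translated_text)  # Hello friend
```

Because the label travels with each token, a client in this sketch can render both captions incrementally as tokens arrive, without waiting for a sentence boundary.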
Soniox
Soniox@soniox_ai·
Example of Chinese to English real-time speech to text translation with Soniox. With speaker separation turned on. We support live translation across 60 languages, any to any (3600 language pairs), including mixed speech. Source video: youtube.com/watch?v=XrP2D9…
English
0
0
5
335
Soniox
Soniox@soniox_ai·
@telnyx Reliable STT is honestly the one most teams underestimate until they hit prod. Low turn latency means nothing if the ASR is dropping words on accents or background noise. We deal with this every day at Soniox. Our streaming STT was basically built around that problem.
English
0
0
1
69
Telnyx
Telnyx@telnyx·
After thousands of voice AI deployments, the same production lessons keep showing up.

Good voice agents need more than a natural-sounding voice. They need:
• Clear call scope
• Low turn latency
• Reliable speech recognition
• Clean tool access
• Human handoff that carries context
• Evals built from real calls
• Safety rules outside the prompt
• A weekly operating loop after launch

So we put together 100 practical tips for teams building AI voice agents in 2026. The guide covers what matters once agents start handling real calls: noisy audio, interruptions, slow systems, workflow failures, policy limits, and escalation paths.

Read the full list here: telnyx.com/resources/ai-v…
English
2
1
4
226
Soniox
Soniox@soniox_ai·
LiveKit Agents is a great base for this. The thing that quietly decides whether a telephony agent will work is the STT, and the failure modes shift once you go from demos to production calls. Partial latency starts mattering more than final-result latency because barge-in detection depends on it. Endpointing gets weird on noisy lines: you either cut users off mid-thought or wait too long and the agent feels dead. Accents and code-switching are where WER drifts silently. These are the hard parts we focus on at Soniox.
English
0
0
0
42
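
The endpointing trade-off mentioned in the post above can be made concrete with a generic silence-timeout endpointer. This is a textbook-style sketch, not Soniox's or LiveKit's method; the function name, the frame size, and the sample audio pattern are all hypothetical.

```python
from typing import Optional, Sequence


def find_endpoint(is_speech: Sequence[bool], frame_ms: int, silence_ms: int) -> Optional[int]:
    """Return the time (ms) at which a silence-timeout endpointer fires, or None."""
    silent_run = 0
    heard_speech = False
    for i, speech in enumerate(is_speech):
        if speech:
            heard_speech = True
            silent_run = 0  # any speech frame resets the silence counter
        elif heard_speech:
            silent_run += frame_ms
            if silent_run >= silence_ms:
                return (i + 1) * frame_ms
    return None


# 100 ms frames: 0.5 s speech, 0.4 s mid-thought pause, 0.5 s speech, 1 s silence.
frames = [True] * 5 + [False] * 4 + [True] * 5 + [False] * 10

print(find_endpoint(frames, frame_ms=100, silence_ms=300))  # 800
print(find_endpoint(frames, frame_ms=100, silence_ms=800))  # 2200
```

With a 300 ms timeout the endpointer fires at 800 ms, inside the mid-thought pause, so the user is cut off. With an 800 ms timeout it waits until 2200 ms, which is 800 ms after the user actually stopped speaking at 1400 ms.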
Soniox
Soniox@soniox_ai·
C++ developers aren't left out. @fatehmtd built a community C++ library for the Soniox API. We love to see community integrations. It's validation that our API delivers and is well received by developers who take the time to build on it. Go give it a star: github.com/fatehmtd/Sonio…
English
0
0
3
246
Soniox
Soniox@soniox_ai·
@momomoss01 Our async transcription API can process up to 5h long files, max 1GB.
English
1
0
0
10
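
The limits stated in the reply above (files up to 5 hours long, at most 1GB) can be checked before upload with a small helper. This is a sketch under assumptions: "1GB" is read as 10^9 bytes, which the reply does not specify, and the function and constant names are hypothetical rather than part of any Soniox SDK.

```python
MAX_DURATION_S = 5 * 60 * 60  # 5 hours, as stated in the reply
MAX_SIZE_BYTES = 10**9        # assumption: "1GB" interpreted as 10^9 bytes


def fits_async_limits(duration_s: float, size_bytes: int) -> bool:
    """Return True if a file is within both the stated duration and size limits."""
    return duration_s <= MAX_DURATION_S and size_bytes <= MAX_SIZE_BYTES


# A 1-hour, 50 MB MP3 fits; a file just over 5 hours does not.
print(fits_async_limits(3600, 50_000_000))   # True
print(fits_async_limits(18_001, 50_000_000)) # False
```

In practice the size could come from `os.path.getsize` and the duration from whatever audio metadata reader the application already uses.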
momomoss01
momomoss01@momomoss01·
@soniox_ai Hello, I'd like to ask about the async transcription API: is there a limit on MP3 file size, and is there a limit on file duration?
Chinese
1
0
0
16
Soniox
Soniox@soniox_ai·
Need live translation between Spanish and English? Or between Hindi and English? Or between Spanish and Hindi? Chinese to Arabic, French to German, Japanese to Korean or Chinese, Portuguese to Russian… GPT Realtime Translate alternative that scales. We support any-to-any translation across 60+ languages. Real-time and async. One-way or two-way modes. Low latency streaming. Try Soniox Speech Translation API → soniox.com/speech-transla…
English
1
33
77
459.6K
Soniox
Soniox@soniox_ai·
STT is an infrastructure layer. When it works, nobody notices it. When it fails, every app running on it starts choking. We build Soniox to be the speech AI layer you can just build on and stop worrying about. Whether you do real-time speech recognition, generation, or translation, Soniox is the low-latency provider you can rely on. soniox.com/docs
English
1
2
2
294