Arjun Desai

337 posts

@jundesai

cofounder @cartesia_ai | Prev: ML PhD @stanfordailab, @apple fm

Joined January 2014
231 Following · 912 Followers
Arjun Desai @jundesai
It always felt awkward to me that models offer a choice between quality and speed. The world’s best human collaborators are perceptive, responsive, and fast. Why can’t our models be? Research should be about solving the fundamental limitations to unlock unimaginable experiences. This is what @cartesia has always been about. Awesome to see these breakthroughs powering Sonic-3.5 and Ink-2.
Quoting Karan Goel @krandiash:
We released Sonic-3.5 and Ink-2, the #1 streaming models for text to speech and speech to text you can use in your voice agents today. New architectures enable new frontiers for speed and quality. We're now the only provider to have #1 models for both speaking and listening.

Arjun Desai reposted
Karan Goel @krandiash
We released Sonic-3.5 and Ink-2, the #1 streaming models for text to speech and speech to text you can use in your voice agents today. New architectures enable new frontiers for speed and quality. We're now the only provider to have #1 models for both speaking and listening.
Animesh Bohara @animeshbohara
Some teams train good models. Some train fast ones. We don't think you should have to choose. Today at @cartesia we shipped two: Sonic 3.5 (speaking) + Ink 2 (listening). Both SOTA, both realtime. 90ms TTFA. 3.6% WER, #1 on AA. Try them out at play.cartesia.ai
Albert Gu @_albertgu
Within the span of a week, we launched streaming TTS (text-to-speech) and STT (speech-to-text) models that topped the leaderboards. I'm incredibly proud of the research team for their relentless pursuit of improvement, which has unlocked new state-of-the-art audio models on the Pareto frontier of speed and quality. As a research problem, speech requires fusing both text and audio and is the gateway to general multimodal models. We built Sonic-3.5 and Ink-2 from the ground up, developing multiple innovations along the way in a direction that will scale to general real-time intelligence. I've personally been deeply involved in building these models and more; it's been a blast working with the incredibly talented research team here @cartesia, and I can't wait to show the world what's coming next :)
Quoting Karan Goel @krandiash:
We released Sonic-3.5 and Ink-2, the #1 streaming models for text to speech and speech to text you can use in your voice agents today. New architectures enable new frontiers for speed and quality. We're now the only provider to have #1 models for both speaking and listening.

Lucy Liu @lulu32125
Sonic 3.5 brings expressive, natural speech. Ink 2 brings contextual endpointing that makes conversations flow naturally. No more choosing between quality and speed. 
No more stitching together multiple providers. Just a complete voice stack.
Aiqi Liu @MeetAiqi
When @elipughresearch and the team dropped Ink-2, I had to see if this SOTA Speech-to-Text model lived up to the hype. So, I built a dictation app to find out. To my delight, it’s incredibly fast & accurate. You can just "ink it." InkIt is now free and open-source. Try it, fork it, and make it yours! 👇
Nahum Maru @nahuum_maru
i worked a bunch on Ink-2 and am very excited for it to be released!! very proud of the team 🔥🔥🔥
Quoting Karan Goel @krandiash:
We released Sonic-3.5 and Ink-2, the #1 streaming models for text to speech and speech to text you can use in your voice agents today. New architectures enable new frontiers for speed and quality. We're now the only provider to have #1 models for both speaking and listening.

Brandon Chen @Brandon24784864
fun fact about @cartesia's sota tts and asr models: we had the models in house for several months, but we decided it was too dangerous to release such advanced voice capabilities to the public... until now. try them now.
Quoting Karan Goel @krandiash:
We released Sonic-3.5 and Ink-2, the #1 streaming models for text to speech and speech to text you can use in your voice agents today. New architectures enable new frontiers for speed and quality. We're now the only provider to have #1 models for both speaking and listening.

Daniele Paliotta @DanielePaliotta
Need TTS? We are SOTA! Need ASR? We are SOTA too ❤️ Our models are now the best at speaking, and listening. Try Cartesia Sonic 3.5 and Ink 2: cartesia.ai/launch
Arjun Desai reposted
Zubin Pratap @ZubinPratap
Give recruiters a phone number that talks back in your voice. A Conversational AI Voice Agent that knows your career history, handles interruptions, and tells your story. This one is for non-coders. 5 mins. Build with @cartesia + @claudeai. Remember: a Voice AI agent is much more than just TTS and STT. It’s everything your agent has to handle while listening to you. Video 👇
Arjun Desai @jundesai
Building models from first principles allows you to think about the capabilities that actually matter for intelligence. Ink-2 is our first native streaming ASR model built to support everyday, real-world conversation: low latency, endpointing, and accuracy. Exciting to see these results from AA. More to come soon.
Quoting Cartesia @cartesia:
Cartesia Ink-2 debuts as #1 for accuracy on the brand-new streaming speech-to-text leaderboard from @ArtificialAnlys! We designed Ink-2 from the ground up for voice agents - with low latency, eager transcripts, and semantic endpointing.

Arjun Desai @jundesai
Kudos to the @cartesia TTS team for this feat!
Quoting Artificial Analysis @ArtificialAnlys:
Cartesia’s Sonic-3.5 takes the #1 spot on the Artificial Analysis Speech Arena Leaderboard, surpassing Inworld Realtime TTS 1.5 Max and Google’s Gemini 3.1 Flash TTS.
Sonic-3.5 is the latest TTS model from @cartesia. It supports 42 languages, including 9 Indian languages, with 500+ voices available out of the box. The model has been highly preferred among voters in the TTS Arena, with its demonstrated naturalness and accurate transcript following.
Key takeaways:
➤ Quality: Sonic-3.5 has an Elo score of 1,218 (+16/-16) based on 1,144 arena appearances, placing it ahead of Inworld Realtime TTS 1.5 Max at 1,194 and Gemini 3.1 Flash TTS at 1,209
➤ Pricing: Sonic-3.5 is priced at $39/1M characters, a premium compared to Gemini 3.1 Flash TTS at $18.3/1M characters, and Inworld Realtime TTS 1.5 Max at $35/1M characters
➤ Speed: 105.5 characters per second, compared to 205 characters per second for Inworld Realtime TTS 1.5 Max and 26.3 characters per second for Gemini 3.1 Flash TTS
See more details and listen to samples below 🧵

Arjun Desai reposted
Brad Menezes @bradmenezes
Introducing Superblocks 2.0: AI-generated enterprise apps, finally under IT control.
Vibe-coded apps just became the #1 attack vector in the enterprise. Business teams are building on production data, while IT has zero visibility. No reviews. No audits. No permissions. No control.
AI hackers are about to get 100x better. Anthropic proved it with Mythos.
Superblocks 2.0 is the only platform to take back control:
> Business teams build AI-powered apps with permissions baked in.
> IT and Security can audit everything and lock down anything, instantly.
> Engineering sets the standards. Every app follows them.
Instacart, SoFi, and LinkedIn run Superblocks in production today. And larger organizations we can't yet name are too:
A Fortune 500 just shut down 2,500 Replit users to standardize on Superblocks, running the platform air-gapped in their AWS environment.
A 150,000-employee global services firm replaced Lovable with Superblocks to unlock AI-built apps on restricted internal systems.
Every IT leader we’ve demoed to using Replit, Lovable or v0 asked for early access. Today we open access to the world.
The genie is out of the bottle on employee vibe coding. Let it run wild, or take back control: superblocks.com