Ricardo Buitrago

31 posts

@rbuit_

ML at Cartesia AI | CMU

Joined February 2025
149 Following · 165 Followers
Pinned Tweet
Ricardo Buitrago @rbuit_
Despite theoretically handling long contexts, existing recurrent models still fall short: they may fail to generalize past their training length. We show a simple and general fix that enables length generalization to sequences of up to 256k, with no need to change the architecture!
Shrey Gupta @Shrey2809
686 days. that's how long @LewisHamilton waited for a win and he got his first in red this weekend, in Spain. bold bets take time. then they land all at once. @cartesia made one too: be the best in voice on one provider. Sonic 3.5 + Ink 2, both #1.
David W. Romero @davidwromero
Getting real-time, top-tier text-to-speech and speech-to-text used to mean stitching together a patchwork of vendors—one for the voice, another for the transcription, and a tangle of integration work in between. Today, this changes. @cartesia now has both in a single place: one provider, one experience, both directions of voice. Faster, more accurate, and cheaper than what you were piecing together before. This is not a result of throwing more compute and data at the problem; it's what happens when a research team innovates from first principles past the scaling playbook. I love this company and what we are creating. The future will be even better! 💚 Full release: cartesia.ai/launch
Brandon Chen @Brandon24784864
fun fact about @cartesia's sota tts and asr models: we had the models in house for several months, but we decided it was too dangerous to release such advanced voice capabilities to the public... until now. try them now.
Karan Goel @krandiash

We released Sonic-3.5 and Ink-2, the #1 streaming models for text to speech and speech to text you can use in your voice agents today. New architectures enable new frontiers for speed and quality. We're now the only provider to have #1 models for both speaking and listening.

Animesh Bohara @animeshbohara
Some teams train good models. Some train fast ones. We don't think you should have to choose. Today at @cartesia we shipped two: Sonic 3.5 (speaking) + Ink 2 (listening). Both SOTA, both realtime. 90ms TTFA. 3.6% WER, #1 on AA. Try them out at play.cartesia.ai
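The 3.6% WER figure above is a word error rate. As a minimal sketch, assuming the common textbook definition (the leaderboard's own text normalization and scoring rules are not described here), WER is the word-level edit distance between a reference transcript and the model's hypothesis, divided by the number of reference words. `word_error_rate` is a hypothetical helper written for illustration, not a library API:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # dp[i][j] = minimum word edits turning ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete every reference word
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert every hypothesis word
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # match or substitution
            )
    return dp[len(ref)][len(hyp)] / len(ref)


# One substitution (sat -> sit) plus one deletion ("the"): 2 errors / 6 words
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

By this definition, a 3.6% WER means roughly 3.6 word errors per 100 reference words.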
Daniele Paliotta @DanielePaliotta
Need TTS? We are SOTA! Need ASR? We are SOTA too ❤️ Our models are now the best at speaking, and listening. Try Cartesia Sonic 3.5 and Ink 2: cartesia.ai/launch
Ido Amos @AmosaurusRex
I’ve only been at @cartesia for just over a month, but seeing all the talented people here up close, it’s really not surprising they’re shipping state-of-the-art text-to-speech and speech-to-text models like Sonic 3.5 and Ink 2. Huge congrats to the team, excited to be here :)
Karan Goel @krandiash

We released Sonic-3.5 and Ink-2, the #1 streaming models for text to speech and speech to text you can use in your voice agents today. New architectures enable new frontiers for speed and quality. We're now the only provider to have #1 models for both speaking and listening.

Aiqi Liu @MeetAiqi
When @elipughresearch and the team dropped Ink-2, I had to see if this SOTA Speech-to-Text model lived up to the hype. So, I built a dictation app to find out. To my delight, it’s incredibly fast & accurate. You can just "ink it." InkIt is now free and open-source. Try it, fork it, and make it yours! 👇
Arjun Patrawala @arjunpatrawala
In January, @_albertgu got deeply involved in our TTS research program. Over the following 5 months, we developed ~3 innovations (no changes to data or model scale), each targeting a fundamental issue plaguing our prior models. The results:
Karan Goel @krandiash

We released Sonic-3.5 and Ink-2, the #1 streaming models for text to speech and speech to text you can use in your voice agents today. New architectures enable new frontiers for speed and quality. We're now the only provider to have #1 models for both speaking and listening.

Lucy Liu @lulu32125
Sonic 3.5 brings expressive, natural speech. Ink 2 brings contextual endpointing that makes conversations flow naturally. No more choosing between quality and speed. 
No more stitching together multiple providers. Just a complete voice stack.
Ricardo Buitrago @rbuit_
@krandiash is there a money back guarantee if my Cartesia agent can't upgrade me to business?
Ricardo Buitrago retweeted
Karan Goel @krandiash
We released Sonic-3.5 and Ink-2, the #1 streaming models for text to speech and speech to text you can use in your voice agents today. New architectures enable new frontiers for speed and quality. We're now the only provider to have #1 models for both speaking and listening.
Ricardo Buitrago @rbuit_
Luck can build a great model once. Deep understanding is what lets you do it again and again. One week after hitting #1 in Text to Speech, we're now #1 in Automatic Speech Recognition too!
Cartesia @cartesia

Cartesia Ink-2 debuts as #1 for accuracy on the brand-new streaming speech-to-text leaderboard from @ArtificialAnlys! We designed Ink-2 from the ground up for voice agents - with low latency, eager transcripts, and semantic endpointing.

Ricardo Buitrago @rbuit_
When you meet with your friends, you don't type - you talk to them. Still, we prefer to type to ChatGPT. We have fixed this at Cartesia. Sonic 3.5 is both the most expressive and fast model out there!
Artificial Analysis @ArtificialAnlys

Cartesia’s Sonic-3.5 takes the #1 spot on the Artificial Analysis Speech Arena Leaderboard, surpassing Inworld Realtime TTS 1.5 Max and Google’s Gemini 3.1 Flash TTS.
Sonic-3.5 is the latest TTS model from @cartesia. It supports 42 languages, including 9 Indian languages, with 500+ voices available out of the box. Voters in the TTS Arena have strongly preferred the model for its naturalness and accurate transcript following.
Key takeaways:
➤ Quality: Sonic-3.5 has an Elo score of 1,218 (+16/-16) based on 1,144 arena appearances, placing it ahead of Inworld Realtime TTS 1.5 Max at 1,194 and Gemini 3.1 Flash TTS at 1,209
➤ Pricing: Sonic-3.5 is priced at $39/1M characters, a premium compared to Gemini 3.1 Flash TTS at $18.3/1M characters and Inworld Realtime TTS 1.5 Max at $35/1M characters
➤ Speed: 105.5 characters per second, compared to 205 characters per second for Inworld Realtime TTS 1.5 Max and 26.3 characters per second for Gemini 3.1 Flash TTS
See more details and listen to samples below 🧵

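The Elo gaps quoted above can be read as head-to-head preference rates. A minimal sketch, assuming the standard Elo expected-score formula (logistic curve, base 10, 400-point scale); the leaderboard's own rating fit may differ, and `elo_expected_score` is a hypothetical helper, not part of any arena's tooling:

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


sonic = 1218  # Sonic-3.5 Elo from the post above
for name, rating in [("Gemini 3.1 Flash TTS", 1209), ("Inworld Realtime TTS 1.5 Max", 1194)]:
    # Expected share of pairwise votes Sonic-3.5 wins against each competitor
    print(f"Sonic-3.5 vs {name}: {elo_expected_score(sonic, rating):.3f}")
```

Under that assumption, the 9-point gap to Gemini 3.1 Flash TTS corresponds to Sonic-3.5 being preferred in roughly 51% of pairings, and the 24-point gap to Inworld Realtime TTS 1.5 Max to roughly 53%. The quoted ±16 interval on Sonic-3.5's score (1,202 to 1,234) also contains Gemini's 1,209.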
Ricardo Buitrago retweeted
Aviv Bick @avivbick
SSMs fail on recall tasks they have the capacity to solve. The two dominant approaches today, SSMs and sliding-window attention, both lack persistence: memory either decays over time or gets evicted. We built Raven to fix this, surpassing all prior linear models even at 16× their training sequence length. 🧵🐦‍⬛
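The two failure modes named in this thread, memory that decays (SSMs) and memory that gets evicted (sliding-window attention), can be illustrated with a toy model. This is not Raven and not any particular SSM: it assumes a scalar linear recurrence with a fixed decay factor, whereas real selective SSMs make the decay input-dependent. `ssm_weight`, `swa_weight`, and the values `decay=0.99`, `window=256` are hypothetical, chosen only for illustration:

```python
def ssm_weight(age: int, decay: float) -> float:
    # Toy SSM: h_t = decay * h_{t-1} + x_t, so a token seen `age` steps ago
    # survives in the state only with weight decay**age (it fades geometrically).
    return decay ** age


def swa_weight(age: int, window: int) -> float:
    # Toy sliding-window attention: the cache holds the `window` most recent
    # tokens (ages 0 .. window-1); anything older is evicted outright.
    return 1.0 if age < window else 0.0


decay, window = 0.99, 256  # illustrative values, not taken from any real model
for age in (10, 100, 1000):
    print(f"age={age:5d}  ssm={ssm_weight(age, decay):.2e}  swa={swa_weight(age, window):.0f}")
```

In this toy setting, a "needle" 1,000 steps back is nearly erased from the SSM state and entirely absent from a 256-token window; per the thread, Raven's selective memory allocation is aimed at avoiding both.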
Ricardo Buitrago retweeted
Arshia Afzal @rshia_afz
1/ SSMs struggle on recall benchmarks due to their fixed-size state. But are current models actually storing context “wisely”? Introducing Raven 🐦‍⬛, the first SSM with selective memory allocation! Raven achieves SOTA performance on recall-heavy tasks with the highest length generalization, extending up to 16× beyond its training sequence length. Raven is a strict upgrade over SWA in the way it stores past context! This is the most elegant model I’ve been involved in designing so far. Shoutout to @avivbick and @_albertgu for their trust and amazing work! Check out how Raven bridges SWA and SSMs👇
Ricardo Buitrago retweeted
DANN© @DannPetty
Designers really are becoming more powerful every day.
Y Combinator @ycombinator
Karumi (@karumihq) is an AI demo agent that makes product demos scalable — it talks, clicks, and personalizes every walkthrough, 24/7.
Timothy Luong (Chongz) @chongz
This Claude + Slack + Sonic 3 setup saves me 2 hours daily.
I'm an engineer at @cartesia ($100M raised) and here's how it works:
→ Claude pulls updates from my Slack channels
→ summarizes everything
→ Sonic 3 converts to natural voice
→ sends audio briefing to my DMs
Takes just 60 seconds to listen.
Then I added emotion tags and it sounds like an actual morning show host.
Complete Tutorial 👇
Ricardo Buitrago retweeted
Karan Goel @krandiash
We've raised $100M from Kleiner Perkins, Index Ventures, Lightspeed, and NVIDIA. Today we're introducing Sonic-3 - the state-of-the-art model for realtime conversation.
What makes Sonic-3 great:
- Breakthrough naturalness - laughter and full emotional range
- Lightning fast -