Ricardo Buitrago (@rbuit_) - Twitter Profile

Pinned Tweet

Despite theoretically handling long contexts, existing recurrent models still fall short: they may fail to generalize past the training length. We show a simple and general fix that enables length generalization to sequences of up to 256k, with no need to change the architectures!


Ricardo Buitrago@rbuit_·15 Jun

@Shrey2809 @LewisHamilton @cartesia if @LewisHamilton had Shrey on his team he would have waited 90ms for a win


Shrey Gupta@Shrey2809·15 Jun

686 days. that's how long @LewisHamilton waited for a win and he got his first in red this weekend, in Spain. bold bets take time. then they land all at once. @cartesia made one too: be the best in voice on one provider. Sonic 3.5 + Ink 2, both #1.


Ricardo Buitrago@rbuit_·15 Jun

@davidwromero @cartesia I've never seen first principles shine more than at Cartesia 💚 thanks for the great work!


David W. Romero@davidwromero·15 Jun

Getting real-time, top-tier text-to-speech and speech-to-text used to mean stitching together a patchwork of vendors—one for the voice, another for the transcription, and a tangle of integration work in between. Today, this changes. @cartesia now has both in a single place: one provider, one experience, both directions of voice. Faster, more accurate, and cheaper than what you were piecing together before. This is not a result of throwing more compute and data at the problem; it's what happens when a research team innovates from first principles past the scaling playbook. I love this company and what we are creating. The future will be even better! 💚 Full release: cartesia.ai/launch


Ricardo Buitrago@rbuit_·15 Jun

@Brandon24784864 @cartesia fun fact about branch: he is the goat


Brandon Chen@Brandon24784864·15 Jun

fun fact about @cartesia's sota tts and asr models: we had the models in house for several months, but we decided it was too dangerous to release such advanced voice capabilities to the public... until now. try them now.

Karan Goel@krandiash

We released Sonic-3.5 and Ink-2, the #1 streaming models for text to speech and speech to text you can use in your voice agents today. New architectures enable new frontiers for speed and quality. We're now the only provider to have #1 models for both speaking and listening.


Ricardo Buitrago@rbuit_·15 Jun

@animeshbohara I don't think I even have a 90ms TTFA lol


Animesh Bohara@animeshbohara·15 Jun

Some teams train good models. Some train fast ones. We don't think you should have to choose. Today at @cartesia we shipped two: Sonic 3.5 (speaking) + Ink 2 (listening). Both SOTA, both realtime. 90ms TTFA. 3.6% WER, #1 on AA. Try them out at play.cartesia.ai


Ricardo Buitrago@rbuit_·15 Jun

@DanielePaliotta SOTA where's my guide


Daniele Paliotta@DanielePaliotta·15 Jun

Need TTS? We are SOTA! Need ASR? We are SOTA too ❤️ Our models are now the best at speaking, and listening. Try Cartesia Sonic 3.5 and Ink 2: cartesia.ai/launch


Ricardo Buitrago@rbuit_·15 Jun

@AmosaurusRex @cartesia Keep it up!


Ido Amos@AmosaurusRex·15 Jun

I’ve only been at @cartesia for just over a month, but seeing all the talented people here up close, it’s really not surprising they’re shipping state-of-the-art text-to-speech and speech-to-text models like Sonic 3.5 and Ink 2. Huge congrats to the team, excited to be here :)

Karan Goel@krandiash

We released Sonic-3.5 and Ink-2, the #1 streaming models for text to speech and speech to text you can use in your voice agents today. New architectures enable new frontiers for speed and quality. We're now the only provider to have #1 models for both speaking and listening.


Ricardo Buitrago@rbuit_·15 Jun

@MeetAiqi @elipughresearch wow amazing demo! this is what makes Cartesia feel special


Aiqi Liu@MeetAiqi·15 Jun

When @elipughresearch and the team dropped Ink-2, I had to see if this SOTA Speech-to-Text model lived up to the hype. So, I built a dictation app to find out. To my delight, it’s incredibly fast & accurate. You can just "ink it." InkIt is now free and open-source. Try it, fork it, and make it yours! 👇


Ricardo Buitrago@rbuit_·15 Jun

@arjunpatrawala @_albertgu Definitely the fastest research improvement cycle I've ever seen 🚀


Arjun Patrawala@arjunpatrawala·15 Jun

In January, @_albertgu got deeply involved in our TTS research program. Over the following 5 months, we developed ~3 innovations (no changes to data or model scale), each targeting a fundamental issue plaguing our prior models. The results:

Karan Goel@krandiash

We released Sonic-3.5 and Ink-2, the #1 streaming models for text to speech and speech to text you can use in your voice agents today. New architectures enable new frontiers for speed and quality. We're now the only provider to have #1 models for both speaking and listening.


Ricardo Buitrago@rbuit_·15 Jun

@lulu32125 So cool!


Lucy Liu@lulu32125·15 Jun

Sonic 3.5 brings expressive, natural speech. Ink 2 brings contextual endpointing that makes conversations flow naturally. No more choosing between quality and speed. No more stitching together multiple providers. Just a complete voice stack.


Ricardo Buitrago@rbuit_·15 Jun

@krandiash is there a money back guarantee if my Cartesia agent can't upgrade me to business?


Ricardo Buitrago retweeted

Karan Goel@krandiash·15 Jun

We released Sonic-3.5 and Ink-2, the #1 streaming models for text to speech and speech to text you can use in your voice agents today. New architectures enable new frontiers for speed and quality. We're now the only provider to have #1 models for both speaking and listening.


Ricardo Buitrago@rbuit_·28 May

Luck can build a great model once. Deep understanding is what lets you do it again and again. One week after hitting #1 in Text to Speech, we're now #1 in Automatic Speech Recognition too!

Cartesia@cartesia

Cartesia Ink-2 debuts as #1 for accuracy on the brand-new streaming speech-to-text leaderboard from @ArtificialAnlys! We designed Ink-2 from the ground up for voice agents - with low latency, eager transcripts, and semantic endpointing.


Ricardo Buitrago@rbuit_·22 May

When you meet with your friends, you don't type - you talk to them. Still, we prefer to type to ChatGPT. We have fixed this at Cartesia. Sonic 3.5 is both the most expressive and the fastest model out there!

Artificial Analysis@ArtificialAnlys

Cartesia’s Sonic-3.5 takes the #1 spot on the Artificial Analysis Speech Arena Leaderboard, surpassing Inworld Realtime TTS 1.5 Max and Google’s Gemini 3.1 Flash TTS.
Sonic-3.5 is the latest TTS model from @cartesia. It supports 42 languages, including 9 Indian languages, with 500+ voices available out of the box. The model has been highly preferred among voters in the TTS Arena, with its demonstrated naturalness and accurate transcript following.
Key takeaways:
➤ Quality: Sonic-3.5 has an Elo score of 1,218 (+16/-16) based on 1,144 arena appearances, placing it ahead of Inworld Realtime TTS 1.5 Max at 1,194 and Gemini 3.1 Flash TTS at 1,209
➤ Pricing: Sonic-3.5 is priced at $39/1M characters, a premium compared to Gemini 3.1 Flash TTS at $18.3/1M characters, and Inworld Realtime TTS 1.5 Max at $35/1M characters
➤ Speed: 105.5 characters per second, compared to 205 characters per second for Inworld Realtime TTS 1.5 Max and 26.3 characters per second for Gemini 3.1 Flash TTS
See more details and listen to samples below 🧵


Ricardo Buitrago retweeted

Aviv Bick@avivbick·7 May

SSMs fail on recall tasks they have the capacity to solve. The two dominant approaches today, SSMs and sliding-window attention, both lack persistence: memory either decays over time or gets evicted. We built Raven to fix this, surpassing all prior linear models even at 16× their training sequence length. 🧵🐦‍⬛


Ricardo Buitrago retweeted

Arshia Afzal@rshia_afz·7 May

1/ SSMs struggle on recall benchmarks due to their fixed-size state. But are current models actually storing context “wisely”? Introducing Raven 🐦‍⬛, the first SSM with selective memory allocation! Raven achieves SOTA performance on recall-heavy tasks with the highest length generalization, extending up to 16× beyond its training sequence length. Raven is a strict upgrade over SWA in the way it stores past context! This is the most elegant model I’ve been involved in designing so far. Shoutout to @avivbick and @_albertgu for their trust and amazing work! Check out how Raven bridges between SWA and SSM👇


Ricardo Buitrago retweeted

Designers really are becoming more powerful every day.


Ricardo Buitrago@rbuit_·11 Nov

@ycombinator @karumihq Really impressive! Keep up the good work, this is a very promising product


Y Combinator@ycombinator·11 Nov

Karumi (@karumihq) is an AI demo agent that makes product demos scalable — it talks, clicks, and personalizes every walkthrough, 24/7.


Ricardo Buitrago@rbuit_·28 Oct

@chongz @cartesia Chongz is going to take my job soon


Timothy Luong (Chongz)@chongz·28 Oct

This Claude + Slack + Sonic 3 setup saves me 2 hours daily. I'm an engineer at @cartesia ($100M raised) and here's how it works:
→ Claude pulls updates from my Slack channels
→ summarizes everything
→ Sonic 3 converts to natural voice
→ sends audio briefing to my DMs
Takes just 60 seconds to listen.
Then I added emotion tags and it sounds like an actual morning show host.
Complete Tutorial 👇


Ricardo Buitrago@rbuit_·28 Oct

@krandiash Sonic: this model sounds great! Check it out.


Ricardo Buitrago retweeted

Karan Goel@krandiash·28 Oct

We've raised $100M from Kleiner Perkins, Index Ventures, Lightspeed, and NVIDIA. Today we're introducing Sonic-3 - the state-of-the-art model for realtime conversation. What makes Sonic-3 great: - Breakthrough naturalness - laughter and full emotional range - Lightning fast -

