The work to make a clinical answer trustworthy includes figuring out which source applies, when the evidence is weak, and what the doctor is trying to decide.

Charlie @oneill_c at @baseten on why AI companies train their own models. Must read.
We serve Qwen3-TTS on vLLM-Omni at $3 per 1M characters. That's 90% lower than the cost of comparable closed-source TTS APIs.
Our engineers optimized a single-replica serving stack to get there. Details on the optimized stack and cost per concurrent stream here.
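The pricing comparison above can be sanity-checked with quick arithmetic. A minimal sketch: the ~$30/1M baseline below is implied by the stated 90% figure, not a price quoted anywhere in the post.

```python
# Back-of-envelope check of the pricing claim: $3 per 1M characters,
# described as 90% lower than comparable closed-source TTS APIs.
# The baseline computed here is inferred from the discount, not a quoted price.

def implied_baseline(discounted_price: float, discount: float) -> float:
    """Return the reference price that a discounted rate was compared against.

    discounted_price: the advertised rate (e.g. 3.0 dollars per 1M characters)
    discount: the fractional reduction claimed (e.g. 0.90 for "90% lower")
    """
    return discounted_price / (1 - discount)

baseline = implied_baseline(3.0, 0.90)
print(f"Implied closed-source rate: ${baseline:.2f} per 1M characters")
```

So "90% lower than $X" at $3/1M points to a roughly $30/1M comparison rate for the closed-source APIs.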
This is a great piece from @baseten's @oneill_c:
The question every app layer company is now asking is: "how do we resist commodification to deliver better results for customers?"
The answer is specialized models based on your unique understanding of who you serve every day.
when I first heard of @baseten they were basically a competitor
but then I met @tuhinone and I liked him, and when I heard they were pivoting to inference, I was relieved because I didn't want to compete against him
and then he hired @DannieHerz and I was angry because I wish I had thought of it
and now they're a critical partner for us as we embrace our own many model future at @_hex_tech
I'm so happy for all their success and very excited to share what we've been working on with them!
The starting premise at @Conviction was that AI (general models at scale) was a broad shift in computing. This has come to pass.
But the way AI benefits many more users, more powerfully, is going to be more distributed product/research work, in partnership with the humans who do the work.
“The question every app layer company is now asking is no longer ‘how do we use AI?’ It is ‘how do we resist commodification to deliver better results for customers?’ The answer is specialized models based on your unique understanding of who you serve every day. The big labs can’t do it, but you can.”
Baseten’s Head of Model Training, @oneill_c, on the wave of AI companies using post-training to deliver better results for customers via specialized models.
Visited my first-ever conference as a sponsor, and it was a wake-up call! 🫥 So I made a "guide to not wasting money on conference booths."
Below are some learnings, a checklist, and things I really loved online and IRL.
I'm also giving credit there to my and my friends' favorite conference booths. Check it out!🔗👇
Every day we're seeing more companies emerge with specialized models that push the SOTA forward. Congrats to the Speechify team on the launch of SIMBA 3.0! Very happy to partner with you.
More here: baseten.co/resources/cust…
We're proud to share our partnership story with @SpeechifyAI.
Speechify just announced SIMBA 3.0, ranked top 10 globally on the @ArtificialAnlys TTS leaderboard and the most cost-efficient model by far in that tier.
We’re honored to serve the full SIMBA TTS family and other core workloads for Speechify, achieving:
→ 44% lower cost per 1M characters
→ 30-50% lower p99 latency
→ 4.5x faster cold starts
Read the full case study here: