Nina Kuruvilla

399 posts


@ninacali4

cofounder https://daily.co

San Francisco · Joined April 2016
88 Following · 278 Followers
Nina Kuruvilla retweeted
kwindla @kwindla
Join us on Thursday in SF for conversations about voice agents, speech models, and realtime AI infrastructure. I'm on a panel with:
- @natrugrats from @DeepgramAI
- @farazmsiddiqi from @getbluejay_ai
- Aaron Lee from Parakeet Health
There will be food and lots of opportunities to ask questions and share your knowledge. One thing I'm looking forward to is comparing notes about GTC last week.
Nina Kuruvilla retweeted
Charles 🎉 Frye @charles_irl
been working with @kwindla to play with and benchmark Nemotron 3 Super in advance of launch. I am very excited about the applications that will be enabled by high-quality, low-latency inference with this model. daily.co/blog/nvidia-ne…
Nina Kuruvilla retweeted
Daily @trydaily
Join @NVIDIAAI for a Nemotron Labs livestream on building voice agents with open models. Tuesday, 3/3; link in thread. With @modal, @pipecat_ai, @nvidia, @kwindla, and Ben Shababo.
Nina Kuruvilla retweeted
kwindla @kwindla
Join us Thursday February 26th at this month's Voice AI Meetup, in person in SF or via Live Stream. Themes this month are benchmarks, audio models, conversational video, and speech-to-speech. It's a full slate of technical fireside chats with people training models and building full-stack infrastructure.
1. Ricardo Herreros Symons and Sam Sykes of @Speechmatics on next-generation audio understanding, achieving low latency, and benchmarking real-world performance.
2. @quinnfavret of @tavus on building conversational video from the model level all the way up to sweating every pixel (and every vocalization) in the user interface.
3. Bo Xie of @OpenAI on training the new gpt-realtime-1.5 speech-to-speech model.
There will be 🍕 and lots of good conversation. Thank you to Tavus for hosting this one at their San Francisco office. If you're in SF, this is an opportunity to see the amazing Tavus collection of classic computers.
Nina Kuruvilla retweeted
kwindla @kwindla
Brand new speech-to-speech model from @OpenAIDevs today! GPT Realtime 1.5 achieves a very nice jump in tool calling and instruction following performance on our voice agent benchmarks. @charlierguo's demo video shows a great example of perfect performance on a hard end-to-end audio understanding and speech production task: the model captures a seven-character order number (mixed letters and digits) and repeats it back. The demo video made me hungry. I definitely need some Inference Chips with my OpenAI Neural Net Burger.
OpenAI Developers @OpenAIDevs

Voice workflows just got stronger with gpt-realtime-1.5 in the Realtime API. The model offers more reliable instruction following, tool calling, and multilingual accuracy. Demo with @charlierguo

Nina Kuruvilla retweeted
kwindla @kwindla
Benchmarking LLMs for voice agent use cases. New open source repo, along with a deep dive into how we think about measuring LLM performance. The headline results:
- The newest SOTA models are all *really* good, but too slow for production voice agents. GPT-4.1 and Gemini 2.5 Flash are still the most widely used models in production. The benchmark shows why.
- Ultravox 0.7 shows that it's possible to close the "intelligence gap" between speech-to-speech models and text-mode LLMs. This is a big deal!
- Open weights models are climbing up the capability curve. Nemotron 3 Nano is almost as capable as GPT-4o. (And achieves this with only 30B parameters.) GPT-4o was the most widely used model for voice agents until quite recently, so a small open weights model scoring this well is a strong indication that production use of open weights models will grow this year.
Voice agents are a moderately "out of distribution" use case for all of our SOTA LLMs today. Literally, in the sense that there's not enough long, multi-turn conversation data in the training sets. Everyone who builds voice agents knows this intuitively, from doing lots of manual testing. (Vibes-based evals!) This benchmark scores LLMs quantitatively on instruction following, tool calling, and knowledge retrieval in long-context, multi-turn conversations.
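The benchmark's actual harness lives in the open source repo mentioned above; as a rough illustration of what scoring instruction following and tool calling over a scripted multi-turn conversation can look like, here is a minimal Python sketch. The Turn format, the fake_model stub, and the scoring rules are assumptions made for illustration, not the repo's real design.

```python
# Hypothetical sketch of a multi-turn voice-agent eval loop; the scenario
# format and scoring rules are illustrative, not the actual benchmark.
from dataclasses import dataclass, field

@dataclass
class Turn:
    user: str                      # what the "caller" says on this turn
    must_call: str | None = None   # tool the model is expected to invoke
    must_say: list[str] = field(default_factory=list)  # phrases that must appear

def fake_model(history: list[dict]) -> dict:
    """Stand-in for a real LLM call; returns a reply and any tool call."""
    last = history[-1]["content"].lower()
    if "order" in last:
        return {"reply": "Let me look that up.", "tool_call": "lookup_order"}
    return {"reply": "Your appointment is confirmed for Tuesday.", "tool_call": None}

def score_scenario(turns: list[Turn], model=fake_model) -> float:
    """Replay a scripted conversation and count satisfied expectations."""
    history, passed, total = [], 0, 0
    for turn in turns:
        history.append({"role": "user", "content": turn.user})
        out = model(history)
        history.append({"role": "assistant", "content": out["reply"]})
        if turn.must_call is not None:
            total += 1
            passed += out["tool_call"] == turn.must_call
        for phrase in turn.must_say:
            total += 1
            passed += phrase.lower() in out["reply"].lower()
    return passed / total if total else 1.0

if __name__ == "__main__":
    scenario = [
        Turn("Hi, I'd like to check on my order.", must_call="lookup_order"),
        Turn("Can you confirm my appointment?", must_say=["Tuesday"]),
    ]
    print(f"score: {score_scenario(scenario):.2f}")
```

A real harness would swap fake_model for API calls to each model under test and aggregate scores across many scenarios and conversation lengths.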
Nina Kuruvilla retweeted
kwindla @kwindla
This robot assistant from the NVIDIA CES Keynote on Monday is going viral. @NaderLikeLadder explains all the hottest emerging AI trends in one demo: AI applications in 2026 will be multi-model, multi-modal, hybrid cloud/local, use open source models as well as proprietary models, control robots and embedded devices in the physical world, and have voice interfaces. (And the demo had a cute robot *and* a cute dog. Gold.)
The demo was built with @pipecat_ai. NVIDIA posted a really nice technical walk-through and complete code. The Reachy Mini robot from @huggingface is open source hardware. (You can order it now; I have one!) You can run the assistant locally on your own hardware, in the cloud, or both.
Nina Kuruvilla retweeted
Thor 雷神 ⚡️ @thorwebdev
Last year, during the @aiDotEngineer World’s Fair, @kwindla and the @pipecat_ai team at @trydaily produced this beautifully illustrated primer on Voice AI & Voice Agents. While a lot has happened since then, it’s still a fantastic resource to start your voice AI journey: go.thor.bio/pipecat-primer
At @GoogleDeepMind, we recently shipped updated TTS and native audio models, and 2026 will see plenty more launches on the AI audio side!
What would you like to see us ship to make it even easier for you to make your apps enjoyably conversational? 💬
Nina Kuruvilla retweeted
Google Cloud @googlecloud
Introducing Gemini 3 Flash—frontier intelligence built for speed at a fraction of the cost. Learn more about the expanded Gemini 3 model family and how developers and enterprises around the world can get started today ↓ goo.gle/4p3gQwq
Nina Kuruvilla retweeted
Lina Colucci @lina_colucci
Give your voice agents a face. Introducing Lemon Slice-2 – the world's first interactive talking AI video model. Think of it as a face layer for your voice agents, available today in production as an API or an embeddable widget.
What makes Lemon Slice-2 great:
→ Easy: All you need is 1 photo
→ Expressive: hands and full body motion
→ Any character and style: if it has a face, Lemon Slice-2 can animate it
Under the hood: we trained a custom, 20B-parameter video diffusion transformer, streaming at 20fps on a single GPU. Infinite-length video generation with no error accumulation.
Talk to the demo avatars on our homepage. Or, add your own interactive avatar to your website. Companies and creators are already building video agents that are:
→ Customer support agents who screen-share and show you where to click
→ Lead qualification agents that replace forms with real conversations
→ AI personal shoppers who try on clothes and guide purchases
People don't engage with chat bubbles. They engage with faces. And now you can have a 24/7 spokesperson that’s the face of your brand. We’ve partnered with @elevenlabs, @trydaily, and @modal to make this happen.
As part of this launch, we’re giving away a guide: “How to Add a High-Converting Video Agent to Your Website in under 10 Minutes”
Retweet and comment "LEMON SLICE" below and I'll send it over 🍋
Nina Kuruvilla retweeted
Pipecat AI @pipecat_ai
‼️
Inworld AI @inworld_ai

We're making Inworld TTS free until the end of the year (!)
We were feeling in the holiday spirit today, and after seeing the community rate our TTS at #1 on leaderboards and help us grow 100% week on week, we wanted to gift something and give every builder the chance to try what's topping the benchmarks.
Merry Christmas, Happy holidays. Inworld TTS is free this month.
Look out next week: we're going to redefine what #1 means.

Nina Kuruvilla retweeted
Tanushree @_tanushreeeee
Had some fun building voice agents with @pipecat_ai! Observability is super important when it comes to voice — when things go wrong, you’re debugging the audio, latency, and model behavior all at once. 🤝 Check out our LangSmith/Pipecat integration for full visibility into your Pipecat application. Pipecat made it really easy to get started. Kudos to @kwindla and team for the smooth DX!
LangChain @LangChain

🔊 The STT → Agent → TTS “sandwich” is a standard voice agent pattern. It’s easy to get started, tough to build reliable systems. 😵‍💫
Learn how to debug voice agents: We created a voice agent with @pipecat_ai and sent traces to LangSmith to show exactly how to get visibility into and debug complex pipelines. Stop guessing, start tracing! 👇
Video: youtu.be/0FmbIgzKAkQ
Docs: docs.langchain.com/langsmith/trac…

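The linked video and docs cover the actual LangSmith/Pipecat integration; the sketch below only illustrates the underlying idea of the STT → Agent → TTS sandwich, timing each leg so you can see where a turn's latency goes. The stage functions and the span format are placeholders invented for this example, not Pipecat or LangSmith APIs.

```python
# Illustrative-only tracing of the STT -> Agent -> TTS "sandwich".
# The stage functions and span format are placeholders, not library APIs.
import time
from contextlib import contextmanager

TRACE: list[dict] = []

@contextmanager
def span(stage: str):
    """Record wall-clock latency for one stage of the pipeline."""
    start = time.perf_counter()
    try:
        yield
    finally:
        TRACE.append({"stage": stage, "ms": (time.perf_counter() - start) * 1000})

def transcribe(audio: bytes) -> str:          # placeholder STT
    return "what's the weather tomorrow"

def run_agent(text: str) -> str:              # placeholder LLM/agent step
    return "Tomorrow looks sunny with a high of 70."

def synthesize(text: str) -> bytes:           # placeholder TTS
    return text.encode()

def handle_turn(audio: bytes) -> bytes:
    with span("stt"):
        text = transcribe(audio)
    with span("agent"):
        reply = run_agent(text)
    with span("tts"):
        out = synthesize(reply)
    return out

if __name__ == "__main__":
    handle_turn(b"\x00" * 16000)
    for s in TRACE:
        print(f"{s['stage']:>5}: {s['ms']:.1f} ms")
```

A real integration would ship these spans to a tracing backend alongside transcripts and model outputs, which is what makes it possible to debug audio, latency, and model behavior in one place.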
Nina Kuruvilla retweeted
Davit @Krisp @davitb
Voice AI isn’t just struggling with accuracy... it’s struggling with understanding. I sat down with @klemensimonic (@soniox_ai) and @kwindla (@trydaily) about what’s really holding Voice AI back. Here’s what stood out 🧵
Nina Kuruvilla retweeted
kwindla @kwindla
I'm at AWS re:Invent this week, talking to lots of AWS customers and product teams about voice agents. I'll be hanging out with @DeepgramAI CEO Scott Stephenson at the Deepgram booth on Thursday. We'll be answering questions about all the big new AWS voice and agent announcements happening this week. Come by and say hi if you're interested in voice AI.
There are three AWS + @pipecat_ai launches at this year's re:Invent:
1. The SageMaker platform now supports bidirectional streaming, with full Pipecat compatibility. Pipecat voice agents can now use Deepgram's speech-to-text and text-to-speech models running on AWS SageMaker. This is a much-requested capability from enterprises that want to run voice agents entirely within their own VPCs using AWS services they already leverage for agent workflows.
2. Amazon Bedrock AgentCore is a serverless platform for AI agents. You can deploy Pipecat voice agents to AgentCore, with full support for WebRTC and telephony client connections. We often hear from AWS customers that they'd like to be able to "deploy voice agents to AWS Lambda." One way to think about AgentCore is that it's like Lambda, but designed from the ground up to enable scalable AI agent workloads.
3. You can also *use* Amazon Bedrock AgentCore agents from a Pipecat voice agent. We're seeing more and more production agents that combine a fast, conversational voice AI loop with other parallel tool-calling and agent loops. Pipecat enables these new patterns. Now you can use AgentCore as a flexible part of your voice AI toolkit.
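Point 3 above describes combining a fast conversational loop with slower, parallel agent calls. The sketch below shows that pattern in plain asyncio: fire off a long-running remote agent call, answer the caller immediately, and fold the result in when it arrives. call_remote_agent and quick_reply are hypothetical placeholders, not Bedrock AgentCore or Pipecat APIs.

```python
# Sketch of a fast voice loop running a slow remote agent call in parallel.
# call_remote_agent stands in for a hosted agent; it is not a real API.
import asyncio

async def call_remote_agent(query: str) -> str:
    """Placeholder for a slow, hosted agent (e.g. a research or CRM agent)."""
    await asyncio.sleep(2.0)                    # simulate a multi-second agent run
    return f"Detailed answer for: {query}"

async def quick_reply(text: str) -> None:
    """Placeholder for the low-latency voice loop's immediate response."""
    print(f"agent (instant): {text}")

async def handle_user_request(query: str) -> None:
    # Kick off the slow agent without blocking the conversation.
    task = asyncio.create_task(call_remote_agent(query))
    await quick_reply("Sure, give me a moment while I pull that up.")
    # The voice loop can keep handling turns here; when the task finishes,
    # its result gets folded back into the conversation.
    result = await task
    print(f"agent (follow-up): {result}")

if __name__ == "__main__":
    asyncio.run(handle_user_request("summarize my last three support tickets"))
```

The point of the pattern is that the caller hears an acknowledgment within the normal voice-turn latency budget, while the heavier agent work completes in the background.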