Most voice AI teams delay this decision far longer than they should - and eventually hit the same wall.
The LLM ends up being the single biggest driver of both latency and cost.
The solutions teams reach for are usually workarounds:
• Racing multiple LLM endpoints and taking the fastest response (works, but you pay every provider for every request; see the first sketch after this list)
• Cron jobs that monitor response times and hot-swap providers (second sketch below)
• Hosted open-source models via Fireworks or Baseten (useful as a bridge, rarely the destination)
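
A minimal sketch of the racing pattern, assuming hypothetical providers and a stubbed `call_provider` (the names, latencies, and function are illustrative, not any real API): fire the same prompt at every endpoint, keep whichever answers first, cancel the rest. The cost problem is visible right in the code: every provider bills you for the request, win or lose.

```python
import asyncio

# Hypothetical provider call; in practice this would be an async HTTP
# request to an LLM endpoint. The sleep stands in for network + inference.
async def call_provider(name: str, latency: float, prompt: str) -> str:
    await asyncio.sleep(latency)
    return f"{name}: response to {prompt!r}"

async def race(prompt: str) -> str:
    # Fire the same request at every endpoint concurrently.
    tasks = [
        asyncio.create_task(call_provider("provider-a", 0.4, prompt)),
        asyncio.create_task(call_provider("provider-b", 0.9, prompt)),
    ]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    # Abandon the slower calls -- you still pay for their tokens.
    for task in pending:
        task.cancel()
    return done.pop().result()

print(asyncio.run(race("How do I reset my password?")))
```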
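And a rough sketch of the hot-swap approach, again with made-up provider names and a simulated probe standing in for a real timed completion request. A scheduler (cron or otherwise) calls `check_and_swap` on an interval; it keeps a rolling window of latencies per provider and routes traffic to whichever is currently fastest.

```python
import random
import time
from collections import defaultdict, deque
from statistics import mean

WINDOW = 10  # keep the last N probe timings per provider
history: dict[str, deque] = defaultdict(lambda: deque(maxlen=WINDOW))
active = "provider-a"

def probe(provider: str) -> float:
    """Stand-in for a short, timed completion request against `provider`."""
    start = time.monotonic()
    time.sleep(random.uniform(0.1, 0.5))  # simulated network + inference time
    return time.monotonic() - start

def check_and_swap() -> str:
    """Run once per cron tick: probe each provider, route to the fastest."""
    global active
    for provider in ("provider-a", "provider-b"):
        history[provider].append(probe(provider))
    fastest = min(history, key=lambda p: mean(history[p]))
    if fastest != active:
        print(f"hot-swap: {active} -> {fastest}")
        active = fastest
    return active
```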