
Michael Louis (@MichaelLouis_za):
Most voice AI teams delay this decision far longer than they should - and eventually hit the same wall. The LLM ends up being the single biggest driver of both latency and cost.
The solutions teams reach for are usually workarounds:
• Racing multiple LLM endpoints and taking the fastest response (works, but expensive)
• Cron jobs that monitor response times and hot-swap providers
• Hosted OSS via Fireworks or Baseten (useful as a bridge, rarely the destination)
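The first workaround above can be sketched in a few lines of asyncio. This is a minimal illustration, not production code: the provider names and latencies are made up, and a real version would call actual LLM endpoints instead of sleeping.

```python
import asyncio

async def query_provider(name: str, delay: float) -> str:
    # Stand-in for a real LLM request; the sleep simulates provider latency.
    await asyncio.sleep(delay)
    return name

async def race(providers: dict[str, float]) -> str:
    # Fire all requests concurrently and return whichever finishes first.
    tasks = [asyncio.create_task(query_provider(n, d)) for n, d in providers.items()]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        # Abandon the slower requests -- this is why racing is expensive:
        # every provider still bills you for the tokens it generated.
        task.cancel()
    return done.pop().result()

# Hypothetical providers with simulated latencies in seconds.
winner = asyncio.run(race({"provider_a": 0.05, "provider_b": 0.2, "provider_c": 0.1}))
print(winner)  # provider_a
```

The cancel-the-losers step is the cost problem the thread points at: you pay for N inferences to get one answer.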
However, fine-tuning and self-hosting are no longer hard for modern engineering teams. With frameworks like @UnslothAI and Verl, teams can fine-tune strong models for a few hundred dollars - and regain control over:
• Latency
• Cost
• Data residency / regionality
• And (most importantly) model outputs
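For a sense of what "fine-tune for a few hundred dollars" looks like in practice, here is a minimal LoRA fine-tuning sketch using Unsloth's published API pattern. The base model name, dataset file, and hyperparameters are illustrative assumptions (the thread specifies none of them), and running it requires a CUDA GPU.

```python
# Illustrative sketch only -- model name, dataset, and hyperparameters are
# assumptions, not a recipe from the thread. Requires a CUDA GPU to run.
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

# Load a 4-bit quantized base model (example name).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters so only a small fraction of weights are trained,
# which is what keeps the cost in the hundreds-of-dollars range.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# Hypothetical dataset of voice-agent transcripts in JSONL format.
dataset = load_dataset("json", data_files="voice_agent_transcripts.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        max_steps=500,
        learning_rate=2e-4,
        output_dir="outputs",
    ),
)
trainer.train()
```

The payoff is the control the thread lists: you pick where the model runs (latency, residency), what hardware it runs on (cost), and what data it was tuned on (outputs).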