

I am building a course on shipping production Voice AI agents for a major online education platform. Before I teach it, I want to live on it. So I've been building a voice agent for diabhey.com.

Here is my current stack and learnings from v1:

• @livekit for realtime transport and the agent framework. Handles sessions, room events, and metrics out of the box.

• Silero VAD with min_silence_duration set to 250ms. The plugin default is 550ms. VAD tuning is the single biggest lever on how a voice agent actually feels. 550ms felt sluggish in conversation; 250ms felt natural. But go much lower and you'll cut users off mid-thought.

• @DeepgramAI for STT.

• @cerebras running Llama 3.1 8B for the LLM. Picked it for raw token throughput. In voice, tokens per second matters more than model size. You're racing a user's attention span, not a benchmark.

• @cartesia for TTS.

• @usemoss for retrieval. It's an in-process semantic search engine in Rust/WebAssembly, so lookups stay in the agent process with no network hop.

If you're shipping voice agents right now, what's moved latency the most for you? Drop it below. I'm collecting real patterns for the course.
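If you want to experiment with the VAD tuning above, here's a minimal sketch of how a pipeline like this could be wired up with the LiveKit Agents Python SDK. Treat it as an illustration, not production code: the plugin parameter names and the `with_cerebras` helper reflect my reading of the current plugin APIs, so verify against the LiveKit docs before copying.

```python
# Sketch: a LiveKit Agents voice pipeline matching the stack above.
# Assumes livekit-agents plus the silero/deepgram/openai/cartesia plugins
# are installed, and the usual API keys are set in the environment.
from livekit import agents
from livekit.agents import Agent, AgentSession
from livekit.plugins import cartesia, deepgram, openai, silero


async def entrypoint(ctx: agents.JobContext):
    await ctx.connect()

    session = AgentSession(
        # min_silence_duration is in seconds; the plugin default is 0.55 (550ms).
        # 0.25 (250ms) felt natural in testing; much lower cuts users off.
        vad=silero.VAD.load(min_silence_duration=0.25),
        stt=deepgram.STT(),
        # Cerebras is reached via the OpenAI-compatible plugin helper.
        llm=openai.LLM.with_cerebras(model="llama3.1-8b"),
        tts=cartesia.TTS(),
    )
    await session.start(
        room=ctx.room,
        # Hypothetical system prompt, just for illustration.
        agent=Agent(instructions="You are a voice assistant for diabhey.com."),
    )


if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))
```

The nice part of this shape is that each stage is a swappable plugin, so A/B-testing a VAD threshold or a different TTS provider is a one-line change.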