
Five neural nets, achieving completely local voice AI, no internet, on an M1 with only 16GB RAM.

Neural-based voice activity detection and turn detection mean it's interruptible but never interrupts me, and it can sit idle, waiting. It's been flawless so far.

12B parameters is definitely smart enough for some very cool use-cases (will share more later). Computers that can "think" feel strangely alive compared to dumb or networked hardware.

Fast? No. But crazy that it works at all on such a modest machine.

The stack:
- Silero VAD voice activity detection
- Whisper Large v3 turbo
- Smart Turn v2 by @trydaily
- Kokoro_tts
- Gemma_3_12B_it_QAT_Q4, rock-solid on @lmstudio (vision easily removed thx to gguf @ggerganov)
- @pipecat_ai integration by @kwindla
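
For anyone curious how "interruptible, but never interrupts me" can fall out of VAD plus turn detection, here's a rough toy sketch of the control flow. To be clear, this is not Pipecat's API or my actual code: `VoiceLoop` and every callable on it are hypothetical stand-ins for the real stages (Silero VAD, Smart Turn, Whisper, Gemma, Kokoro), and frames are just bytes.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class VoiceLoop:
    """Toy turn-taking loop. Every stage is a hypothetical stand-in callable."""

    is_speech: Callable[[bytes], bool]            # stand-in for the VAD
    turn_complete: Callable[[List[bytes]], bool]  # stand-in for turn detection
    transcribe: Callable[[List[bytes]], str]      # stand-in for speech-to-text
    reply: Callable[[str], str]                   # stand-in for the LLM
    speak: Callable[[str], None]                  # stand-in for text-to-speech
    stop_speaking: Callable[[], None]             # cuts off TTS playback
    buffer: List[bytes] = field(default_factory=list)
    bot_speaking: bool = False

    def on_frame(self, frame: bytes) -> None:
        if self.is_speech(frame):
            # User is talking: if the bot is mid-reply, it yields (interruptible).
            if self.bot_speaking:
                self.stop_speaking()
                self.bot_speaking = False
            self.buffer.append(frame)
            return

        # Silence. With nothing buffered, the loop just sits idle.
        # With buffered speech, it answers only once the turn detector
        # judges the user finished, so a mid-thought pause is not a cue.
        if self.buffer and self.turn_complete(self.buffer):
            text = self.transcribe(self.buffer)
            self.buffer.clear()
            self.speak(self.reply(text))
            self.bot_speaking = True
```

The point of the split: the VAD only answers "is someone talking right now?", while the turn detector answers "are they done?". Keeping those separate is what lets a pause pass without the bot jumping in.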
























