Harshit Mishra

419 posts

@Harshit_senpai

Building Voice AI | trust me i can code

Korba, India · Joined January 2022
562 Following · 113 Followers
Harshit Mishra @Harshit_senpai:
So I'm building a Voice AI platform, and our knowledge base lookup latency is ~20ms. Most voice AI platforms don't publish this number. Some don't even measure it. Here's exactly how we got there, with actual logs. Thread.
Harshit Mishra @Harshit_senpai:
Pre-warming the index at call connect dropped average KB latency from ~180ms to 23ms. Not a model change. Not more compute. Just: do the work earlier. Most latency in voice AI is avoidable with architecture, not hardware.
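A rough sketch of what "do the work earlier" can look like, not our production code. `KBIndex`, `warm`, and `LOAD_COST_S` are made-up names for illustration, and the 50ms load cost is a placeholder, not a measured number. The only point is that the one-time setup cost moves from the first lookup to call connect.

```python
import time


class KBIndex:
    """Toy index whose first use pays a one-time setup cost (hypothetical)."""

    LOAD_COST_S = 0.05  # placeholder for connection setup / loading the index

    def __init__(self) -> None:
        self._loaded = False

    def warm(self) -> None:
        # Pay the one-time cost now, if nobody has paid it yet.
        if not self._loaded:
            time.sleep(self.LOAD_COST_S)
            self._loaded = True

    def lookup(self, query: str) -> list[str]:
        # Lazy path: a cold index pays the setup cost inside the lookup.
        self.warm()
        return [f"chunk for: {query}"]


def timed_lookup(index: KBIndex, query: str) -> float:
    start = time.perf_counter()
    index.lookup(query)
    return time.perf_counter() - start


# Cold: the first lookup of the call pays the setup cost.
cold = timed_lookup(KBIndex(), "refund policy")

# Warm: setup happens at call connect, before the caller says anything.
warm_index = KBIndex()
warm_index.warm()
warm = timed_lookup(warm_index, "refund policy")

print(f"cold lookup: {cold * 1000:.1f}ms, warm lookup: {warm * 1000:.1f}ms")
```

Same lookup code in both cases. The only difference is when the setup cost gets paid.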
Harshit Mishra @Harshit_senpai:
What we're running under the hood:
→ Qdrant for vector search (self-hosted, same region as inference)
→ Embeddings pre-computed at agent config time, not at call time
→ Retrieval triggered on the partial STT transcript (as soon as confidence first reaches the 70% threshold)
→ Top-K=3
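To make the trigger and Top-K=3 concrete, here is a minimal sketch. It does not use Qdrant: a brute-force cosine search over a tiny in-memory dict stands in for the vector DB, and the names (`PartialTranscript`, `first_confident`, `top_k`), the 2-D vectors, and the confidence values are all invented for the example.

```python
import math
from dataclasses import dataclass
from typing import Optional

CONFIDENCE_THRESHOLD = 0.70  # retrieval fires at the first partial that reaches this
TOP_K = 3


@dataclass
class PartialTranscript:
    text: str
    confidence: float


def first_confident(partials: list[PartialTranscript]) -> Optional[PartialTranscript]:
    """Return the first partial transcript at or above the threshold, if any."""
    for partial in partials:
        if partial.confidence >= CONFIDENCE_THRESHOLD:
            return partial
    return None


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec: list[float], index: dict[str, list[float]], k: int = TOP_K) -> list[str]:
    """Brute-force stand-in for a vector DB query: best k chunk ids by cosine."""
    ranked = sorted(index, key=lambda name: cosine(query_vec, index[name]), reverse=True)
    return ranked[:k]


# Chunk embeddings computed ahead of time (toy 2-D vectors, not real embeddings).
index = {
    "refunds": [1.0, 0.0],
    "returns": [0.9, 0.1],
    "shipping": [0.5, 0.5],
    "hours": [0.0, 1.0],
}

partials = [
    PartialTranscript("what", 0.41),
    PartialTranscript("what is your ref", 0.72),
    PartialTranscript("what is your refund policy", 0.95),
]

trigger = first_confident(partials)  # fires on the second partial, not the final one
hits = top_k([1.0, 0.0], index)      # the query vector is a toy value too
print(trigger.text, hits)
```

Retrieval starts on "what is your ref", well before the final transcript lands, so the lookup overlaps with the rest of the utterance.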
Harshit Mishra @Harshit_senpai:
The fix isn't a faster vector DB. It's architectural:
→ Pre-warm the KB index at call start, not at query time
→ Run retrieval async: kick it off the moment STT begins transcribing, before the transcript is even complete
→ Cache the top-K chunks from the first 3 turns
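A minimal asyncio sketch of the async kick-off and the cache, under stated assumptions: `CallSession`, `start_retrieval`, and `fake_search` are hypothetical names, the search backend is faked, and keying the cache by normalized query text is just one possible reading of "cache the top-K chunks from the first 3 turns".

```python
import asyncio
from typing import Awaitable, Callable

TOP_K = 3
CACHED_TURNS = 3  # only results from the first 3 turns are kept

SearchFn = Callable[[str, int], Awaitable[list[str]]]


class CallSession:
    def __init__(self, search: SearchFn) -> None:
        self._search = search
        self._cache: dict[str, list[str]] = {}
        self._turns_cached = 0

    def start_retrieval(self, partial_transcript: str) -> "asyncio.Task[list[str]]":
        # Fire and keep going: STT continues while the lookup is in flight.
        return asyncio.create_task(self.retrieve(partial_transcript))

    async def retrieve(self, query: str) -> list[str]:
        key = query.strip().lower()
        if key in self._cache:
            return self._cache[key]  # no round trip to the vector DB
        chunks = await self._search(query, TOP_K)
        if self._turns_cached < CACHED_TURNS:
            self._cache[key] = chunks
            self._turns_cached += 1
        return chunks


calls: list[str] = []


async def fake_search(query: str, k: int) -> list[str]:
    # Stand-in for the real vector search; records each call it receives.
    calls.append(query)
    await asyncio.sleep(0.01)
    return [f"{query}#{i}" for i in range(k)]


async def demo() -> tuple[list[str], list[str]]:
    session = CallSession(fake_search)
    task = session.start_retrieval("refund policy")  # kicked off mid-utterance
    await asyncio.sleep(0)                           # STT keeps transcribing here
    first = await task                               # usually done by the time the LLM needs it
    second = await session.retrieve("Refund policy ")  # same question again: cache hit
    return first, second


first, second = asyncio.run(demo())
print(first, len(calls))
```

The second lookup never reaches the backend, so `calls` holds a single entry.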
Harshit Mishra @Harshit_senpai:
First: why KB latency matters more than people think. In a live voice call, every millisecond of silence feels like 10x that to the caller. If your KB lookup takes 400ms synchronously during inference, you've just added 400ms of dead air. The user thinks the agent is broken.
Harshit Mishra @Harshit_senpai:
Here's what a naive KB implementation looks like in the logs:
STT transcript received
LLM inference started
KB lookup triggered ← inline, blocking
KB result returned ← 383ms gap
LLM first token out
TTS synthesis started
383ms of silence. On a voice call. Unacceptable.
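The naive flow from those logs, as a small sketch rather than real platform code. `kb_lookup` and `naive_turn` are hypothetical names, and `KB_DELAY_S` is scaled down to 50ms so the example runs fast. It is not the 383ms from the logs.

```python
import asyncio
import time

KB_DELAY_S = 0.05  # placeholder for a slow KB lookup (not a measured value)


async def kb_lookup(query: str) -> list[str]:
    await asyncio.sleep(KB_DELAY_S)  # stand-in for the vector DB round trip
    return [f"chunk for: {query}"]


async def naive_turn(transcript: str) -> tuple[list[str], float]:
    """Naive flow: the lookup starts only after the full transcript arrives,
    and the turn waits on it before the LLM can emit a first token."""
    started = time.perf_counter()
    chunks = await kb_lookup(transcript)  # inline, blocking: dead air for the caller
    silence = time.perf_counter() - started
    return chunks, silence


chunks, silence = asyncio.run(naive_turn("what is your refund policy"))
print(f"silence added by the KB lookup: {silence * 1000:.0f}ms")
```

Every millisecond the lookup takes shows up as silence, because nothing else in the turn runs while it's pending.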