Kritin Vongthongsri

137 posts


@kritinv07

building evals @deepeval @confident_ai

San Francisco · Joined June 2024
34 Following · 146 Followers
Kritin Vongthongsri retweeted
Brian Neville-O'Neill @bnevilleoneill
I’m running LLM eval office hours today with @confident_ai 🧪 If you’re building anything with AI, drop a prompt + model output, and I’ll show where it breaks. I’ll look at: correctness, completeness, and where it might fail in real use. Just quick, specific feedback. #ai #LLM
0
3
2
139
Kritin Vongthongsri retweeted
DeepEval @deepeval
My sister just got released: DeepTeam v1.0, 100% open-source, Apache 2.0 red teaming for LLMs. ⭐ Star on GitHub to stay on top of the latest developments in AI security and safety: github.com/confident-ai/d…
1
5
11
927
Kritin Vongthongsri @kritinv07
What makes a platform UI enterprise-ready?
0
0
0
180
Oleg Golev @oleg_golev
Sentient hype train leaves soon, gonna ship shameless AI ads until that happens
18
1
42
1K
Kritin Vongthongsri @kritinv07
Most people run single-turn evals on chatbots. But that’s not enough. Conversations aren’t Q&A; they happen over multiple turns. This means your chatbot must stay context-aware across the dialogue, not just accurate in isolated responses. At @deepeval, we’ve seen too many teams evaluate chatbots the wrong way. So we wrote a comprehensive guide on how to evaluate chatbots properly, end-to-end.👇 🔗 deepeval.com/docs/getting-s…
1
2
6
442
Kritin Vongthongsri @kritinv07
What’s more important in growing OSS: building or writing docs?
1
1
2
265
Kritin Vongthongsri @kritinv07
At @confident_ai, we’re focused on making evals great. But since we love our users very much, we’ve also just 5×’d the tracing analytics on our platform. Now you can:
🔍 Trace analytics: follow every request end-to-end
⏱️ Span analytics: see latency and cost per component
📊 Model analytics: compare performance, latency, and cost across models
👥 User analytics: understand usage patterns and behavior
⚠️ Error analytics: track and reduce failures over time
1
3
5
497