Evidently AI

2.3K posts

@EvidentlyAI

Open-source ML and LLM evaluation 📊, testing 🚦 and monitoring 📈. GitHub: https://t.co/37H9bfnYj6 Discord: https://t.co/ElZ9RlroUa

Joined February 2020
211 Following, 2.5K Followers
Pinned Tweet
Evidently AI @EvidentlyAI
3️⃣ 2️⃣ 1️⃣ Our free course on LLM evaluations for AI product teams starts today!
🎥 7 days of byte-sized videos in your inbox
⭐️ Certificate upon completion
👩‍💻 No coding skills required
👩‍🎓 500+ students have signed up
You can still join the course 👇 evidentlyai.com/llm-evaluation…

Evidently AI @EvidentlyAI
📌 In case you missed it
How do you know if your RAG works? You need to check:
✅ Can it find the right information?
✅ Is the final answer complete, relevant, and free of hallucinations?
Watch the intro to RAG evaluation from our LLM evals course: youtube.com/watch?v=qI2qQf…

Evidently AI @EvidentlyAI
💭 Can AI systems introspect? Anthropic’s new research suggests Claude models can sometimes identify and describe their own internal states. It’s still unreliable, but marks a step toward more transparent AI reasoning. anthropic.com/research/intro…

Evidently AI @EvidentlyAI
📌 In case you missed it
Can LLMs write engaging tech tweets? Follow the tutorial as we:
1️⃣ Build a tweet generator,
2️⃣ Score its outputs with custom LLM judges,
3️⃣ Improve the results with prompt iteration.
Watch the tutorial from our LLM evals course: youtube.com/watch?v=KhkiM9…

Evidently AI @EvidentlyAI
📚 Context is everything. OpenAI shares how it built an in-house data agent that answers complex questions in minutes. It uses 6 layers of context:
- Table metadata
- Human annotations
- Codex enrichment
- Company knowledge
- Memory
- Runtime context
openai.com/index/inside-o…

Evidently AI @EvidentlyAI
📌 In case you missed it
Are LLMs good for classification tasks? We built an LLM-based classifier for a travel support chatbot and compared its performance to a classic ML model.
Watch the tutorial from our LLM evals course: youtube.com/watch?v=Gl2X_o…

Evidently AI @EvidentlyAI
🤖 How to develop and deploy chatbots at scale? DoorDash shares how they created a simulation platform and evaluation flywheel, allowing them to test chatbots with fast feedback loops and without production risk. careersatdoordash.com/blog/doordash-…

Evidently AI @EvidentlyAI
📌 In case you missed it
How to create an LLM judge that aligns with human labels:
- Define criteria
- Create test dataset
- Run evaluation prompt to see if the judge aligns with your labels
- Evaluate the judge
Watch the video from our LLM evals course: youtube.com/watch?v=kP_aaF…
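One way to check whether a judge aligns with human labels is to compare the two label lists directly. The sketch below is illustrative only: the function names and the sample labels are made up for this example and do not come from the course or the Evidently library. It computes raw agreement and Cohen's kappa, which corrects agreement for chance.

```python
from collections import Counter


def agreement_rate(human, judge):
    """Share of items where the judge label equals the human label."""
    if len(human) != len(judge) or not human:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(h == j for h, j in zip(human, judge)) / len(human)


def cohen_kappa(human, judge):
    """Agreement between two label lists, corrected for chance agreement."""
    n = len(human)
    observed = agreement_rate(human, judge)
    human_counts, judge_counts = Counter(human), Counter(judge)
    # Chance agreement: probability both raters pick the same class independently.
    expected = sum(human_counts[c] * judge_counts[c] for c in human_counts) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label for every item.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical labels for six responses (assumed data, for illustration).
human_labels = ["good", "bad", "good", "good", "bad", "bad"]
judge_labels = ["good", "bad", "bad", "good", "bad", "good"]

print(f"agreement: {agreement_rate(human_labels, judge_labels):.2f}")
print(f"kappa: {cohen_kappa(human_labels, judge_labels):.2f}")
```

A low kappa with high raw agreement usually means one class dominates the test set, which is a reason to look at both numbers before trusting the judge.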

Evidently AI @EvidentlyAI
🔎 Scaling catalog attribute extraction with multi-modal LLMs Instacart shares how it built PARSE, a self-serve multi-modal LLM platform for structured product attribute extraction from text and images at scale 👇 tech.instacart.com/multi-modal-ca…

Evidently AI @EvidentlyAI
📌 In case you missed it
How to assess LLM outputs with reference-free evals:
1️⃣ Text statistics
2️⃣ Regular expressions
3️⃣ ML models
4️⃣ LLM judge
Watch the tutorial from our LLM evals course: youtube.com/watch?v=-zoIqO…
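The first two methods in that list need nothing beyond the standard library. Here is a minimal sketch, assuming a simple refusal check as the regex use case. The helper names and the refusal pattern are hypothetical, chosen for illustration, and are not taken from the tutorial.

```python
import re


def text_stats(text):
    """Basic text statistics: character, word, and sentence counts."""
    words = text.split()
    # Naive sentence split on terminal punctuation; good enough for a rough check.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        "char_count": len(text),
        "word_count": len(words),
        "sentence_count": len(sentences),
    }


# Assumed pattern: a few common refusal phrasings, case-insensitive.
REFUSAL_PATTERN = re.compile(
    r"\b(i can(?:'|no)t|i am unable to|i'm unable to)\b", re.IGNORECASE
)


def contains_refusal(text):
    """True if the response matches one of the assumed refusal phrasings."""
    return REFUSAL_PATTERN.search(text) is not None


response = "Hello world. How are you?"
print(text_stats(response))
print(contains_refusal("Sorry, I can't help with that."))
```

Checks like these are cheap to run on every response, so they work well as a first filter before the more expensive ML-model or LLM-judge evaluations.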

Evidently AI @EvidentlyAI
✅ How to safely deploy ML models at scale? Uber shares its best practices: from data/feature validation and shadow testing to controlled rollouts and continuous monitoring – plus a safety scoring system to measure deployment readiness. uber.com/en-GB/blog/rai…

Evidently AI @EvidentlyAI
📌 In case you missed it
Reference-based LLM evals and how to implement them:
1️⃣ Exact match
2️⃣ Semantic similarity
3️⃣ BERTScore
4️⃣ LLM-as-a-judge
Watch the tutorial from our LLM evals course: youtube.com/watch?v=yD20c-…
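Exact match is the simplest of these to sketch. The example below is an illustration under stated assumptions, not the tutorial's code: the normalization rule and function names are made up here. Token-overlap F1 is included as a plain lexical stand-in. It is not semantic similarity or BERTScore, both of which require an embedding model.

```python
import re
from collections import Counter


def normalize(text):
    """Lowercase, drop punctuation, and split into tokens (assumed rule)."""
    return re.sub(r"[^a-z0-9\s]", "", text.lower()).split()


def exact_match(response, reference):
    """True if response and reference are identical after normalization."""
    return normalize(response) == normalize(reference)


def token_f1(response, reference):
    """Token-overlap F1: a lexical measure, not a semantic one."""
    resp, ref = normalize(response), normalize(reference)
    if not resp or not ref:
        # Both empty counts as a match; one empty counts as a miss.
        return float(resp == ref)
    overlap = sum((Counter(resp) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(resp)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


print(exact_match("Paris.", "paris"))
print(token_f1("the capital is Paris", "Paris is the capital of France"))
```

Exact match suits tasks with a single correct short answer. For free-form answers it is too strict, which is where overlap scores, embedding-based similarity, or a judge come in.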

Evidently AI @EvidentlyAI
A Friday ML use case 📕 📚
From the database of 800 ML & LLM systems: cutt.ly/SwrZWL0g
How Shopify built Shopify Sidekick, an AI assistant for online merchants: assistant architecture and evaluation frameworks for real-world deployment. shopify.engineering/building-produ…

Evidently AI @EvidentlyAI
✍️ Tips on evaluating AI agents
Booking suggests a dual approach:
⬛️ Black box metrics to measure outcomes, e.g., task completion.
⬜️ Glass box metrics to audit the agent’s decision-making process, e.g., tool proficiency and tool reliability.
booking.ai/ai-agent-evalu…

Evidently AI @EvidentlyAI
📌 In case you missed it
How we built open-source automated prompt optimization. In this blog, we break down common prompt optimization strategies and show how we implemented automated prompt optimization in Evidently Open Source 👇 evidentlyai.com/blog/automated…

Evidently AI @EvidentlyAI
A Friday ML use case 📕 📚
From the database of 800 ML & LLM systems: cutt.ly/SwrZWL0g
How Bayezian Limited, a tech company, uses AI agents to monitor protocol deviations in clinical trials: system architecture, agent flow, and lessons learned. aihub.org/2025/09/15/dep…