Evidently AI

2.3K posts

@EvidentlyAI

Open source ML and LLM evaluation 📊, testing 🚦 and monitoring 📈
GitHub: https://t.co/37H9bfnYj6
Discord: https://t.co/ElZ9RlroUa

Joined February 2020
211 Following, 2.5K Followers
Pinned Tweet
Evidently AI @EvidentlyAI
3️⃣ 2️⃣ 1️⃣ Our free course on LLM evaluations for AI product teams starts today!
🎥 7 days of byte-sized videos into your inbox
⭐️ Certificate upon completion
👩‍💻 No coding skills required
👩‍🎓 500+ students have signed up
You can still join the course 👇 evidentlyai.com/llm-evaluation…
Evidently AI @EvidentlyAI
📌 In case you missed it
How to evaluate an AI agent? Follow the tutorial as we:
1️⃣ Build an AI agent,
2️⃣ Create a test dataset,
3️⃣ Assess responses and tool choice,
4️⃣ Track the agent’s behaviour.
Follow the tutorial from our LLM evals course: youtube.com/watch?v=9KMmad…
Evidently AI @EvidentlyAI
A Friday ML use case 📕 📚
From the database of 800 ML & LLM systems: cutt.ly/SwrZWL0g
How Uber improves driver availability at airports: Estimated time-to-request model, Earnings-per-hour prediction, and Driver-deficit forecasting.
uber.com/en-GB/blog/for…
Evidently AI @EvidentlyAI
🦾 More AI agents aren’t always better.
Google evaluated 180 agent setups and found multi-agent systems help with parallel tasks but can hurt sequential ones. The work also proposes a model to predict optimal agentic designs.
research.google/blog/towards-a…
Evidently AI @EvidentlyAI
📌 In case you missed it
Let’s test your RAG system! Follow the tutorial as we:
1️⃣ Build a RAG system,
2️⃣ Generate test data,
3️⃣ Evaluate answers for correctness and faithfulness.
Watch the tutorial from our LLM evals course: youtube.com/watch?v=jckp5R…
Evidently AI @EvidentlyAI
A Friday ML use case 📕 📚
From the database of 800 ML & LLM systems: cutt.ly/SwrZWL0g
How GoDaddy built Lighthouse, an internal AI analytics platform: prompt engineering framework, model orchestration, solution architecture, and use cases.
godaddy.com/resources/news…
Evidently AI retweeted
Nnenna 👩🏽‍💻✨ @nnennahacks
(policyNIM oss tool) preflight command is working. when I provide a coding task, it kicks off a search through indexed policies to determine which rules are relevant for implementation. @nvidia for embedding w/ @OpenAI + @lancedb for vector storage. eval command is also working. using @EvidentlyAI for running eval suite.
Evidently AI @EvidentlyAI
🚦 Meta’s “Agents Rule of Two”
According to Meta, AI agents should satisfy at most two of these conditions per session to reduce prompt-injection risk:
- Handle untrusted inputs
- Access sensitive data
- Change state / act externally
ai.meta.com/blog/practical…
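The "at most two of three" rule above can be written as a simple session check. This is only a minimal sketch: the class, field, and function names are hypothetical and are not taken from Meta's post or from any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionCapabilities:
    """Hypothetical flags for what an agent may do within one session."""

    handles_untrusted_inputs: bool
    accesses_sensitive_data: bool
    changes_state_externally: bool

    def active_count(self) -> int:
        # Booleans sum as 0/1, so this counts the enabled conditions.
        return sum(
            (
                self.handles_untrusted_inputs,
                self.accesses_sensitive_data,
                self.changes_state_externally,
            )
        )


def satisfies_rule_of_two(caps: SessionCapabilities) -> bool:
    """Return True when the session enables at most two of the three conditions."""
    return caps.active_count() <= 2


# A session that reads untrusted web pages and private data but cannot act externally.
read_only = SessionCapabilities(True, True, False)
# A session with all three conditions enabled, which the rule advises against.
unrestricted = SessionCapabilities(True, True, True)
```

Here `satisfies_rule_of_two(read_only)` passes while `satisfies_rule_of_two(unrestricted)` does not; what to do with a failing session (for example, requiring human approval) is left open.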
Evidently AI @EvidentlyAI
📌 In case you missed it
How do you know if your RAG works? You need to check:
✅ Can it find the right information?
✅ Is the final answer complete, relevant, and free of hallucinations?
Watch the intro to RAG evaluation from our LLM evals course: youtube.com/watch?v=qI2qQf…
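The first check, whether retrieval finds the right information, is often measured with a hit rate at k. The sketch below assumes you already have ground-truth relevant document IDs per query; the function name and the sample IDs are hypothetical, and this is not necessarily the metric used in the course. It does not cover the second check on answer quality.

```python
from typing import Sequence, Set


def hit_rate_at_k(
    retrieved: Sequence[Sequence[str]],
    relevant: Sequence[Set[str]],
    k: int,
) -> float:
    """Share of queries with at least one relevant document in the top-k results."""
    if len(retrieved) != len(relevant) or not retrieved:
        raise ValueError("retrieved and relevant must be non-empty and the same length")
    hits = sum(
        1
        for docs, rel in zip(retrieved, relevant)
        if rel & set(docs[:k])  # non-empty intersection means a hit
    )
    return hits / len(retrieved)


# Hypothetical retrieval results (ranked doc IDs) for three queries.
retrieved_docs = [["d1", "d7", "d3"], ["d4", "d2", "d9"], ["d5", "d6", "d8"]]
# Hypothetical ground-truth relevant doc IDs for the same queries.
relevant_docs = [{"d3"}, {"d2"}, {"d1"}]

score = hit_rate_at_k(retrieved_docs, relevant_docs, k=3)
```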
Evidently AI @EvidentlyAI
💭 Can AI systems introspect?
Anthropic’s new research suggests Claude models can sometimes identify and describe their own internal states. It’s still unreliable, but marks a step toward more transparent AI reasoning.
anthropic.com/research/intro…
Evidently AI @EvidentlyAI
📌 In case you missed it
Can LLMs write engaging tech tweets? Follow the tutorial as we:
1️⃣ Build a tweet generator,
2️⃣ Score its outputs with custom LLM judges,
3️⃣ Improve the results with prompt iteration.
Watch the tutorial from our LLM evals course: youtube.com/watch?v=KhkiM9…
Evidently AI @EvidentlyAI
📚 Context is everything.
OpenAI shares how it built an in-house data agent that answers complex questions in minutes. It uses 6 layers of context:
- Table metadata
- Human annotations
- Codex enrichment
- Company knowledge
- Memory
- Runtime context
openai.com/index/inside-o…
Evidently AI @EvidentlyAI
📌 In case you missed it
Are LLMs good for classification tasks? We built an LLM-based classifier for a travel support chatbot and compared its performance to a classic ML model.
Watch the tutorial from our LLM evals course: youtube.com/watch?v=Gl2X_o…
Evidently AI @EvidentlyAI
🤖 How to develop and deploy chatbots at scale?
DoorDash shares how they created a simulation platform and evaluation flywheel, allowing them to test chatbots with fast feedback loops and without production risk.
careersatdoordash.com/blog/doordash-…
Evidently AI @EvidentlyAI
📌 In case you missed it
How to create an LLM judge that aligns with human labels:
- Define criteria
- Create test dataset
- Run evaluation prompt to see if the judge aligns with your labels
- Evaluate the judge
Watch the video from our LLM evals course: youtube.com/watch?v=kP_aaF…
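One way to check whether a judge aligns with your labels is to compare its verdicts with the human ones on the test dataset. The sketch below computes raw agreement and Cohen's kappa in plain Python; the label values and sample data are made up for illustration, and kappa is a common choice here rather than something the video is known to prescribe.

```python
from collections import Counter
from typing import Sequence


def agreement_rate(human: Sequence[str], judge: Sequence[str]) -> float:
    """Fraction of examples where the judge's label matches the human label."""
    if len(human) != len(judge) or not human:
        raise ValueError("label lists must be non-empty and the same length")
    return sum(h == j for h, j in zip(human, judge)) / len(human)


def cohens_kappa(human: Sequence[str], judge: Sequence[str]) -> float:
    """Agreement corrected for the agreement expected by chance."""
    n = len(human)
    observed = agreement_rate(human, judge)
    human_counts, judge_counts = Counter(human), Counter(judge)
    labels = set(human) | set(judge)
    # Chance agreement: both raters pick the same label independently.
    expected = sum(human_counts[l] * judge_counts[l] for l in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels for six test examples.
human_labels = ["good", "bad", "good", "good", "bad", "bad"]
judge_labels = ["good", "bad", "bad", "good", "bad", "good"]

accuracy = agreement_rate(human_labels, judge_labels)
kappa = cohens_kappa(human_labels, judge_labels)
```

Raw agreement can look high when one label dominates the dataset, which is why a chance-corrected score is worth reporting alongside it.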
Evidently AI @EvidentlyAI
🔎 Scaling catalog attribute extraction with multi-modal LLMs
Instacart shares how it built PARSE, a self-serve multi-modal LLM platform for structured product attribute extraction from text and images at scale 👇
tech.instacart.com/multi-modal-ca…