Srilakshmi Chavali
@schavalii

89 posts

dev rel @ Arize || uc berkeley alum 🎓

Joined January 2025
43 Following · 47 Followers
Srilakshmi Chavali retweeted
Arize AI @arizeai
We just released a new Prompt Tutorial for Arize AX: create, test, and optimize prompts with real data and evaluation. It's easy to tweak a prompt until it "feels" better without knowing if it actually improved. This tutorial walks you through a repeatable create → test → optimize workflow:
💻 Create: System and user message templates, variables, save to Prompt Hub with versioning
🧪 Test: Run on a dataset, add LLM-as-a-Judge evaluators, see how it performs
📈 Optimize: Improve from evaluation feedback, compare versions, validate before production
If you're building with LLMs and want a clear path from first prompt to production, this tutorial covers the full workflow in Arize AX. Get started below ⬇️ arize.com/docs/ax/prompt…
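The create → test → optimize loop can be sketched with a minimal, hypothetical prompt hub in plain Python. The names here (`PromptHub`, `PromptVersion`, `save`, `render`) are illustrative only, not the Arize AX API:

```python
from dataclasses import dataclass, field

@dataclass
class PromptVersion:
    version: int
    system: str   # system message template
    user: str     # user message template with {variables}

@dataclass
class PromptHub:
    # Maps prompt name -> append-only list of saved versions,
    # mirroring "save to Prompt Hub with versioning" above.
    prompts: dict = field(default_factory=dict)

    def save(self, name: str, system: str, user: str) -> PromptVersion:
        history = self.prompts.setdefault(name, [])
        pv = PromptVersion(version=len(history) + 1, system=system, user=user)
        history.append(pv)
        return pv

    def render(self, name: str, version: int, **variables) -> str:
        # Fill template variables for a specific saved version,
        # so two versions can be tested on the same dataset rows.
        pv = self.prompts[name][version - 1]
        return pv.user.format(**variables)

hub = PromptHub()
hub.save("summarizer", "You are a concise summarizer.", "Summarize: {text}")
hub.save("summarizer", "You are a concise summarizer.", "Summarize in one sentence: {text}")
rendered = hub.render("summarizer", 2, text="LLM evals")
```

Versioning plus variable substitution is what makes prompt comparisons repeatable: the same inputs can be rendered against version 1 and version 2 and scored side by side.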
Srilakshmi Chavali retweeted
Arize AI @arizeai
We just released a new Tracing tutorial for Arize AX: a practical walkthrough for instrumenting, inspecting, and evaluating your AI applications with end-to-end traces. If you're building agents/LLM apps, it's hard to improve what you can't see. This tutorial shows how to go from those black-box outputs to fully traceable, debuggable workflows. It covers:
🔍 Sending your first traces: Instrument your application and start capturing spans, inputs, outputs, and metadata in just a few steps
🧵 Understanding trace structure: Break down agents into spans for models, tools, retrieval, and custom logic so you can see exactly what happened
📝 Adding annotations & evaluations: Layer in human feedback, code-based checks, or LLM evaluators directly on traces
🔄 Analyzing sessions end-to-end: Group traces into sessions to debug multi-step interactions and user journeys
Get started below ⬇️
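The span structure described here can be illustrated with a tiny hand-rolled tracer. This is a sketch only; real instrumentation would use OpenTelemetry or Phoenix's tracing SDK rather than these made-up names:

```python
import time
import uuid
from contextlib import contextmanager

SPANS = []    # spans collected for the current trace
_STACK = []   # active span stack, used to link child -> parent

@contextmanager
def span(name, kind, inputs=None):
    # Each span records its parent, timing, inputs, and outputs,
    # mirroring the model/tool/retrieval breakdown described above.
    s = {
        "id": uuid.uuid4().hex,
        "parent_id": _STACK[-1]["id"] if _STACK else None,
        "name": name,
        "kind": kind,          # e.g. "agent", "llm", "tool"
        "inputs": inputs or {},
        "outputs": None,
        "start": time.time(),
    }
    _STACK.append(s)
    try:
        yield s
    finally:
        s["end"] = time.time()
        _STACK.pop()
        SPANS.append(s)   # inner spans finish (and append) first

with span("agent", "agent", {"question": "2+2?"}) as root:
    with span("calculator", "tool", {"expr": "2+2"}) as tool:
        tool["outputs"] = {"result": 4}
    root["outputs"] = {"answer": "4"}
```

The parent/child links are what let a UI reconstruct the tree: the tool span points at the agent span, and the agent span has no parent, so it is the trace root.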
Srilakshmi Chavali retweeted
arize-phoenix @ArizePhoenix
Phoenix 13.0
Phoenix 13 is a major release centered around Dataset Evaluators, a new system that turns your datasets into reusable evaluation suites. This release also introduces custom model providers, OpenAI Responses API support, and dozens of Playground and experiment UX improvements.
Srilakshmi Chavali retweeted
Arize AI @arizeai
In our latest Evals Series webinar, we shared an Evaluation 101 primer for teams building agents. We covered the full eval workflow: starting from real data, defining specific + actionable metrics, and using LLM-as-a-Judge thoughtfully (prompt design, model selection, tradeoffs). We wrapped with a live Arize AX walkthrough — building an evaluator and running it against real trace data to show how evals plug directly into agent reliability workflows. Watch the recording here: youtube.com/watch?v=qkMmJF…
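The LLM-as-a-Judge pattern mentioned above looks roughly like this. The judge call is mocked with a deterministic stub, and every name is illustrative; in practice the stub would be a real model call with the same prompt-design concerns (specific criteria, required output format):

```python
# A minimal LLM-as-a-Judge sketch: template -> model -> parsed label + explanation.

JUDGE_TEMPLATE = (
    "You are grading an answer.\n"
    "Question: {question}\nAnswer: {answer}\n"
    "Reply 'correct: <reason>' or 'incorrect: <reason>'."
)

def stub_judge(prompt: str) -> str:
    # Stand-in for an LLM call; grades against a known reference
    # so the example stays deterministic.
    return "correct: matches reference" if "Paris" in prompt else "incorrect: wrong city"

def llm_as_judge(question: str, answer: str, judge=stub_judge) -> dict:
    reply = judge(JUDGE_TEMPLATE.format(question=question, answer=answer))
    label, _, explanation = reply.partition(": ")
    return {
        "label": label,
        "explanation": explanation,
        "score": 1 if label == "correct" else 0,
    }

result = llm_as_judge("Capital of France?", "Paris")
```

Forcing the judge into a fixed `label: reason` format is the key design choice: it makes the verdict machine-parseable while still capturing an explanation for debugging.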
Srilakshmi Chavali retweeted
arize-phoenix @ArizePhoenix
We just published a new Get Started with Phoenix Quickstart in TypeScript 🚀 Using a @mastra multi-agent system, it walks through the workflow we recommend for improving agent apps in a way you can measure + validate:
🔹 Trace agents to capture spans across execution flow, tool calls, and LLM calls
🔹 Define an eval to score outputs, label failures, and generate explanations
🔹 Build a dataset of failure cases so you have concrete data to test iterations
🔹 Run experiments & test your prompts to compare agent versions on the same inputs and verify improvements
Follow the full trace → eval → iterate loop end to end with @ArizePhoenix Try it here:
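The failure-dataset and experiment steps generalize beyond any one framework. This generic sketch (toy agents, not Mastra or Phoenix APIs) shows the core idea: collect the cases a version fails, then rerun every version on exactly those inputs:

```python
def agent_v1(question: str) -> str:
    # Toy agent: only handles one hard-coded question.
    return "4" if question == "2+2?" else "I don't know"

def agent_v2(question: str) -> str:
    # "Improved" toy agent: evaluates simple arithmetic.
    # eval() is acceptable here only because inputs are fixed toy strings.
    try:
        return str(eval(question.rstrip("?")))
    except Exception:
        return "I don't know"

def exact_match(answer: str, expected: str) -> bool:
    return answer == expected

# 1. Collect v1's failure cases into a dataset of (input, expected) pairs.
examples = [("2+2?", "4"), ("3*5?", "15"), ("10-7?", "3")]
failures = [(q, exp) for q, exp in examples if not exact_match(agent_v1(q), exp)]

# 2. Run the experiment: both versions on the same failure inputs,
#    so any score difference is attributable to the agent change.
results = {
    "v1": sum(exact_match(agent_v1(q), exp) for q, exp in failures),
    "v2": sum(exact_match(agent_v2(q), exp) for q, exp in failures),
}
```

Pinning the inputs is what makes the comparison valid: v2 is verified against the exact cases v1 got wrong, not a fresh random sample.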
Srilakshmi Chavali retweeted
arize-phoenix @ArizePhoenix
We're hosting a workshop with @HamelHusain: maven.com/p/2c8410/autom…
@mikeldking will cover how to:
⚡ Connect Claude Code to Phoenix observability data
⚡ Use CLI commands to fetch traces and debug agents
⚡ Prompt AI to analyze system behavior in real-time
Srilakshmi Chavali retweeted
arize-phoenix @ArizePhoenix
Check it out! arize.com/docs/phoenix/e…
Srilakshmi Chavali retweeted
arize-phoenix @ArizePhoenix
Just Added: TypeScript tutorial for structured eval workflows in Phoenix 📊 Follow an end-to-end TS tutorial featuring a @LangChain application that shows how to:
🔺 Use real trace data from your app runs as the basis for evaluating LLM output
🔺 Build evaluators (built-in or custom) that score outputs on correctness, relevance, and other quality criteria
🔺 Run those evaluators with Phoenix's TypeScript eval tooling to produce structured quality metrics you can act on
Go from traced runs to measurable quality metrics with @ArizePhoenix TS evals 🚀
Srilakshmi Chavali retweeted
arize-phoenix @ArizePhoenix
Just Added: Python narrative tutorials for eval workflows 📊 We've published Python guides that walk through an @AgnoAgi application with a full evals workflow, showing you how to measure & iterate on AI quality using real traces and code-driven evaluation. With these new tutorials, you will:
➡️ Run evals with built-in templates on real trace data
➡️ Customize your LLM judge endpoint for evaluation flexibility
➡️ Define & run custom eval templates tailored to your app's needs
Whether you're scoring accuracy, relevance, or domain-specific criteria, these Python tutorials help you build repeatable, reliable AI quality checks with @ArizePhoenix!
Srilakshmi Chavali retweeted
Aparna Dhinakaran @aparnadhinak
People are having different experiences with Claude Code vs Cursor at this point because of the IDE. It feels logical to leap to the conclusion that the IDE is dead, but I'm still not so sure the future converges on one way of doing things.
Srilakshmi Chavali retweeted
Arize AI @arizeai
New in Arize AX: Evaluator Hub 🚀 Evaluator Hub introduces reusable, versioned evaluators that can be shared across evaluation tasks, instead of being redefined per task. This makes evaluation behavior easier to reason about as tasks, datasets, and models evolve. What this unlocks:
🔹 Reuse the same evaluator definition across tasks to reduce eval drift
🔹 Scope LLM configuration to the evaluator for consistent behavior
🔹 Track changes with version history and commit messages
🔹 Reuse evaluators across datasets using column mappings
Try it out today!
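A reusable, versioned evaluator with column mappings might look like this minimal sketch. Everything here (`Evaluator`, `run`, the mapping keys) is hypothetical, shown only to illustrate why column mappings let one evaluator definition serve differently-shaped datasets:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Evaluator:
    name: str
    fn: Callable[[str, str], float]   # (output, expected) -> score
    version: int = 1
    commit_message: str = "initial"

    def run(self, rows, column_mapping):
        # column_mapping adapts each dataset's column names to the
        # evaluator's fixed inputs, so the same versioned definition
        # is reused across datasets instead of being redefined per task.
        out_col = column_mapping["output"]
        exp_col = column_mapping["expected"]
        return [self.fn(r[out_col], r[exp_col]) for r in rows]

exact_match = Evaluator("exact_match", lambda out, exp: float(out == exp))

# Two datasets with different schemas, scored by one evaluator.
dataset_a = [{"answer": "4", "gold": "4"}]
dataset_b = [{"response": "Paris", "reference": "Lyon"}]

scores_a = exact_match.run(dataset_a, {"output": "answer", "expected": "gold"})
scores_b = exact_match.run(dataset_b, {"output": "response", "expected": "reference"})
```

Keeping the scoring logic in one versioned object is what reduces eval drift: a change bumps the version once, rather than silently diverging across per-task copies.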
Srilakshmi Chavali retweeted
Aparna Dhinakaran @aparnadhinak
I can't help but feel that MCP = context rot. The future is CLI and files.
Srilakshmi Chavali retweeted
arize-phoenix @ArizePhoenix
We just published a new Get Started with Phoenix Quickstart. It walks through the workflow we recommend for improving agent apps in a way you can measure and validate:
🔹 Trace a @crewAIInc agent to capture execution flow, tool calls, and LLM calls
🔹 Define an eval (LLM-as-judge) to score outputs and label failures
🔹 Build a dataset of failure cases so you have concrete data to test iterations
🔹 Run experiments to compare agent versions on the same inputs and verify improvements
The agent is built in Python with @crewAIInc, uses Serper for real-time search, and follows the full trace → eval → iterate loop end to end with Phoenix. Try it here: arize.com/docs/phoenix/g…
TS version coming soon 👀
Srilakshmi Chavali retweeted
arize-phoenix @ArizePhoenix
🔍 New Phoenix Evaluation Integration: @CVSHealth's UQLM (Uncertainty Quantification for Language Models) metric. UQLM gives you a quantitative measure of LLM uncertainty, so you can:
⚪️ Spot hallucination-prone responses by surfacing low-confidence outputs
⚪️ Flag uncertain generations for fallback, review, or guardrails
⚪️ Compare prompts & models more rigorously with uncertainty signals
⚪️ Monitor safety + reliability in production by tracking confidence drift
📘 Step-by-step guide to using Phoenix + UQLM: arize.com/docs/phoenix/i…
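One common way to quantify uncertainty is disagreement among multiple sampled responses to the same prompt. This generic sketch normalizes Shannon entropy over the sampled answers; it illustrates the idea only, and UQLM's actual scorers differ:

```python
import math
from collections import Counter

def response_entropy(samples):
    # Normalized Shannon entropy over sampled answers:
    # 0.0 = full agreement (high confidence),
    # values near 1.0 = maximal disagreement (low confidence).
    counts = Counter(samples)
    if len(counts) <= 1:
        return 0.0   # every sample identical: no uncertainty signal
    n = len(samples)
    h = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return h / math.log2(len(counts))   # divide by max possible entropy

# Five samples agreeing vs. five samples spread across three answers.
confident = response_entropy(["Paris"] * 5)
uncertain = response_entropy(["Paris", "Lyon", "Nice", "Paris", "Lyon"])
```

A score like this can drive exactly the routing described above: below a threshold, return the answer; above it, fall back to review or a guardrail.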