Srilakshmi Chavali
@schavalii

89 posts

dev rel @ Arize || uc berkeley alum 🎓

Joined January 2025
43 Following · 47 Followers
Srilakshmi Chavali retweeted
Arize AI @arizeai
We just released a new Prompt Tutorial for Arize AX: create, test, and optimize prompts with real data and evaluation. It's easy to tweak a prompt until it "feels" better without knowing if it actually improved. This tutorial walks you through a repeatable create → test → optimize workflow:
💻 Create: System and user message templates, variables, save to Prompt Hub with versioning
🧪 Test: Run on a dataset, add LLM-as-a-Judge evaluators, see how it performs
📈 Optimize: Improve from evaluation feedback, compare versions, validate before production
If you're building with LLMs and want a clear path from first prompt to production, this tutorial covers the full workflow in Arize AX. Get started below ⬇️ arize.com/docs/ax/prompt…
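The create → test → optimize loop can be sketched with a minimal, hypothetical prompt hub in plain Python. The names here (`PromptHub`, `PromptVersion`, `save`, `render`) are illustrative only, not the Arize AX API:

```python
from dataclasses import dataclass, field

@dataclass
class PromptVersion:
    version: int
    system: str   # system message template
    user: str     # user message template with {variables}

@dataclass
class PromptHub:
    # Maps prompt name -> append-only list of saved versions,
    # mirroring "save to Prompt Hub with versioning" above.
    prompts: dict = field(default_factory=dict)

    def save(self, name: str, system: str, user: str) -> PromptVersion:
        history = self.prompts.setdefault(name, [])
        pv = PromptVersion(version=len(history) + 1, system=system, user=user)
        history.append(pv)
        return pv

    def render(self, name: str, version: int, **variables) -> str:
        # Fill template variables for a specific saved version,
        # so two versions can be tested on the same dataset rows.
        pv = self.prompts[name][version - 1]
        return pv.user.format(**variables)

hub = PromptHub()
hub.save("summarizer", "You are a concise summarizer.", "Summarize: {text}")
hub.save("summarizer", "You are a concise summarizer.", "Summarize in one sentence: {text}")
rendered = hub.render("summarizer", 2, text="LLM evals")
```

Versioning plus variable substitution is what makes prompt comparisons repeatable: the same inputs can be rendered against version 1 and version 2 and scored side by side.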
Srilakshmi Chavali retweeted
Arize AI @arizeai
We just released a new Tracing tutorial for Arize AX: a practical walkthrough for instrumenting, inspecting, and evaluating your AI applications with end-to-end traces. If you're building agents/LLM apps, it's hard to improve what you can't see. This tutorial shows how to go from those black-box outputs to fully traceable, debuggable workflows. It covers:
🔍 Sending your first traces: Instrument your application and start capturing spans, inputs, outputs, and metadata in just a few steps
🧵 Understanding trace structure: Break down agents into spans for models, tools, retrieval, and custom logic so you can see exactly what happened
📝 Adding annotations & evaluations: Layer in human feedback, code-based checks, or LLM evaluators directly on traces
🔄 Analyzing sessions end-to-end: Group traces into sessions to debug multi-step interactions and user journeys
Get started below ⬇️
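The span structure described here can be illustrated with a tiny hand-rolled tracer. This is a sketch only; real instrumentation would use OpenTelemetry or Phoenix's tracing SDK rather than these made-up names:

```python
import time
import uuid
from contextlib import contextmanager

SPANS = []    # spans collected for the current trace
_STACK = []   # active span stack, used to link child -> parent

@contextmanager
def span(name, kind, inputs=None):
    # Each span records its parent, timing, inputs, and outputs,
    # mirroring the model/tool/retrieval breakdown described above.
    s = {
        "id": uuid.uuid4().hex,
        "parent_id": _STACK[-1]["id"] if _STACK else None,
        "name": name,
        "kind": kind,          # e.g. "agent", "llm", "tool"
        "inputs": inputs or {},
        "outputs": None,
        "start": time.time(),
    }
    _STACK.append(s)
    try:
        yield s
    finally:
        s["end"] = time.time()
        _STACK.pop()
        SPANS.append(s)   # inner spans finish (and append) first

with span("agent", "agent", {"question": "2+2?"}) as root:
    with span("calculator", "tool", {"expr": "2+2"}) as tool:
        tool["outputs"] = {"result": 4}
    root["outputs"] = {"answer": "4"}
```

The parent/child links are what let a UI reconstruct the tree: the tool span points at the agent span, and the agent span has no parent, so it is the trace root.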
Srilakshmi Chavali retweeted
arize-phoenix @ArizePhoenix
Phoenix 13.0
Phoenix 13 is a major release centered around Dataset Evaluators, a new system that turns your datasets into reusable evaluation suites. This release also introduces custom model providers, OpenAI Responses API support, and dozens of Playground and experiment UX improvements.
Srilakshmi Chavali retweeted
Arize AI @arizeai
In our latest Evals Series webinar, we shared an Evaluation 101 primer for teams building agents. We covered the full eval workflow: starting from real data, defining specific + actionable metrics, and using LLM-as-a-Judge thoughtfully (prompt design, model selection, tradeoffs). We wrapped with a live Arize AX walkthrough — building an evaluator and running it against real trace data to show how evals plug directly into agent reliability workflows. Watch the recording here: youtube.com/watch?v=qkMmJF…
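The LLM-as-a-Judge pattern mentioned above looks roughly like this. The judge call is mocked with a deterministic stub, and every name is illustrative; in practice the stub would be a real model call with the same prompt-design concerns (specific criteria, required output format):

```python
# A minimal LLM-as-a-Judge sketch: template -> model -> parsed label + explanation.

JUDGE_TEMPLATE = (
    "You are grading an answer.\n"
    "Question: {question}\nAnswer: {answer}\n"
    "Reply 'correct: <reason>' or 'incorrect: <reason>'."
)

def stub_judge(prompt: str) -> str:
    # Stand-in for an LLM call; grades against a known reference
    # so the example stays deterministic.
    return "correct: matches reference" if "Paris" in prompt else "incorrect: wrong city"

def llm_as_judge(question: str, answer: str, judge=stub_judge) -> dict:
    reply = judge(JUDGE_TEMPLATE.format(question=question, answer=answer))
    label, _, explanation = reply.partition(": ")
    return {
        "label": label,
        "explanation": explanation,
        "score": 1 if label == "correct" else 0,
    }

result = llm_as_judge("Capital of France?", "Paris")
```

Forcing the judge into a fixed `label: reason` format is the key design choice: it makes the verdict machine-parseable while still capturing an explanation for debugging.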
Srilakshmi Chavali retweeted
arize-phoenix @ArizePhoenix
We just published a new Get Started with Phoenix Quickstart in TypeScript 🚀 Using a @mastra multi-agent system, it walks through the workflow we recommend for improving agent apps in a way you can measure + validate:
🔹 Trace agents to capture spans across execution flow, tool calls, and LLM calls
🔹 Define an eval to score outputs, label failures, and generate explanations
🔹 Build a dataset of failure cases so you have concrete data to test iterations
🔹 Run experiments & test your prompts to compare agent versions on the same inputs and verify improvements
Follow the full trace → eval → iterate loop end to end with @ArizePhoenix Try it here:
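The failure-dataset and experiment steps generalize beyond any one framework. This generic sketch (toy agents, not Mastra or Phoenix APIs) shows the core idea: collect the cases a version fails, then rerun every version on exactly those inputs:

```python
def agent_v1(question: str) -> str:
    # Toy agent: only handles one hard-coded question.
    return "4" if question == "2+2?" else "I don't know"

def agent_v2(question: str) -> str:
    # "Improved" toy agent: evaluates simple arithmetic.
    # eval() is acceptable here only because inputs are fixed toy strings.
    try:
        return str(eval(question.rstrip("?")))
    except Exception:
        return "I don't know"

def exact_match(answer: str, expected: str) -> bool:
    return answer == expected

# 1. Collect v1's failure cases into a dataset of (input, expected) pairs.
examples = [("2+2?", "4"), ("3*5?", "15"), ("10-7?", "3")]
failures = [(q, exp) for q, exp in examples if not exact_match(agent_v1(q), exp)]

# 2. Run the experiment: both versions on the same failure inputs,
#    so any score difference is attributable to the agent change.
results = {
    "v1": sum(exact_match(agent_v1(q), exp) for q, exp in failures),
    "v2": sum(exact_match(agent_v2(q), exp) for q, exp in failures),
}
```

Pinning the inputs is what makes the comparison valid: v2 is verified against the exact cases v1 got wrong, not a fresh random sample.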
Srilakshmi Chavali retweeted
arize-phoenix @ArizePhoenix
We're hosting a workshop with @HamelHusain: maven.com/p/2c8410/autom…
@mikeldking will cover how to:
⚡ Connect Claude Code to Phoenix observability data
⚡ Use CLI commands to fetch traces and debug agents
⚡ Prompt AI to analyze system behavior in real-time
Srilakshmi Chavali retweeted
arize-phoenix @ArizePhoenix
Check it out! arize.com/docs/phoenix/e…
Srilakshmi Chavali retweeted
arize-phoenix @ArizePhoenix
Just Added: TypeScript tutorial for structured eval workflows in Phoenix 📊 Follow an end-to-end TS tutorial featuring a @LangChain application that shows how to:
🔺 Use real trace data from your app runs as the basis for evaluating LLM output
🔺 Build evaluators (built-in or custom) that score outputs on correctness, relevance, and other quality criteria
🔺 Run those evaluators with Phoenix's TypeScript eval tooling to produce structured quality metrics you can act on
Go from traced runs to measurable quality metrics with @ArizePhoenix TS evals 🚀
Srilakshmi Chavali retweeted
arize-phoenix @ArizePhoenix
Just Added: Python narrative tutorials for eval workflows 📊 We've published Python guides that walk through an @AgnoAgi application with a full evals workflow, showing you how to measure & iterate on AI quality using real traces and code-driven evaluation. With these new tutorials, you will:
➡️ Run evals with built-in templates on real trace data
➡️ Customize your LLM judge endpoint for evaluation flexibility
➡️ Define & run custom eval templates tailored to your app's needs
Whether you're scoring accuracy, relevance, or domain-specific criteria, these Python tutorials help you build repeatable, reliable AI quality checks with @ArizePhoenix!
Srilakshmi Chavali retweeted
Aparna Dhinakaran @aparnadhinak
People are having different experiences with Claude Code vs Cursor at this point because of the IDE. It feels logical to leap to the conclusion that the IDE is dead, but I'm still not so sure the future converges on one way of doing things.
Srilakshmi Chavali retweeted
Arize AI @arizeai
New in Arize AX: Evaluator Hub 🚀 Evaluator Hub introduces reusable, versioned evaluators that can be shared across evaluation tasks, instead of being redefined per task. This makes evaluation behavior easier to reason about as tasks, datasets, and models evolve. What this unlocks:
🔹 Reuse the same evaluator definition across tasks to reduce eval drift
🔹 Scope LLM configuration to the evaluator for consistent behavior
🔹 Track changes with version history and commit messages
🔹 Reuse evaluators across datasets using column mappings
Try it out today!
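A reusable, versioned evaluator with column mappings might look like this minimal sketch. Everything here (`Evaluator`, `run`, the mapping keys) is hypothetical, shown only to illustrate why column mappings let one evaluator definition serve differently-shaped datasets:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Evaluator:
    name: str
    fn: Callable[[str, str], float]   # (output, expected) -> score
    version: int = 1
    commit_message: str = "initial"

    def run(self, rows, column_mapping):
        # column_mapping adapts each dataset's column names to the
        # evaluator's fixed inputs, so the same versioned definition
        # is reused across datasets instead of being redefined per task.
        out_col = column_mapping["output"]
        exp_col = column_mapping["expected"]
        return [self.fn(r[out_col], r[exp_col]) for r in rows]

exact_match = Evaluator("exact_match", lambda out, exp: float(out == exp))

# Two datasets with different schemas, scored by one evaluator.
dataset_a = [{"answer": "4", "gold": "4"}]
dataset_b = [{"response": "Paris", "reference": "Lyon"}]

scores_a = exact_match.run(dataset_a, {"output": "answer", "expected": "gold"})
scores_b = exact_match.run(dataset_b, {"output": "response", "expected": "reference"})
```

Keeping the scoring logic in one versioned object is what reduces eval drift: a change bumps the version once, rather than silently diverging across per-task copies.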
Srilakshmi Chavali retweeted
Aparna Dhinakaran @aparnadhinak
I can't help but feel that MCP = context rot. The future is CLI and files.
Srilakshmi Chavali retweeted
arize-phoenix @ArizePhoenix
We just published a new Get Started with Phoenix Quickstart. It walks through the workflow we recommend for improving agent apps in a way you can measure and validate:
🔹 Trace a @crewAIInc agent to capture execution flow, tool calls, and LLM calls
🔹 Define an eval (LLM-as-judge) to score outputs and label failures
🔹 Build a dataset of failure cases so you have concrete data to test iterations
🔹 Run experiments to compare agent versions on the same inputs and verify improvements
The agent is built in Python with @crewAIInc, uses Serper for real-time search, and follows the full trace → eval → iterate loop end to end with Phoenix. Try it here: arize.com/docs/phoenix/g…
TS version coming soon 👀
Srilakshmi Chavali retweeted
arize-phoenix @ArizePhoenix
🔍 New Phoenix Evaluation Integration: @CVSHealth's UQLM (Uncertainty Quantification for Language Models) metric. UQLM gives you a quantitative measure of LLM uncertainty, so you can:
⚪️ Spot hallucination-prone responses by surfacing low-confidence outputs
⚪️ Flag uncertain generations for fallback, review, or guardrails
⚪️ Compare prompts & models more rigorously with uncertainty signals
⚪️ Monitor safety + reliability in production by tracking confidence drift
📘 Step-by-step guide to using Phoenix + UQLM: arize.com/docs/phoenix/i…
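One common way to quantify uncertainty is disagreement among multiple sampled responses to the same prompt. This generic sketch normalizes Shannon entropy over the sampled answers; it illustrates the idea only, and UQLM's actual scorers differ:

```python
import math
from collections import Counter

def response_entropy(samples):
    # Normalized Shannon entropy over sampled answers:
    # 0.0 = full agreement (high confidence),
    # values near 1.0 = maximal disagreement (low confidence).
    counts = Counter(samples)
    if len(counts) <= 1:
        return 0.0   # every sample identical: no uncertainty signal
    n = len(samples)
    h = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return h / math.log2(len(counts))   # divide by max possible entropy

# Five samples agreeing vs. five samples spread across three answers.
confident = response_entropy(["Paris"] * 5)
uncertain = response_entropy(["Paris", "Lyon", "Nice", "Paris", "Lyon"])
```

A score like this can drive exactly the routing described above: below a threshold, return the answer; above it, fall back to review or a guardrail.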