Arthur

1.1K posts

@itsArthurAI

The AI Performance Company. Arthur helps teams discover, govern, and innovate AI systems that perform and scale reliably.

New York, USA · Joined January 2019
591 Following · 2.1K Followers
Pinned Tweet
Arthur @itsArthurAI
☁️ Arthur is now available in @googlecloud! Many of our customers are building on Google Cloud and leveraging the latest Gemini models and agent frameworks, so we partnered with Google to make Arthur available directly within your GCP environment. This means your data never leaves your GCP environment, procurement is seamless through the Marketplace, and deployment fits naturally into your existing workflows and stack.
With the explosion of agents, teams lose visibility into which agents are running and lack insight into why they fail. As enterprises race to adopt agentic AI, a comprehensive agentic governance approach is crucial to preventing chaos, security nightmares, and business continuity issues. That's why we launched Arthur's Agent Discovery & Governance (ADG) Platform on Google Cloud.
With Arthur on Google Cloud, you can:
🔍 Automate discovery: instantly find and catalog agents company-wide
📈 Unify monitoring: monitor and govern internally developed and third-party agentic solutions
🛡️ Centralize policy management: enforce acceptable-use and security policies for all agent interactions
🔄 Continuously evaluate: monitor performance against the specific tasks each agent performs
Read the full announcement → arthur.ai/blog/arthur-la…
Arthur @itsArthurAI
Most teams building agents skip the foundations, and it shows up later as breakage, risk, and stalled rollouts. Our Forward Deployed Engineering (FDE) team just wrapped a six-part series on what it actually takes to ship a reliable agent, and the order matters more than people think:
1️⃣ Observability & tracing: you can't manage what you can't see.
2️⃣ Prompt management: once you can see behavior, you need a safe way to change it.
3️⃣ Continuous evals: automated evals on live traffic, powered by the traces from step 1.
4️⃣ Experiments & supervised evals: validate changes against a fixed dataset before they ship.
5️⃣ Guardrails: intercept bad inputs and outputs in real time.
6️⃣ Discovery & governance: make the agent discoverable, auditable, and owned.
Read more 👇 arthur.ai/blog/checklist…
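To make step 1 concrete, here is a minimal, stdlib-only Python sketch of span-based tracing. It is not Arthur's SDK, OpenTelemetry, or OpenInference: the `Tracer` class and its `span` method are hypothetical, the span kinds only loosely echo OpenInference's, and a real setup would export spans to a backend instead of keeping them in a list. It shows the two properties the series relies on: nested spans linked to a parent, and failed tool calls recorded as error spans.

```python
import time
import uuid
from contextlib import contextmanager


class Tracer:
    """Hypothetical in-memory tracer that records nested spans for a single trace."""

    def __init__(self):
        self.trace_id = uuid.uuid4().hex
        self.spans = []   # finished spans, in the order they complete
        self._stack = []  # ids of currently open spans, used for parent links

    @contextmanager
    def span(self, name, kind):
        span_id = uuid.uuid4().hex[:16]
        record = {
            "trace_id": self.trace_id,
            "span_id": span_id,
            "parent_id": self._stack[-1] if self._stack else None,
            "name": name,
            "kind": kind,  # loosely modeled on OpenInference kinds: AGENT, LLM, TOOL, RETRIEVER
            "status": "OK",
            "start": time.time(),
        }
        self._stack.append(span_id)
        try:
            yield record
        except Exception as exc:
            # A failure becomes an error span instead of disappearing silently.
            record["status"] = "ERROR"
            record["error"] = repr(exc)
            raise
        finally:
            self._stack.pop()
            record["end"] = time.time()
            self.spans.append(record)


# Example: one agent turn with a retrieval, a failing tool call, and an LLM call.
tracer = Tracer()
with tracer.span("answer_question", "AGENT"):
    with tracer.span("search_docs", "RETRIEVER") as s:
        s["documents_returned"] = 3
    try:
        with tracer.span("lookup_order", "TOOL"):
            raise TimeoutError("order service timed out")
    except TimeoutError:
        pass  # the agent might retry here; the failed span is already recorded
    with tracer.span("llm_call", "LLM") as s:
        s["model"] = "example-model"

for sp in tracer.spans:
    print(sp["name"], sp["kind"], sp["status"])
```

Because the error is recorded before it is re-raised, a retry in the agent does not erase the evidence of the first failure.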
Arthur @itsArthurAI
We've deployed production agents across dozens of enterprises. The same six gaps show up every time. Over the past few months, our Forward Deployed Engineering team published a six-part series distilling what it actually takes to get an AI agent production-ready. Here's your checklist:
✅ Observability & Tracing: Instrument every LLM call, tool invocation, and RAG retrieval. You can't fix what you can't see.
✅ Prompt Management: Store prompts externally, version them, and test changes before promoting. Hardcoded prompts break at scale.
✅ Continuous Evaluations: Run unsupervised evals on live traffic to catch failures before your users do.
✅ Experiments & Supervised Evals: Validate prompt, RAG, and agent changes against a fixed dataset before they ship.
✅ Guardrails: Intercept bad inputs before they reach the model and bad outputs before they reach the user.
✅ Discovery & Governance: Make your agent discoverable, auditable, and owned so it can clear enterprise review.
Full recap + links to all six parts 👇 arthur.ai/blog/checklist…
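The Experiments & Supervised Evals item can be sketched as a promotion gate: score the current and the proposed version on the same fixed, labeled dataset and ship only if the proposal is at least as good. Everything here is a hypothetical stand-in (the dataset, the keyword-based `baseline` and `candidate` functions in place of real prompt or agent versions, and exact-match accuracy as the metric); it illustrates the workflow, not Arthur's experiments API.

```python
from typing import Callable, Dict, List

# Hypothetical fixed, labeled dataset for a support-intent classifier.
DATASET: List[Dict[str, str]] = [
    {"input": "I want my money back", "expected": "refund"},
    {"input": "Where is my package?", "expected": "shipping"},
    {"input": "Can I get a refund for a late delivery?", "expected": "refund"},
    {"input": "What are your store hours?", "expected": "other"},
]


def baseline(text: str) -> str:
    """Stand-in for the current version: checks shipping keywords first."""
    lowered = text.lower()
    if "package" in lowered or "delivery" in lowered:
        return "shipping"
    if "refund" in lowered or "money back" in lowered:
        return "refund"
    return "other"


def candidate(text: str) -> str:
    """Stand-in for a proposed change: checks refund keywords first."""
    lowered = text.lower()
    if "refund" in lowered or "money back" in lowered:
        return "refund"
    if "package" in lowered or "delivery" in lowered:
        return "shipping"
    return "other"


def accuracy(system: Callable[[str], str], dataset: List[Dict[str, str]]) -> float:
    """Exact-match score of a system over the whole dataset."""
    hits = sum(system(row["input"]) == row["expected"] for row in dataset)
    return hits / len(dataset)


def should_promote(current, proposed, dataset, min_gain: float = 0.0) -> bool:
    """Gate: ship the proposed version only if it scores at least min_gain above current."""
    return accuracy(proposed, dataset) - accuracy(current, dataset) >= min_gain


print(accuracy(baseline, DATASET), accuracy(candidate, DATASET))
print(should_promote(baseline, candidate, DATASET))
```

Here the candidate fixes the refund-for-a-late-delivery case that the baseline misroutes to shipping, so it scores 1.0 against 0.75 and passes the gate; raising `min_gain` makes the gate stricter.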
Arthur @itsArthurAI
🔍 Manual compliance checks for agents don't scale. By the time you catch a policy violation, customers are already affected and your audit trail has gaps. That's the reality for most teams running AI agents in production. Compliance requirements are mounting, but the process is still spreadsheets, Slack reminders, and last-minute scrambles before audits.
⚡ Arthur's April platform release changes that equation entirely. Here's what's new:
→ Policy Management: create organizational policies with inline alert and attestation rules, assign them to models with a single API call, and watch compliance status update automatically
→ Automated Compliance Jobs: every model with assigned policies gets daily compliance verification, and violations trigger immediate webhook notifications to your incident response systems
→ Agent Governance at scale: track tool calls, API costs, and agent behavior across your entire fleet
Read the full breakdown (link in comments) 👇
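As a rough illustration of how policies, assignments, and a daily compliance job fit together, here is a minimal Python sketch. The `Policy`, `Model`, `assign`, and `compliance_job` names, the metric-threshold and attestation rule shapes, and the payload format are all hypothetical, not Arthur's API. The job returns webhook-style dicts instead of POSTing them, and a scheduler (cron or similar) is assumed to be what runs it daily.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Set


@dataclass
class Policy:
    name: str
    max_metric: Dict[str, float] = field(default_factory=dict)    # alert rules: metric -> max allowed
    required_attestations: Set[str] = field(default_factory=set)  # attestation rules


@dataclass
class Model:
    name: str
    metrics: Dict[str, float]
    attestations: Set[str]
    policies: List[Policy] = field(default_factory=list)


def assign(policy: Policy, model: Model) -> None:
    model.policies.append(policy)


def compliance_job(models: List[Model], run_date: date) -> List[dict]:
    """Check every model with assigned policies; return webhook-style payloads for violations."""
    notifications = []
    for model in models:
        for policy in model.policies:
            violations = []
            for metric, limit in policy.max_metric.items():
                value = model.metrics.get(metric)
                if value is None:
                    violations.append(f"{metric} not reported")
                elif value > limit:
                    violations.append(f"{metric}={value} exceeds {limit}")
            for attestation in sorted(policy.required_attestations - model.attestations):
                violations.append(f"missing attestation: {attestation}")
            if violations:
                notifications.append({
                    "date": run_date.isoformat(),
                    "model": model.name,
                    "policy": policy.name,
                    "violations": violations,
                })
    return notifications


# Hypothetical example: one policy assigned to two models; only one is out of compliance.
customer_data = Policy("customer-data", {"pii_leak_rate": 0.0}, {"dpa-signed"})
support_bot = Model("support-bot", {"pii_leak_rate": 0.02}, set())
faq_bot = Model("faq-bot", {"pii_leak_rate": 0.0}, {"dpa-signed"})
for m in (support_bot, faq_bot):
    assign(customer_data, m)

alerts = compliance_job([support_bot, faq_bot], date(2025, 1, 15))
print(alerts)
```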
Arthur @itsArthurAI
You trust Claude Code to read files, make edits, and fire off LLM calls autonomously. But do you actually know what it's doing? Each turn can read files, make edits, run searches, call sub-agents, and send multiple LLM requests, all without you seeing what's happening under the hood. Which tool failed and got retried? Did it read the right files before making that edit? What are you actually sharing with Anthropic?
We built an open-source integration that changes that. It hooks into Claude Code's event system and sends structured OpenInference traces to Arthur Engine, giving you full visibility into every turn, every tool call, and every LLM request. Installs in 30 seconds. Works locally and in CI. Link in comments 👇
Arthur @itsArthurAI
Building a functioning agent isn't enough if it can't pass a governance review ✅
As agent adoption grows, organizations are losing track of what's running, what data agents can access, what tools they invoke, and who owns them. Governance teams are responding by requiring agents to meet specific standards before they're allowed to operate in production. Here's what designing for governance actually looks like:
→ Centralized telemetry: governance tooling discovers agents by finding their traces. No traces = invisible to the org
→ Thorough instrumentation: traces need to capture tools, subagents, LLM providers, and data sources so compliance teams can see the full scope
→ Evals and guardrails as evidence: observability, evals, and guardrails aren't just operational best practices; they're your proof that the agent meets enterprise standards
→ Clear ownership: every agent needs an accountable owner and a defined scope
Our FDE team sees this consistently: designing for governance from the start is the key to getting agents into production faster. Read Part 6 (link in comments)
#AIAgents #AIGovernance #EnterpriseAI #LLMs #AgentDevelopment
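The "no traces = invisible" point can be illustrated with a small sketch that builds an agent inventory purely from exported spans. The span fields (`agent`, `kind`, `tool`, `provider`, `owner`) and the example agents are hypothetical, not a real telemetry schema. The idea is only that tools, providers, and ownership are recoverable when instrumentation records them, and that an agent which never emits spans simply never appears.

```python
from collections import defaultdict

# Hypothetical exported spans; field names are illustrative, not a real schema.
SPANS = [
    {"agent": "jira-bot", "kind": "LLM", "provider": "anthropic", "owner": "platform-team"},
    {"agent": "jira-bot", "kind": "TOOL", "tool": "create_ticket"},
    {"agent": "release-notes", "kind": "LLM", "provider": "openai"},
    {"agent": "release-notes", "kind": "TOOL", "tool": "read_changelog"},
]


def build_inventory(spans):
    """Aggregate tools, LLM providers, and owner per agent from its spans."""
    inventory = defaultdict(lambda: {"tools": set(), "providers": set(), "owner": None})
    for span in spans:
        entry = inventory[span["agent"]]
        if span["kind"] == "TOOL":
            entry["tools"].add(span["tool"])
        elif span["kind"] == "LLM":
            entry["providers"].add(span["provider"])
        if span.get("owner"):
            entry["owner"] = span["owner"]
    return dict(inventory)


def governance_gaps(inventory):
    """Agents with no accountable owner would not clear a governance review."""
    return sorted(name for name, entry in inventory.items() if entry["owner"] is None)


inventory = build_inventory(SPANS)
print(governance_gaps(inventory))  # ['release-notes']
```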
Arthur @itsArthurAI
"Building an agent" doesn't mean what most people think. Most people hear "AI agent" and picture an autonomous system making complex decisions on its own. The reality is much more practical and accessible. Our product manager, @Ashley_Nader, built an open-source AI-powered workflow called Louisa 🐶 that automatically generates polished, user-facing release notes every time a new release ships. She built Louisa using @claudeai Code, deployed it on @vercel, and integrated it with the Arthur Engine for observability and tracing. The real insight isn't the tool, it's the mindset. Building an “agent” starts with one question: What am I doing repeatedly that I could automate? You don't need to be an engineer. You don't need a massive framework. You just need a clear problem, a good prompt, and the observability to know when things go sideways. Louisa is open source (link in comments) 👇 #aiagents #ai
Arthur @itsArthurAI
Claude Code is a black box. Every tool call, file read, and sub-agent invocation happens invisibly. You type a prompt, and Claude Code reads files, makes edits, calls sub-agents, and fires off multiple LLM requests autonomously. But which tool failed and got retried? Did it read the right files? Are you accidentally sharing API keys with Anthropic?
We built an open-source integration that hooks into @claudeai Code's event system and sends structured OpenInference traces to Arthur Engine:
→ Every user prompt becomes a trace with spans for each action
→ Tool failures surface as error spans instead of silently disappearing
→ Track token usage and cost across your team
Install in 30 seconds. Works locally and in CI pipelines. If you're using Claude Code for real engineering work, you should be able to see what it's doing. Full blog (link in comments).
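The "one trace per prompt, failures as error spans" idea can be sketched in a few lines of Python. This is not Arthur's integration, and it does not emit OpenInference spans. The event names (`UserPromptSubmit`, `PostToolUse`) and field names (`hook_event_name`, `prompt`, `tool_name`, `tool_response`) are modeled on Claude Code's hook payloads, but the `error` key inside `tool_response` is a placeholder assumption; check the Claude Code hooks documentation for the real payload shape. In practice each hook runs as a separate process that reads one JSON event from stdin, so trace state would have to be persisted between invocations.

```python
import uuid


def events_to_traces(events):
    """Group hook-style events into traces: one trace per user prompt, one span per tool call."""
    traces = []
    current = None
    for event in events:
        name = event.get("hook_event_name")
        if name == "UserPromptSubmit":
            current = {"trace_id": uuid.uuid4().hex, "prompt": event.get("prompt", ""), "spans": []}
            traces.append(current)
        elif name == "PostToolUse" and current is not None:
            response = event.get("tool_response") or {}
            # Assumed field: how a failed tool call is marked in the real payload may differ.
            error = response.get("error") if isinstance(response, dict) else None
            current["spans"].append({
                "name": event.get("tool_name", "unknown"),
                "kind": "TOOL",
                "status": "ERROR" if error else "OK",
                "error": error,
            })
    return traces


# Hypothetical event stream: a failed Bash call followed by a successful retry.
events = [
    {"hook_event_name": "UserPromptSubmit", "prompt": "Fix the failing test"},
    {"hook_event_name": "PostToolUse", "tool_name": "Read", "tool_response": {"content": "..."}},
    {"hook_event_name": "PostToolUse", "tool_name": "Bash", "tool_response": {"error": "exit code 1"}},
    {"hook_event_name": "PostToolUse", "tool_name": "Bash", "tool_response": {"stdout": "1 passed"}},
    {"hook_event_name": "UserPromptSubmit", "prompt": "Commit the fix"},
]

traces = events_to_traces(events)
for trace in traces:
    print(trace["prompt"], [(s["name"], s["status"]) for s in trace["spans"]])
```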
Arthur @itsArthurAI
A major airline we work with had a hard requirement: no customer PII from a customer support conversation could leave their corporate environment. The solution was a pre-LLM guardrail: every conversation passes through PII detection before anything is sent to the model, and sensitive data is redacted automatically. No manual review needed.
That's one side of the guardrails equation. The other is what happens after the model responds. Another customer runs a post-LLM hallucination guardrail that catches unsupported claims and automatically feeds them back to the agent for self-correction before the user ever sees the response. No human in the loop required.
This is the distinction we break down in Part 5 of our Best Practices for Building Agents series:
→ Pre-LLM guardrails: intercept inputs before they leave your environment (PII redaction, sensitive data blocking, prompt injection detection)
→ Post-LLM guardrails: intercept outputs before they reach your user (hallucination detection, self-correction loops)
Observability, prompt management, evals, and now guardrails: each layer adds confidence. Together, they're what gets agents from demo to production. Read Part 5 here 👇 arthur.ai/blog/best-prac…
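A pre-LLM redaction guardrail can be sketched as: scan the input, replace detected PII with placeholders, and pass only the redacted text to the model. This is a deliberately naive, regex-only illustration (email addresses and phone-number-like digit runs). It is not how Arthur or the airline detects PII, and it misses names like "Jane", addresses, and booking references, which typically need NER models or domain-specific rules. `call_model` is a hypothetical stand-in for whatever client sends text to the LLM provider.

```python
import re
from typing import Callable, List, Tuple

# Deliberately naive patterns for illustration; real PII detection needs much more
# (names, addresses, booking references, NER models, locale-aware formats).
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def redact(text: str) -> Tuple[str, List[str]]:
    """Replace each PII match with a placeholder and report which types were found."""
    found = []
    for label, pattern in PII_PATTERNS.items():
        text, count = pattern.subn(f"[{label}]", text)
        if count:
            found.append(label)
    return text, found


def guarded_call(user_message: str, call_model: Callable[[str], str]) -> str:
    """Pre-LLM guardrail: only the redacted text is ever passed to the model."""
    safe_text, _ = redact(user_message)
    return call_model(safe_text)


message = "Hi, I'm Jane, email jane.doe@example.com or call +1 (555) 123-4567 about booking ABC123."
print(redact(message))
```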
Arthur @itsArthurAI
Evals tell you when and why something went wrong. Guardrails stop it from happening in the first place.
Part 5 of our Best Practices for Building Agents series is live, covering guardrails: the real-time layer that intercepts bad inputs before they reach your LLM and bad outputs before they reach your user. There are two patterns every production agent needs:
→ Pre-LLM guardrails: PII redaction, sensitive data blocking, and prompt injection detection, all running before anything leaves your environment
→ Post-LLM guardrails: hallucination detection and self-correction loops that catch unsupported claims before the user ever sees them
These aren't theoretical patterns. A major airline we work with uses pre-LLM guardrails to redact PII from customer support conversations before they ever hit an external model provider. Another customer runs a post-LLM hallucination guardrail that automatically feeds bad outputs back to the agent for correction.
Our FDE team sees this consistently: the teams that ship agents to production with confidence aren't just monitoring after the fact. They're intercepting in real time. Read Part 5 (link in comments)
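The post-LLM self-correction loop can be sketched like this. The grounding check is a toy assumption (a sentence counts as unsupported if it cites a number that does not appear in the source context), standing in for a real hallucination detector, and `fake_agent` is a hypothetical agent that hallucinates once and then corrects itself when given feedback. The bounded retry count and the fallback message keep the loop from running forever or showing the user an unverified answer.

```python
import re
from typing import Callable, List

NUMBER = re.compile(r"\d+(?:\.\d+)?")


def unsupported_claims(answer: str, context: str) -> List[str]:
    """Toy grounding check: flag any sentence citing a number absent from the source context."""
    known = set(NUMBER.findall(context))
    sentences = re.split(r"(?<=[.!?])\s+", answer.strip())
    return [s for s in sentences if any(n not in known for n in NUMBER.findall(s))]


def answer_with_self_correction(agent: Callable[[str], str], question: str,
                                context: str, max_attempts: int = 3) -> str:
    """Post-LLM guardrail: feed unsupported claims back to the agent before the user sees them."""
    prompt = question
    for _ in range(max_attempts):
        answer = agent(prompt)
        flagged = unsupported_claims(answer, context)
        if not flagged:
            return answer
        prompt = (f"{question}\nRevise your answer. These claims are not supported "
                  f"by the source: {flagged}")
    return "Sorry, I couldn't verify an answer from the available information."


CONTEXT = "Each passenger may check 2 bags free of charge."


def fake_agent(prompt: str) -> str:
    # Hypothetical agent: hallucinates on the first try, corrects itself when given feedback.
    return "You can check 2 bags for free." if "Revise" in prompt else "You can check 3 bags for free."


print(answer_with_self_correction(fake_agent, "How many bags can I check for free?", CONTEXT))
```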
Arthur @itsArthurAI
One of the first things that broke was ticket priority: the jirabot would mark tickets about bugs in the dev environment as "high" priority. Evals definitely helped us identify the issue and fix it. We wrote a blog post about all the parts of the agent that broke, how we identified them, and how we fixed them! arthur.ai/blog/from-vibe…
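An eval that catches this kind of regression can be as simple as a rule applied to every logged bot decision. The sketch below is hypothetical: the record fields `id`, `ticket_text`, and `priority` and the dev-environment regex are assumptions, not the jirabot's actual eval. In practice an LLM-as-judge or a labeled comparison would usually complement a rule like this.

```python
import re
from typing import Dict, List

DEV_ENV = re.compile(r"\b(dev|development)\s+(env|environment)\b", re.IGNORECASE)


def priority_eval(record: Dict[str, object]) -> Dict[str, object]:
    """Rule-based eval: bugs that only affect the dev environment should not be marked high."""
    mentions_dev = bool(DEV_ENV.search(str(record["ticket_text"])))
    passed = not (mentions_dev and str(record["priority"]).lower() == "high")
    return {"id": record["id"], "passed": passed,
            "reason": None if passed else "dev-environment bug marked high priority"}


def failure_rate(records: List[Dict[str, object]]) -> float:
    results = [priority_eval(r) for r in records]
    return sum(not r["passed"] for r in results) / len(results)


# Hypothetical logged bot decisions.
logged = [
    {"id": 1, "ticket_text": "Login button broken in dev environment", "priority": "High"},
    {"id": 2, "ticket_text": "Checkout fails for all production users", "priority": "High"},
    {"id": 3, "ticket_text": "Flaky test in development env", "priority": "Low"},
    {"id": 4, "ticket_text": "Typo on settings page", "priority": "Medium"},
]

print([priority_eval(r) for r in logged if not priority_eval(r)["passed"]])
print(failure_rate(logged))
```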
Philippe Van Dyck @pvdyck1
@itsArthurAI the jump from "it works on my machine" to production-ready is where most agent projects die. curious what broke first
Arthur @itsArthurAI
We turned a vibe-coded Slack Jira bot into a production-ready AI agent in 2 weeks. Here's how.
At Future of DevEx NYC, our Head of Platform Engineering Noriaki (Nori) Tatsumi walked through exactly how we bridged that gap, live on stage. The demo: a Slack Jira bot that started as a quick vibe-coded prototype. It worked… sometimes. The kind of "sometimes" that erodes user trust fast.
So we applied the same continuous evaluation practices we deploy with customers every day:
→ Automated evals running against real Slack interactions, not just test cases
→ Iterating on prompts and agent behavior based on experiment results, not vibes
→ Moving from "it seems to work" to "we can prove it works"
If you're building agents that need to go beyond vibe-coded agents that “work”, this talk is definitely worth your 15 minutes 👇 youtube.com/watch?v=jF5yvK…
Arthur @itsArthurAI
Building agents with @mastra? Arthur is now an Observability Exporter in the Mastra AI agent framework ⚡
With this new exporter, you can add full observability to your Mastra agents. Traces flow straight into Arthur Engine via OpenTelemetry with zero coding.
Build your agent in Mastra. Monitor and evaluate it in Arthur.
Get started here → mastra.ai/docs/observabi… #aiagents #agentdevelopment #ai #mastra
Arthur @itsArthurAI
Your agents are running in production, but can you actually see what they're doing, govern them at scale, and debug them without clicking through five different screens? Our March product update brings it all together: unified navigation, enterprise policy management, Claude Code observability, a built-in Engine Assistant, and more. Read the full product update here → arthur.ai/blog/platform-…
Arthur @itsArthurAI
Changing a prompt shouldn't require a deploy. Here's what it looks like when it doesn't. ⚡
As our engineer Nori Tatsumi explained in his recent talk, most teams start with hardcoded prompts. That works for demos. But in production, untracked changes, coupled deploy cycles, and zero rollback capability make prompts an operational risk.
Arthur's Prompt Management system lets you manage prompts, specify the LLM provider and model, configure tools, and make it all available to your agent in real time, without a single code change.
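A minimal sketch of the idea in Python: prompts are stored as immutable versions that bundle the template with a provider, model, and tool list, and the agent looks up the live version at request time, so promoting or rolling back is a data change rather than a deploy. `PromptRegistry`, `PromptVersion`, and the provider, model, and tool strings are hypothetical, not Arthur's Prompt Management API; a real registry would live behind a service or database rather than in memory.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class PromptVersion:
    template: str
    provider: str
    model: str
    tools: Tuple[str, ...] = ()


@dataclass
class PromptRegistry:
    """Hypothetical in-memory registry; a real one would sit behind a service or database."""
    versions: Dict[str, List[PromptVersion]] = field(default_factory=dict)
    live: Dict[str, int] = field(default_factory=dict)  # prompt name -> index of the live version

    def publish(self, name: str, version: PromptVersion) -> int:
        """Store a new immutable version and return its index; it is not live yet."""
        self.versions.setdefault(name, []).append(version)
        return len(self.versions[name]) - 1

    def promote(self, name: str, index: int) -> None:
        """Make a stored version live; promoting an older index is a rollback."""
        if not 0 <= index < len(self.versions.get(name, [])):
            raise IndexError(f"no version {index} for prompt {name!r}")
        self.live[name] = index

    def get_live(self, name: str) -> PromptVersion:
        """Called by the agent at request time, so changes need no redeploy."""
        return self.versions[name][self.live[name]]


registry = PromptRegistry()
v0 = registry.publish("triage", PromptVersion(
    "Classify this ticket: {ticket}", provider="provider-a", model="model-x"))
registry.promote("triage", v0)

v1 = registry.publish("triage", PromptVersion(
    "Classify this ticket's priority: {ticket}",
    provider="provider-b", model="model-y", tools=("search_tickets",)))
registry.promote("triage", v1)
live = registry.get_live("triage")
print(live.model, live.template.format(ticket="Button misaligned"))

registry.promote("triage", v0)  # roll back without touching agent code
print(registry.get_live("triage").model)
```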