Arize AI

1.4K posts

Arize AI banner
Arize AI

Arize AI

@arizeai

The AI engineering platform for teams shipping reliable AI agents and LLM applications. Also home to @ArizePhoenix.

San Francisco, CA شامل ہوئے Ocak 2020
126 فالونگ4.4K فالوورز
Arize AI
Arize AI@arizeai·
The agent harness you wrote last year was implicitly tuned for a model that doesn't quite exist anymore. Models shift while we're not looking. Relying on vibes means customers find out before you do. @rachelnabors shares the data and a forkable repo to test your own loop: x.com/rachelnabors/s…
English
0
0
0
42
Arize AI
Arize AI@arizeai·
One AI Question with @jimbobbennett What's your 🌶️ take on AI? Our DevEx Engineer's take: Start with the mindset that AI sucks—so you're forced to build the evals and observability to make it great. Don't trust it. Test it. #AI #Programming #SoftwareDevelopment
English
0
0
1
65
Arize AI
Arize AI@arizeai·
If a prompt change can alter tool use, routing, or output without touching your code, it isn’t just text. It’s runtime behavior. That’s when prompts need their own lifecycle: versioning, rollout, rollback, and observability. This is the decision gate we use:
Arize AI tweet media
English
1
0
0
52
Arize AI
Arize AI@arizeai·
Agents today are running longer sessions, making more decisions, and touching more systems. That makes knowing if they're doing the right thing critical. Thanks again @furrier and team for having us. ✨
English
0
0
0
38
Arize AI
Arize AI@arizeai·
We ran 500 evals to test the "MCP is dead, long live the CLI" claim and presented the results at AI Engineer: Miami. The answer is more interesting than a Twitter fight! Correctness was tied (~82%). But on the hardest analytical tasks, MCP cost 6× more and ran 5× longer than CLI-via-skills. Sometimes MCP was able to one-shot things and beat the CLI, but more often the MCP needed to use the CLI itself to complete a task. Plot twist: a test with NO skills, no MCP, actually did better than MCP and some skills. The real conclusion: MCP vs. CLI is the wrong question. CLI for local, popular, composable, dev-only. MCP for remote, OAuth, proprietary, consumer. Real agents use both. Check out the full talk here: youtu.be/CfITzVcUkZA
YouTube video
YouTube
English
0
1
2
291
Arize AI
Arize AI@arizeai·
Agent traces aren't telemetry. They aren't debugging exhaust. They're the first compounding data loop enterprise software has ever had — and you should make sure you own them. Read the full blog post: arize.com/blog/using-con…
Arize AI tweet media
English
0
0
1
104
Arize AI
Arize AI@arizeai·
The TLDR from @aparnadhinak? Bigger context windows help. But reliable agents need a harness that decides what stays close, what gets compressed, what gets evicted, and what can be retrieved later. Read more: arize.com/blog/context-m…
English
0
0
1
89
Arize AI
Arize AI@arizeai·
Across Pi, OpenClaw, Claude Code, Letta, and Arize’s Alyx, the same techniques keep showing up: • Cap large file reads • Use offset and limit pagination • Budget tool results • Compact older history into summaries • Isolate subagents from parent sessions
Arize AI tweet media
English
1
0
0
108
Arize AI
Arize AI@arizeai·
Long-running agents don't just need bigger context windows. They need better context management. But context always fills up with more than the task: file reads, tool outputs, stale turns, subagent responses, memory summaries, and repeated previews.
Arize AI tweet media
English
1
0
5
202
Arize AI
Arize AI@arizeai·
GPT 5.5 and 5.5 Pro are now live in the @OpenAI API and available in the Arize AX prompt playground! Find out how frontier intelligence improves your agents in seconds!
Arize AI tweet media
English
0
0
2
325
Arize AI
Arize AI@arizeai·
The short version: a harness is the operating layer that turns a model from something that responds into something that can act, observe, adjust, and keep going. Cursor, Claude Code, Windsurf, Codex, and our agent Alyx are all converging on the same pattern. That’s the signal. Get the full breakdown: arize.com/?p=28084&previ…
English
0
0
1
217
Arize AI
Arize AI@arizeai·
“If you’re not the model, you’re the harness” sounds clever. It’s also wrong. 👀 A harness isn’t everything around an LLM. It’s a specific architecture that’s showing up in systems that actually work. Our cofounder @aparnadhinak wrote the clearest breakdown we've seen yet.
Arize AI tweet media
English
2
1
3
273
Arize AI
Arize AI@arizeai·
What actually makes an AI agent work in production? Hint: it's not just the model. In an interview, Tobias Leong, CTO and cofounder of Axium Industries, talked through what teams learn once agents leave the demo environment and hit real workflows: missing context, messy source systems, weak evals, and the need to separate retrieval from reasoning.
English
2
1
4
216