Braintrust

810 posts

@braintrust

The observability layer for production AI.

Joined August 2023

56 Following · 7K Followers

Pinned Tweet

Braintrust @braintrust · Jun 1

Topics is now GA on all plans. Continuously find the patterns worth investigating across your production traffic.

Braintrust @braintrust · 2d

Learn more → braintrustdata.link/discover-topics

Braintrust @braintrust · 2d

Topics has a dedicated page that shows all clusters generated from your production logs. Compare past and present groupings to understand how user behavior evolves over time and gain visibility into how Topics categorizes conversations, complaints, and feature requests.

Braintrust retweeted

Izzy Hurley @iz_hurley_ · 2d

I’ve seen a lot of confusion about how to best leverage the new family of GPT 5.6 models, and honestly, sameeee! Running this eval helped me pin down which models I should use (or have an agent use) to build effective code without burning extra tokens and dollars.

Braintrust @braintrust

We evaled the GPT-5.6 family, plus Anthropic's Fable, Opus 4.8, and Sonnet 5, on the key building blocks of agentic workflows. Then we broke results down by task type and difficulty and turned them into a decision map you can route against. Here's what we found.

Braintrust @braintrust · 2d

The family-level results show data transforms are nearly solved for the OpenAI models, symbolic rules pull them apart, and the Anthropic rows are dragged down by refusals rather than by wrong answers.

Braintrust @braintrust · 3d

If you're building a voice agent, picking the right speech-to-text model is not obvious. Every provider claims to be accurate, fast, and production-ready, but the benchmarks they publish rarely look like actual traffic. We used Braintrust to build a controlled eval across six STT providers, 240 audio cases, and eight content domains, scoring not just transcription accuracy but whether errors changed the downstream LLM answer.
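
As a rough illustration of the two kinds of score mentioned above, here is a minimal Python sketch. It assumes transcription accuracy is measured as word error rate (WER), which the post does not state, and it uses a hypothetical `answer_changed` helper as a stand-in for whatever downstream comparison the eval actually ran. The example strings are invented.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] holds the edit distance between the ref prefix seen so far
    # and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def answer_changed(answer_from_reference: str, answer_from_transcript: str) -> bool:
    """Hypothetical stand-in: flag a case when the downstream answers differ
    after simple normalization. A real eval would likely use a richer scorer."""
    def normalize(text: str) -> str:
        return " ".join(text.lower().split())
    return normalize(answer_from_reference) != normalize(answer_from_transcript)


# One substituted word out of four gives a WER of 0.25.
wer = word_error_rate("refill my prescription today", "refill my subscription today")
```

A low WER can still hide a harmful error: in the invented example only one word is wrong, but it is the word the downstream model needs.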

Braintrust @braintrust · 3d

See you there → luma.com/ai_workshop_sf

Braintrust @braintrust · 3d

Your agent is in production. Now you need to understand what it is doing, where it is failing, and what to improve next. Join our live workshop to build a repeatable workflow for turning agent behavior into evals, measuring improvements, and catching regressions. Then hear from the team at @meetgranola about how they use observability to build the agents that power their product.

Braintrust @braintrust · 4d

The Braintrust Go SDK enables automatic instrumentation with no code changes using Orchestrion, or the use of manual middleware for explicit control. Supports OpenAI, Anthropic, Google Gemini, Google ADK, and more.

Braintrust @braintrust · 4d

Phrase search breaks when every word is common but the exact sequence is rare. At 100 TB+ of agent traces, queries time out and traditional databases fail. Brainstore uses shingled bloom filters, indexing trigrams instead of tokens. On a 290 GB dataset, 98.5% of segments were eliminated, and a correct response was returned after scanning only 4 GB, instead of timing out after 100 GB.
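
To make the idea concrete, here is a small Python sketch of segment pruning with a bloom filter over shingles. It is not Brainstore's implementation. It assumes "trigrams" means overlapping three-word shingles (character trigrams are another possible reading), and the `BloomFilter` class, the filter sizes, and the sample segments are all made up for illustration.

```python
import hashlib


class BloomFilter:
    """Tiny bloom filter; sizes are illustrative, not tuned."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, item: str):
        # Derive several bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def shingles(text: str, n: int = 3):
    """Overlapping n-word shingles, e.g. 'a b c d' -> ['a b c', 'b c d']."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def build_segment_filter(docs) -> BloomFilter:
    """Index every shingle of every document in one segment."""
    bloom = BloomFilter()
    for doc in docs:
        for shingle in shingles(doc):
            bloom.add(shingle)
    return bloom


def segment_may_match(bloom: BloomFilter, phrase: str) -> bool:
    """False means the segment can be skipped; True means it must be scanned.
    Phrases shorter than three words yield no shingles and cannot be pruned."""
    return all(bloom.might_contain(s) for s in shingles(phrase))


segment_a = build_segment_filter(["the user asked to cancel the order"])
segment_b = build_segment_filter(["cancel the user order to be asked the"])
to_scan = [name for name, bloom in [("a", segment_a), ("b", segment_b)]
           if segment_may_match(bloom, "asked to cancel")]
```

Both sample segments contain the same common words, so a token-level filter would keep both. Only the shingle filter can tell that the exact sequence is absent from the second one. Bloom filters can return false positives, so surviving segments still need a real scan.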

Braintrust @braintrust · 5d

Join us → braintrustdata.link/llm-cost-works…

Braintrust @braintrust · 5d

Switching to a cheaper model can cut token costs, but if failures increase and retries pile up, the savings disappear. Learn how to run eval experiments to compare models objectively so you can reduce costs without compromising agent quality.
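
The arithmetic behind that claim can be sketched in a few lines of Python. This assumes a simplified model that the post does not spell out: every attempt costs the same, attempts succeed independently with a fixed rate, and the agent retries until it succeeds. The prices and success rates below are invented for illustration, not measured results.

```python
def expected_cost_per_success(cost_per_attempt: float, success_rate: float) -> float:
    """Expected spend to get one successful result when failures are retried.

    With independent attempts, the expected number of attempts is
    1 / success_rate, so the expected cost is cost_per_attempt / success_rate.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate


# Hypothetical numbers: the cheaper model costs 60% less per call.
baseline = expected_cost_per_success(cost_per_attempt=0.010, success_rate=0.95)
cheap_but_flaky = expected_cost_per_success(cost_per_attempt=0.004, success_rate=0.35)
cheap_and_solid = expected_cost_per_success(cost_per_attempt=0.004, success_rate=0.80)

print(round(baseline, 4))         # 0.0105
print(round(cheap_but_flaky, 4))  # 0.0114
print(round(cheap_and_solid, 4))  # 0.005
```

Under these made-up numbers the cheaper model only saves money if its success rate stays high enough, which is the quantity an eval experiment would need to measure.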

Braintrust retweeted

Jess @daRubberDuckiee · 5d

In January, I joined Braintrust, told my manager @morgane_paloma that I was huge into pickleball, and 6 months later she threw a pickleball event just for me ❤️ Okay maybe not just for me, but the rest is true! Had an amazing time last week in SF at The Agent Open hosted by @braintrust. Here's a little recap of the experience:
