Braintrust

819 posts

@braintrust

The observability layer for production AI.

Joined August 2023

57 Following · 7.1K Followers

Pinned Tweet

Braintrust@braintrust·Jun 1

Topics is now GA on all plans. Continuously find the patterns worth investigating across your production traffic.

4.1K

Braintrust@braintrust·6h

@AICouncilConf Watch the full session → youtu.be/Dn3_H2zcvPI?si…

202

Braintrust@braintrust·6h

Evals are the foundation of shipping quality agents, but most engineers don't know where to start. This session from @aicouncilconf covers what evals are, why they matter, and how to build them into your development workflow.

177

Braintrust@braintrust·10h

Check out the SDK → braintrustdata.link/ruby-sdk-brain…

124

Braintrust@braintrust·10h

Use the Braintrust Ruby library to access the full Braintrust REST API from any Ruby 3.0+ application. Includes automatic retries, idiomatic error handling, and simple API key configuration. Install via Bundler to get started.

200

Braintrust@braintrust·1d

Early-stage teams building agents know they need evals and observability. But the cost and time commitment often push it down the priority list. Braintrust for Startups gives early-stage companies access to the same platform that Lovable, Browserbase, and Greptile run in production, alongside office hours and invites to Braintrust events.

727

Braintrust@braintrust·1d

@brennanbutler_ Hi Brennan. Thanks for choosing Braintrust. Please have some swag as a token of our appreciation: Link: braintrustdata.link/Brennan-Butler… code: Brennan-Butler-thank-you

Brennan Butler@brennanbutler_·2d

@ankrgyl we just sub'd to pro for the year :) Thank you @ankrgyl . the platform is awesome

Brennan Butler@brennanbutler_·Jul 8

Does anyone have a favorite tool for agent evals they could recommend? Looking at Langfuse but not sure if something better may exist. Braintrust looks great but $$$$$$ expensive?

Braintrust@braintrust·1d

Use the Braintrust .NET SDK for tracing and evaling AI in C#. Install the core package plus provider integrations for OpenAI, Anthropic, or the Microsoft Agent Framework. Run evals with custom test cases and scoring functions, or add automatic instrumentation via OpenTelemetry.

268

Braintrust@braintrust·4d

Learn more → braintrustdata.link/discover-topics

223

Braintrust@braintrust·4d

Topics has a dedicated page that shows all clusters generated from your production logs. Compare past and present groupings to understand how user behavior evolves over time and gain visibility into how Topics categorizes conversations, complaints, and feature requests.

371

Braintrust retweeted

Izzy Hurley@iz_hurley_·4d

I’ve seen a lot of confusion about how to best leverage the new family of GPT 5.6 models, and honestly, sameeee! Running this eval helped me pin down which models I should use (or have an agent use) to build effective code without burning extra tokens and dollars.

Braintrust@braintrust

We evaled the GPT-5.6 family, plus Anthropic's Fable, Opus 4.8, and Sonnet 5, on the key building blocks of agentic workflows. Then we broke results down by task type and difficulty and turned them into a decision map you can route against. Here's what we found.

2.5K

Braintrust@braintrust·4d

The family-level results show data transforms are nearly solved for the OpenAI models, symbolic rules pull them apart, and the Anthropic rows are dragged down by refusals rather than by wrong answers.

195

Braintrust@braintrust·5d

If you're building a voice agent, picking the right speech-to-text model is not obvious. Every provider claims to be accurate, fast, and production-ready, but the benchmarks they publish rarely look like actual traffic. We used Braintrust to build a controlled eval across six STT providers, 240 audio cases, and eight content domains, scoring not just transcription accuracy but whether errors changed the downstream LLM answer.

712
