Braintrust
672 posts

Braintrust
@braintrust
The observability layer for production AI.
Joined August 2023
54 Following, 6.6K Followers

At @vercel, customers expect to build with the latest models immediately, so the team ships support within hours of release.
Braintrust gives their team a structured way to benchmark new models against existing ones, catch performance differences early, and deploy with confidence.

Evals course module twelve: online scoring.
Learn how to run online scoring against production logs as they arrive, so you get continuous quality monitoring without manual intervention.
More here → braintrustdata.link/evals-course-1…

Evals course module eleven: analyzing multi-turn traces.
Per-turn scorers catch individual response issues. Trace-level scorers catch conversation-wide failures like lost context or unresolved issues. Learn how to run both together.
More here → x.com/braintrust/sta…

Evals 101: a new course from Braintrust. Everything you need to know about evals, and how to do them yourself.
Module one: Why are evals important?
- the six most common problems developers face when shipping AI applications
- why traditional software thinking doesn't apply to AI
- how evals can fix these problems

Encyclopedia Evalica
A resource from Braintrust compiling the most important things to know about evals. The terms to learn, the principles to apply, and Braintrust's take on why evals matter.
Read more → braintrustdata.link/encyclopedia-e…

An eval platform is more than just a test runner. Evals require shared definitions of "good," reliable data pipelines, labelling workflows, versioning, and trust in results across many teams and model changes.
Hear about the design principles behind Braintrust's platform in this session from @aidotengineer.

For AI PMs, evals are the new PRD.
At @PLEDalliance Summit New York, Ameya Bhatawdekar discussed the new product development loop and how to translate every element of a traditional PRD into its eval equivalent.


