Braintrust

19 posts

Braintrust banner
Braintrust

Braintrust

@braintrustdata_

The end-to-end platform for building world-class AI apps.

Software Company Katılım Ocak 2025
0 Takip Edilen0 Takipçiler
Braintrust
Braintrust@braintrustdata_·
If you use the Braintrust AI proxy, you'll be able to switch your production model to @grok 3 as soon as it's available via API with one line of code. Just make sure you run some evals first to check if it's the right model for your app. braintrust.dev/blog/new-model
English
0
0
0
53
Braintrust
Braintrust@braintrustdata_·
New cookbook: Evaluating video QA LLMs are great at interpreting text, but understanding and providing reliable answers to videos is still a challenge. We set up a workflow for evaluating video QA performance, which you can adapt to different use cases. braintrust.dev/docs/cookbook/…
English
0
0
0
33
Braintrust
Braintrust@braintrustdata_·
We had a blast at @southpkcommons talking about AI agents: - How to evaluate them - Best practices when building them - Sharing insights from customers like @zapier Loved the energy from all the founders building at SPC. Thanks for hosting us!
Braintrust tweet media
English
0
0
0
17
Braintrust
Braintrust@braintrustdata_·
Hello everyone, We have been in the talks of making a tokenized braintrust. It has finally come to fruition. We will be launching braintrust today on @pumpdotfun. Stay vigilant, do not buy fakes. Only follow posts from this account.
English
0
0
0
8
Braintrust
Braintrust@braintrustdata_·
Our playground now supports all the new OpenAI models and some fun OS models 😎 thanks to @perplexity_ai
English
0
0
0
8
Braintrust
Braintrust@braintrustdata_·
Don't have an eval set already? Tired of writing scoring functions? Our `autoevals` library makes it easy to grade your LLM outputs. It includes prebuilt scoring functions: • Model-based (using LLMs) • Heuristic (e.g. Levenshtein distance) • Statistical (e.g. BLEU)
Braintrust tweet media
English
0
0
0
6
Braintrust
Braintrust@braintrustdata_·
And it's easy to read from a dataset from your evaluation script or backend services. And it's easy to read from a dataset from your evaluation script or backend services.
Braintrust tweet media
English
0
0
0
1
Braintrust
Braintrust@braintrustdata_·
It's so easy to manage test sets and datasets with Braintrust. We made a web UI for editing evals with your team so you don't need to make your own with Google Sheets/Retool. Our TS/Python library also...
Braintrust tweet media
English
0
0
0
3
Braintrust
Braintrust@braintrustdata_·
Braintrust also easily integrates with Pytest, Jest, etc. What other testing libraries do you like to use? 😎 We've just made our experiment sidebar resizable. Now, you can quickly view what you need without having to change pages all the time.
Braintrust tweet media
English
0
0
0
4
Braintrust
Braintrust@braintrustdata_·
⏰ We added duration stats to experiments! See which test cases were faster or took longer. There's a tradeoff between speed <> quality. Use Braintrust to help you find the optimal balance 😇.
Braintrust tweet media
English
0
0
0
2
Braintrust
Braintrust@braintrustdata_·
⏰ We added duration stats to experiments! See which test cases were faster or took longer. There's a tradeoff between speed <> quality. Use Braintrust to help you find the optimal balance 😇.
Braintrust tweet media
English
0
0
0
1
Braintrust
Braintrust@braintrustdata_·
😍 It's now so easy to use variables in our Playground. We got tired of editing raw JSON so we upgraded our UI to support variable/object inputs better.
English
0
0
0
1
Braintrust
Braintrust@braintrustdata_·
Spend your time building the fun parts of AI apps w/ Braintrust :)
Braintrust tweet media
English
0
0
0
1
Braintrust
Braintrust@braintrustdata_·
🤩 New feature: text blocks in the playground! These blocks just return a constant or variable value without any LLM call. This makes it easy to: - debug your prompts - mock API responses and vectorDB calls
English
0
0
0
1
Braintrust
Braintrust@braintrustdata_·
Don't get stuck manually inputting test cases into your LLM app after every prompt change. Braintrust makes it easy to automatically evaluate and test your LLM apps.
Braintrust tweet media
English
0
0
0
1
Braintrust
Braintrust@braintrustdata_·
We evaluated Google's text-bison LLM against OpenAI's gpt-3.5-turbo on a SQL generation task in Braintrust. Here's how they performed: - finetuned-gpt3.5: 92.4% - finetuned-bison: 84.2% - gpt3.5: 78.7% - bison: 74.8% (We finetuned both models too!) Dig into the evals below:
English
0
0
0
9