Braintrust

19 posts

Braintrust banner

Braintrust

Braintrust

@braintrustdata_

The end-to-end platform for building world-class AI apps.

Software Company Katılım Ocak 2025

0 Takip Edilen0 Takipçiler

Braintrust

Braintrust@braintrustdata_·21 Şub

If you use the Braintrust AI proxy, you'll be able to switch your production model to @grok 3 as soon as it's available via API with one line of code. Just make sure you run some evals first to check if it's the right model for your app. braintrust.dev/blog/new-model

English

0

0

53

Braintrust

Braintrust@braintrustdata_·21 Şub

New cookbook: Evaluating video QA LLMs are great at interpreting text, but understanding and providing reliable answers to videos is still a challenge. We set up a workflow for evaluating video QA performance, which you can adapt to different use cases. braintrust.dev/docs/cookbook/…

English

0

0

33

Braintrust

Braintrust@braintrustdata_·10 Şub

We had a blast at @southpkcommons talking about AI agents: - How to evaluate them - Best practices when building them - Sharing insights from customers like @zapier Loved the energy from all the founders building at SPC. Thanks for hosting us!

Braintrust tweet media

English

0

0

17

Braintrust

Braintrust@braintrustdata_·31 Oca

Hello everyone, We have been in the talks of making a tokenized braintrust. It has finally come to fruition. We will be launching braintrust today on @pumpdotfun. Stay vigilant, do not buy fakes. Only follow posts from this account.

English

0

0

8

Braintrust

Braintrust@braintrustdata_·31 Oca

Our playground now supports all the new OpenAI models and some fun OS models 😎 thanks to @perplexity_ai

English

0

0

8

Braintrust

Braintrust@braintrustdata_·31 Oca

Don't have an eval set already? Tired of writing scoring functions? Our `autoevals` library makes it easy to grade your LLM outputs. It includes prebuilt scoring functions: • Model-based (using LLMs) • Heuristic (e.g. Levenshtein distance) • Statistical (e.g. BLEU)

Braintrust tweet media

English

0

0

6

Braintrust

Braintrust@braintrustdata_·31 Oca

And it's easy to read from a dataset from your evaluation script or backend services. And it's easy to read from a dataset from your evaluation script or backend services.

Braintrust tweet media

English

0

0

1

Braintrust

Braintrust@braintrustdata_·31 Oca

It's so easy to manage test sets and datasets with Braintrust. We made a web UI for editing evals with your team so you don't need to make your own with Google Sheets/Retool. Our TS/Python library also...

Braintrust tweet media

English

0

0

3

Braintrust

Braintrust@braintrustdata_·31 Oca

Braintrust also easily integrates with Pytest, Jest, etc. What other testing libraries do you like to use? 😎 We've just made our experiment sidebar resizable. Now, you can quickly view what you need without having to change pages all the time.

Braintrust tweet media

English

0

0

4

Braintrust

Braintrust@braintrustdata_·31 Oca

⏰ We added duration stats to experiments! See which test cases were faster or took longer. There's a tradeoff between speed <> quality. Use Braintrust to help you find the optimal balance 😇.

Braintrust tweet media

English

0

0

2

Braintrust

Braintrust@braintrustdata_·31 Oca

⏰ We added duration stats to experiments! See which test cases were faster or took longer. There's a tradeoff between speed <> quality. Use Braintrust to help you find the optimal balance 😇.

Braintrust tweet media

English

0

0

1

Braintrust

Braintrust@braintrustdata_·31 Oca

😍 It's now so easy to use variables in our Playground. We got tired of editing raw JSON so we upgraded our UI to support variable/object inputs better.

English

0

0

1

Braintrust

Braintrust@braintrustdata_·31 Oca

Spend your time building the fun parts of AI apps w/ Braintrust :)

Braintrust tweet media

English

0

0

1

Braintrust

Braintrust@braintrustdata_·31 Oca

🤩 New feature: text blocks in the playground! These blocks just return a constant or variable value without any LLM call. This makes it easy to: - debug your prompts - mock API responses and vectorDB calls

English

0

0

1

Braintrust

Braintrust@braintrustdata_·31 Oca

Don't get stuck manually inputting test cases into your LLM app after every prompt change. Braintrust makes it easy to automatically evaluate and test your LLM apps.

Braintrust tweet media

English

0

0

1

Braintrust

Braintrust@braintrustdata_·31 Oca

finetuned-bison: braintrust.dev/app/braintrust…

Eesti

0

0

2

Braintrust

Braintrust@braintrustdata_·31 Oca

gpt-3.5: braintrust.dev/app/braintrust…

0

0

3

Braintrust

Braintrust@braintrustdata_·31 Oca

finetuned-gpt-3.5: braintrust.dev/app/braintrust…

English

0

0

3

Braintrust

Braintrust@braintrustdata_·31 Oca

We evaluated Google's text-bison LLM against OpenAI's gpt-3.5-turbo on a SQL generation task in Braintrust. Here's how they performed: - finetuned-gpt3.5: 92.4% - finetuned-bison: 84.2% - gpt3.5: 78.7% - bison: 74.8% (We finetuned both models too!) Dig into the evals below:

English

0

0

9

Keşfet

@grok @southpkcommons @zapier @perplexity_ai @elonmusk @BarackObama @taylorswift13 @cristiano