PareaAI
@PareaAI · 242 posts

Parea AI (YC S23) provides tools for evaluating, testing, and monitoring LLM applications.

New York, USA · Joined April 2023
33 Following · 246 Followers
PareaAI retweeted
Tom Dörr @tom_doerr
.@PareaAI also looks like a good LLM monitoring tool and is open source
PareaAI retweeted
Joschka Braun @JoschkaBraun
How do you detect unreliable behavior in your LLM app? Recently we talked to the team at @sixfoldai, and they shared a simple yet powerful way to assess the reliability of their LLM app using @PareaAI. More on how they test their risk-assessment AI for insurance underwriters in the article in the thread
PareaAI retweeted
Joschka Braun @JoschkaBraun
Saturdays are for doc upgrades
PareaAI retweeted
Joel Alexander @joel_a_wilde
🚀 New deep dive notebook on @PareaAI experiments and LLM evals 📝🔬. I cover some of the key functionalities illustrating the power and flexibility of our API. 🔽 Link in comments 🔽
PareaAI retweeted
Joel Alexander @joel_a_wilde
@cohere's actually pretty awesome. More folks should be exploring their models. @PareaAI now has auto-instrumentation for the Cohere Python SDK 🚀
PareaAI retweeted
Joel Alexander @joel_a_wilde
There are so many “black box” evals that force users to instantiate eval classes. I never fully understood this. At @PareaAI we see evals as just functions: you can copy the source code and modify it as you see fit, all OSS and based on the latest research. Check these out👇🏾
PareaAI retweeted
Joel Alexander @joel_a_wilde
With the latest @GroqInc models for tool calling, we figured it was time to make Groq available across @PareaAI's playground and SDKs. Be on the lookout for an updated tool-calling benchmark: OpenAI vs. Claude vs. Groq!
PareaAI retweeted
Joschka Braun @JoschkaBraun
📝 Updated self-deployment docs ⭐️ Deploy @PareaAI on-prem via @Docker in 4 steps:
1. Clone the repo
2. Specify organization slug
3. Pull docker images & run them
4. Point SDK backend URL to self-deployed backend URL
🔗 -> 🧵
PareaAI retweeted
Joel Alexander @joel_a_wilde
There have been so many new models lately. Most recently, @MistralAI's codestral-mamba. I figured it'd be great to highlight how to use @PareaAI for regression testing. Check out the notebook below, where I test codestral-latest vs. codestral-mamba on LeetCode questions. 👇
PareaAI retweeted
Joel Alexander @joel_a_wilde
At this point I could probably have an LLM monitor the top foundation-model providers and then produce a PR for me that adds any new models to @PareaAI the moment they launch.
PareaAI retweeted
Joschka Braun @JoschkaBraun
If you use structured outputs with Instructor, track validation errors instantly with @PareaAI. Concretely, the integration automatically:
- groups any LLM calls retried together under a single trace
- tracks any field that failed validation, with the respective error message
- visualizes validation error count over time
Instrument calls made via the Instructor client by adding two lines:
p = Parea(api_key="PAREA_API_KEY")
p.wrap_openai_client(client, "instructor")
Read the full blog post on the Instructor docs in the 🧵
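The two-line setup from the tweet above can be placed in context. This is a minimal sketch, assuming the `parea-ai`, `instructor`, `openai`, and `pydantic` packages are installed and API keys are set in the environment; the `UserInfo` model and the prompt are illustrative assumptions, not taken from Parea's docs.

```python
import os

import instructor
from openai import OpenAI
from parea import Parea
from pydantic import BaseModel

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The two lines from the tweet: wrap the raw OpenAI client with Parea,
# telling it the client will be used via Instructor, so retried calls are
# grouped under one trace and validation errors are tracked per field.
p = Parea(api_key=os.environ["PAREA_API_KEY"])
p.wrap_openai_client(client, "instructor")

# Patch the client with Instructor to get structured outputs.
client = instructor.from_openai(client)


class UserInfo(BaseModel):  # hypothetical response model for illustration
    name: str
    age: int


# Any validation failure here would be retried by Instructor and surfaced
# in Parea's trace view with the failing field and error message.
user = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=UserInfo,
    messages=[{"role": "user", "content": "John Doe is 30 years old."}],
)
```

The point of wrapping before patching is that Parea sees every underlying OpenAI call, including the ones Instructor issues on retry, and can group them into a single trace.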
PareaAI retweeted
Joel Alexander @joel_a_wilde
Moving from demos to production-ready LLM apps can be challenging. In this post, I outline a practical workflow to help teams make this transition, focusing on:
- Hypothesis testing
- Dataset creation
- Effective evals
- Experimentation
Full post here: zurl.co/27Ad