PareaAI

242 posts


@PareaAI

Parea AI (YC S23) provides tools for evaluating, testing and monitoring LLM applications.

New York, USA · Joined April 2023
33 Following · 246 Followers
PareaAI retweeted
Tom Dörr @tom_doerr
.@PareaAI also looks like a good LLM monitoring tool and is open source
PareaAI retweeted
Joschka Braun @JoschkaBraun
How do you detect unreliable behavior in your LLM app? Recently we talked to the team at @sixfoldai, and they shared a simple yet powerful way to assess the reliability of their LLM app using @PareaAI. More about how they test their risk-assessment AI solution for insurance underwriters in the article linked in the thread
PareaAI retweeted
Joschka Braun @JoschkaBraun
Saturdays are for doc upgrades
PareaAI retweeted
Joel Alexander @joel_a_wilde
🚀 New deep dive notebook on @PareaAI experiments and LLM evals 📝🔬. I cover some of the key functionalities illustrating the power and flexibility of our API. 🔽 Link in comments 🔽
PareaAI retweeted
Joel Alexander @joel_a_wilde
@cohere's actually pretty awesome. More folks should be exploring their models. @PareaAI now has auto-instrumentation for the Cohere Python SDK 🚀
PareaAI retweeted
Joel Alexander @joel_a_wilde
There are so many "black box" evals that force users to instantiate eval classes. I've never fully understood this. At @PareaAI we see evals as just functions: you can copy the source code and modify it as you see fit, all OSS and based on the latest research. Check these out 👇🏾
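The "evals are just functions" idea can be sketched in plain Python. This is illustrative only, not Parea's actual eval code or signatures; the function names and the score convention (floats in [0, 1]) are assumptions for the example:

```python
# Illustrative sketch: an eval is just a plain function you can read,
# copy, and edit. These are NOT Parea's actual evals; names and
# signatures are made up for this example.

def exact_match(output: str, target: str) -> float:
    """Score 1.0 if the model output matches the target exactly."""
    return 1.0 if output.strip() == target.strip() else 0.0

def contains_keywords(output: str, keywords: list[str]) -> float:
    """Fraction of required keywords present in the output."""
    if not keywords:
        return 1.0
    hits = sum(1 for kw in keywords if kw.lower() in output.lower())
    return hits / len(keywords)

# Because evals are ordinary functions, composing or tweaking them is trivial:
def combined(output: str, target: str, keywords: list[str]) -> float:
    return 0.5 * exact_match(output, target) + 0.5 * contains_keywords(output, keywords)
```

Nothing here is hidden behind a class hierarchy, which is the point of the tweet: modifying an eval is just editing a function body.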
PareaAI retweeted
Joschka Braun @JoschkaBraun
Day 1 support for llama 3.1 via @FireworksAI_HQ in @PareaAI's playground! 🧨🦙
PareaAI retweeted
Joel Alexander @joel_a_wilde
With the latest @GroqInc models for tool calling, we figured it was time to make Groq available across @PareaAI's playground and SDKs. Be on the lookout for an updated tool-calling benchmark: OpenAI vs. Claude vs. Groq!
PareaAI retweeted
Joschka Braun @JoschkaBraun
📝 Updated self-deployment docs ⭐️ Deploy @PareaAI on-prem via @Docker in 4 steps:
1. Clone the repo
2. Specify organization slug
3. Pull docker images & run them
4. Point SDK backend URL to self-deployed backend URL
🔗 -> 🧵
PareaAI retweeted
Joel Alexander @joel_a_wilde
There have been so many new models lately, most recently @MistralAI's codestral-mamba. I figured it'd be great to highlight how to use @PareaAI for regression testing. Check out the notebook below, where I test codestral-latest vs. codestral-mamba on LeetCode questions. 👇
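The regression-testing pattern behind the notebook can be sketched generically: run every candidate model over the same dataset with the same pass/fail check and compare pass rates. Everything below is an assumption-laden toy (the two "models" are stub functions standing in for real LLM calls; none of this is Parea's actual experiments API):

```python
# Generic regression-testing sketch (illustrative; not Parea's API).
# Two candidate "models" are stubbed as plain functions; a real run
# would call the respective LLM endpoints instead.

def model_a(question: str) -> str:          # stand-in for the baseline model
    return question.upper()

def model_b(question: str) -> str:          # stand-in for the challenger
    return question.upper() if "easy" in question else "???"

def passes(output: str, expected: str) -> bool:
    return output == expected

def regression_report(dataset, models):
    """Run every model over the same dataset and return per-model pass rates."""
    report = {}
    for name, model in models.items():
        passed = sum(passes(model(q), expected) for q, expected in dataset)
        report[name] = passed / len(dataset)
    return report

dataset = [("easy one", "EASY ONE"), ("hard one", "HARD ONE")]
report = regression_report(dataset, {"baseline": model_a, "challenger": model_b})
# The baseline solves both cases; the challenger only the "easy" one.
```

Holding the dataset and the eval fixed while swapping the model is what makes the comparison a regression test rather than two unrelated runs.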
PareaAI retweeted
Joel Alexander @joel_a_wilde
At this point I could probably have an LLM monitor the top foundation-model providers and then produce a PR for me that adds any new models to @PareaAI the moment they launch.
PareaAI retweeted
Joschka Braun @JoschkaBraun
If you use structured outputs with Instructor, track validation errors instantly with @PareaAI. Concretely, the integration automatically:
- groups all LLM calls retried together under a single trace
- tracks any field that failed validation, with the respective error message
- visualizes validation error count over time
Instrument calls made via the Instructor client by adding two lines:
p = Parea(api_key="PAREA_API_KEY")
p.wrap_openai_client(client, "instructor")
Read the full blog post on the Instructor docs in the 🧵
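The behavior the tweet describes — retries of one logical call grouped under a single trace, with each validation failure recorded against it — can be approximated in pure Python. This is a sketch of the idea only, not Parea's implementation; the `TraceLog` class and its method names are invented for illustration:

```python
import uuid

# Sketch of the retry-grouping idea (NOT Parea's implementation):
# all retries of one logical call share a trace ID, and every
# validation failure is recorded against that trace with its message.

class TraceLog:
    def __init__(self):
        self.attempts = []           # (trace_id, attempt_no) pairs
        self.validation_errors = []  # (trace_id, field, message) triples

    def call_with_retries(self, fn, validate, max_retries=3):
        trace_id = uuid.uuid4().hex  # one trace for the whole retry loop
        for attempt in range(1, max_retries + 1):
            self.attempts.append((trace_id, attempt))
            output = fn(attempt)
            errors = validate(output)
            if not errors:
                return output
            for field, message in errors:
                self.validation_errors.append((trace_id, field, message))
        return None

log = TraceLog()

# Toy "LLM": produces a valid age only on the second attempt.
result = log.call_with_retries(
    fn=lambda attempt: {"age": -1 if attempt == 1 else 42},
    validate=lambda out: [] if out["age"] >= 0 else [("age", "must be >= 0")],
)
# Both attempts share one trace ID; one validation error was recorded.
```

Grouping by a shared trace ID is what lets a dashboard show "this call needed 2 attempts, failing on `age`" instead of two unrelated LLM calls.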
PareaAI retweeted
Joel Alexander @joel_a_wilde
Moving from demos to production-ready LLM apps can be challenging. In this post, I outline a practical workflow to help teams make this transition, focusing on:
- Hypothesis testing
- Dataset creation
- Effective evals
- Experimentation
Full post here: zurl.co/27Ad