HumanFirst

120 posts

@HumanFirst_ai

The Hub for Conversational AI Data.

Montréal, Québec · Joined May 2019
545 Following · 295 Followers
HumanFirst @HumanFirst_ai ·
Language Model Cascading & Probabilistic Programming Language The term Language Model Cascading (LMC) was coined in July 2022, which seems like a lifetime ago considering the speed at which the LLM narrative arc develops… Read more here: humanfirst.ai/blog/language-…
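The cascading idea named above can be sketched as a simple router: answer with a cheap model first and fall back to a stronger one only when confidence is low. The two model functions here are hypothetical stand-ins, not any vendor's API.

```python
# Minimal sketch of a two-stage language model cascade.
# Both model functions are hypothetical stand-ins; swap in
# real API calls for your providers.

def small_model(prompt: str) -> tuple[str, float]:
    """Stand-in for a cheap model: returns (answer, confidence)."""
    if "France" in prompt:
        return "Paris", 0.9
    return "unsure", 0.2

def large_model(prompt: str) -> str:
    """Stand-in for an expensive fallback model."""
    return "answer from the larger model"

def cascade(prompt: str, threshold: float = 0.5) -> str:
    """Route to the large model only when the small one is uncertain."""
    answer, confidence = small_model(prompt)
    if confidence >= threshold:
        return answer
    return large_model(prompt)
```

The threshold is the main tuning knob: raising it trades cost for accuracy by sending more traffic to the larger model.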
HumanFirst @HumanFirst_ai ·
ICYMI - Announcement: A Powerful Partnership: HumanFirst Teams Up with Google Cloud to Boost Data Productivity, Custom AI Prompts and Models. Read more here: humanfirst.ai/blog/a-powerfu…
HumanFirst @HumanFirst_ai ·
RT @CobusGreylingZA: SmartLLMChain is a LangChain implementation of the self-critique chain principle. It is useful for particularly comple…
HumanFirst reposted
It does seem that the future will be one where Generative Apps become more model (LLM) agnostic and model migration takes place, with models becoming a utility. Blue oceans are turning into red oceans very fast, and a myriad of applications and products are under threat due to developments like the expansion of LangSmith. The ecosystem is still very nascent and major changes are bound to happen.

It does seem that application developers do not want to offload functionality to LLMs in a blackbox approach, and hence include the complexity in the prompt/pipeline phase instead of just leveraging LLM context windows and model capabilities. LangChain, together with Haystack, is taking the lead in how the Generative App landscape is unfolding, from an open-source perspective.

It will not be surprising if LangSmith sprawls into some form of application development (and not only application management) with pipeline and prompt chaining options. A graphic approach to prompt testing, like ChainForge, will also make sense. A tool which assists with chunking data for vector store / semantic search / RAG implementations, making it possible to test and tweak chunks, will be of great value.

Added to the main categories of Use Cases, Type, Language and Models could be a new category of prompt techniques, which could consider techniques like those shown in the article below. #LargeLanguageModels #PromptEngineering #ConversationalAI @langchain @hwchase17 cobusgreyling.medium.com/langchain-hub-…
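The point above about testing and tweaking chunks can be illustrated with a minimal chunker whose size and overlap are the tunable parameters; this is an illustrative sketch, not any particular tool's API.

```python
# Minimal sketch of a tweakable chunker for vector-store / RAG ingestion.
# chunk_size and overlap are the knobs you would re-tune while testing
# retrieval quality; real tools expose many more options.

def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into overlapping character windows."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    # Overlap keeps sentence fragments from being split cleanly in two,
    # so a fact straddling a boundary still lands whole in some chunk.
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - overlap, 1), step)]
```

Re-running retrieval evaluation after each change to these two parameters is exactly the test-and-tweak loop the post describes.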
HumanFirst reposted
A recent study found that when LLMs are presented with longer input, performance is best when the relevant content is at the start or end of the input context, and degrades when the relevant information is in the middle of a long context. A few days ago Haystack by deepset released a component which optimises the layout of selected documents in the LLM context window; the component is a way to work around the problem identified in the paper. What I particularly like about this implementation from Haystack is that it's a good example of how innovation in the pre-LLM functionality, or the pipeline phase, can remedy inherent vulnerabilities of an LLM. Thank you @tuanacelik for telling me about this functionality and for the technical assistance. 🙂 @deepset_ai Read more here: cobusgreyling.medium.com/ahaystack-deve…
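The reordering idea behind such a component can be shown in a few lines: given documents already sorted by relevance, place the most relevant at the edges of the context and push the least relevant into the middle. This is a standalone reimplementation of the idea for illustration, not the Haystack API itself.

```python
# Sketch of "lost in the middle" reordering: input documents are sorted
# most-relevant-first; output alternates them between the front and the
# back so the middle of the prompt holds the least relevant material.

def lost_in_the_middle_order(docs_by_relevance: list[str]) -> list[str]:
    front: list[str] = []
    back: list[str] = []
    for i, doc in enumerate(docs_by_relevance):
        # Even ranks go to the front, odd ranks to the back.
        (front if i % 2 == 0 else back).append(doc)
    # Reverse the back half so relevance increases toward the end too.
    return front + back[::-1]
```

For five documents ranked d1 (best) to d5 (worst), the output puts d1 first, d2 last, and d5 in the middle, matching where the study found retrieval to be strongest and weakest.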
HumanFirst reposted
Large Language Models (LLMs) are known to hallucinate. Hallucination is when an LLM generates a succinct, highly plausible answer that is factually incorrect. Hallucination can be mitigated by injecting prompts with contextually relevant data which the LLM can reference. Growing LLM context sizes have the allure that large swaths of contextual reference data can simply be submitted to the LLM, creating a contextual reference for the LLM and in turn curbing hallucination… A recent study found that LLMs perform better when the relevant information is located at the beginning or end of the input context; when the relevant context is in the middle of a longer context, retrieval performance degrades considerably. This is also the case for models specifically designed for long contexts. Read more in the article below. #LargeLanguageModels #PromptEngineering #LLMs cobusgreyling.medium.com/does-submittin…
HumanFirst reposted
This article considers how Ragas can be combined with LangSmith for more detailed insight into how Ragas goes about evaluating a RAG/LLM implementation. Currently Ragas makes use of OpenAI, but it would make sense for Ragas to become more LLM agnostic; Ragas is also based on LangChain. The LangSmith integration lifts the veil on the cost and latency of running Ragas. For this assessment-based implementation latency might not be that crucial, but cost can be a challenge. In short, what is Ragas? Ragas is a framework which can be used to test and evaluate RAG implementations. As seen in the image below, the RAG evaluation process divides the assessment into two categories, Generation and Retrieval. Generation is divided into two metrics, faithfulness and relevance; Retrieval is divided into Context Precision and Context Recall. Link to the full article: cobusgreyling.medium.com/combining-raga… #LangChain #LargeLanguageModels #PromptEngineering @langchain @hwchase17
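The Retrieval side of that split can be made concrete with a toy metric: the fraction of ground-truth sentences recoverable from the retrieved contexts. This naive string-matching version is purely illustrative of the idea, not Ragas's actual (LLM-assisted) implementation.

```python
# Toy illustration of a retrieval-side metric in the Generation/Retrieval
# split: a naive context recall, i.e. the share of ground-truth sentences
# that appear verbatim in some retrieved context. Ragas computes this far
# more robustly; this only shows the shape of the measurement.

def naive_context_recall(ground_truth: str, contexts: list[str]) -> float:
    sentences = [s.strip() for s in ground_truth.split(".") if s.strip()]
    if not sentences:
        return 0.0
    joined = " ".join(contexts).lower()
    hits = sum(1 for s in sentences if s.lower() in joined)
    return hits / len(sentences)
```

A score below 1.0 points at the retriever (the relevant facts never reached the prompt), which is exactly why the assessment separates Retrieval metrics from Generation metrics.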
HumanFirst @HumanFirst_ai ·
In this article I consider the growing context windows of various Large Language Models (LLMs), to what extent they can be used, and how a principle like RAG applies. Read more here: humanfirst.ai/blog/rag-llm-c…
HumanFirst reposted
Steps In Evaluating Retrieval Augmented Generation (RAG) Pipelines - The basic principle of RAG is to leverage external data sources. For each user query or question, a contextual chunk of text is retrieved and injected into the prompt; this chunk is retrieved based on its semantic similarity to the user question. But how can a RAG implementation be tested and benchmarked over time?

Input:
- Question: the questions the RAG pipeline will be evaluated on.
- Answer: the answer generated by the RAG pipeline which will be presented to the user.
- Contexts: the contexts which will be passed into the LLM to answer the question.
- Ground Truths: the ground-truth answers to the questions.

Output: Faithfulness, Answer Relevancy, Context Relevancy and Context Recall.

Link to the full article: medium.com/@cobusgreyling/steps-in-evaluating-retrieval-augmented-generation-rag-pipelines-7d4b393e62b3 #LargeLanguageModels #PromptEngineering #RAG
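The four input fields listed above form one evaluation record. A minimal sketch of that record, with field names taken from the post (the dataclass itself is illustrative, not a library type):

```python
# One row of a RAG evaluation dataset, mirroring the Input fields
# described in the post. The Output metrics (faithfulness, answer
# relevancy, context relevancy, context recall) are computed from
# records of this shape.

from dataclasses import dataclass

@dataclass
class RagEvalSample:
    question: str          # what the pipeline is evaluated on
    answer: str            # what the pipeline generated for the user
    contexts: list[str]    # chunks injected into the LLM prompt
    ground_truth: str      # reference answer for the question

sample = RagEvalSample(
    question="What is the capital of France?",
    answer="Paris.",
    contexts=["Paris is the capital and largest city of France."],
    ground_truth="Paris",
)
```

Collecting many such rows lets the same pipeline be re-benchmarked over time as chunking, retrieval, or prompts change.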
HumanFirst reposted
How to Mitigate LLM Hallucination and Single LLM Vendor Dependency (link to the full article in the comments). Four years ago I wrote about the importance of context when developing a chatbot; context is more relevant now, with LLMs, than ever before. Injecting the prompt with a contextual reference (RAG) at inference does two things: (1) it mitigates hallucination, and (2) it vastly improves the performance of LLMs. I have read many comments that the models hosted on HuggingFace are just not as accurate and coherent as OpenAI's models… but have you tried interacting with these models with a contextual reference? I believe a contextual reference greatly enhances the performance of LLMs. In the two examples below the model google/flan-t5-xxl, hosted by HuggingFace, is used in both instances, on the left and on the right. On the left, no RAG or contextual-reference approach is taken; on the right the prompt is given a contextual reference, and a correct answer is given by the LLM. #LargeLanguageModels #PromptEngineering #LLMs cobusgreyling.medium.com/mitigating-hal…
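The contextual-reference injection described above amounts to a prompt template that grounds the question in retrieved chunks. A minimal sketch, with an illustrative template (not the exact prompt used in the article):

```python
# Minimal sketch of RAG-style prompt assembly: retrieved chunks are
# injected ahead of the question so the model answers from supplied
# data rather than from parametric memory alone.

def build_rag_prompt(question: str, context_chunks: list[str]) -> str:
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
```

The explicit "say so" instruction is the hallucination guard: it gives the model a sanctioned way out when the retrieved context does not cover the question.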
HumanFirst reposted
The graph below illustrates how accuracy improves when the relevant information is at the beginning or end of the input, and how performance degrades when the referenced data sits in the middle.