From the course: Hands-On AI: Introduction to Retrieval-Augmented Generation (RAG)

Architecture of a RAG app

- [Instructor] Let's start with the definition. What is RAG? RAG stands for Retrieval-Augmented Generation. The basic principle behind this technique is to give context to your LLMs so they can answer questions better. There are four main pieces of a RAG application. First, the language model. Usually this is a large language model, but smaller language models are now being developed that may be able to do the same job. Second, the embedding model. This is what transforms your data into vectors. Third, the vector database. This is where you store your vectorized data. And fourth, optionally, a framework to make building a RAG app easier. We'll go into these pieces in detail in later videos.

The first step in building a RAG app is data ingestion. First, we take the data and pass it to the embedding model. Next, the embedding model embeds the data and stores the resulting vectors in the vector database.

Now, when we use the RAG app, we're essentially just dropping an LLM as an interface on top of our data. One new thing to note here is that I've wrapped the embedding model and vector database in a new box titled Retriever. In step one, the user sends a query to the LLM. In step two, the LLM sends a query to the retriever, usually some augmented form of the user query. In step three, the retriever embeds the query with the embedding model and looks for similar entries in the vector database. In step four, those top similar entries are sent back to the LLM. In step five, the LLM synthesizes these responses and sends them back to the user.

For our example, we're going to be using an OpenAI large language model through GitHub Models, a couple of different text embedding models from OpenAI, a simple JSON file as our vector index, and LlamaIndex as our framework. This is what our example RAG app will look like. This is the same diagram we saw before, except you'll see that we've wrapped everything with the LlamaIndex framework.
The main reason we use this framework is for simplicity's sake.
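The ingestion and query-time steps described above can be sketched without any framework at all. This is a minimal, dependency-free sketch, not the course's actual implementation: the `embed` function here is a toy letter-frequency stand-in for a real embedding model (such as the OpenAI text embedding models mentioned above), and the LLM at the end is a stub, so only the data flow matches the diagram.

```python
import json
import math

def embed(text: str, dim: int = 26) -> list[float]:
    """Toy embedding model: letter-frequency vector, L2-normalized.
    A real RAG app would call an actual embedding model here."""
    vec = [0.0] * dim
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def ingest(docs: list[str], path: str = "vector_index.json") -> None:
    """Data ingestion: embed each document and store text + vector
    in a simple JSON file, like the JSON vector index in the example."""
    with open(path, "w") as f:
        json.dump([{"text": d, "vector": embed(d)} for d in docs], f)

class Retriever:
    """The boxed 'Retriever': embedding model + vector index together."""

    def __init__(self, path: str = "vector_index.json"):
        with open(path) as f:
            self.index = json.load(f)

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        # Step 3: embed the query and rank stored entries by cosine
        # similarity (a plain dot product, since vectors are normalized).
        qv = embed(query)
        ranked = sorted(
            self.index,
            key=lambda e: sum(a * b for a, b in zip(qv, e["vector"])),
            reverse=True,
        )
        # Step 4: return the top similar entries.
        return [e["text"] for e in ranked[:k]]

def answer(user_query: str, retriever: Retriever) -> str:
    # Steps 1-2: the user query arrives and is forwarded to the retriever
    # (a real LLM might augment it first; here it passes through as-is).
    context = retriever.retrieve(user_query)
    # Step 5: a real LLM would synthesize an answer from this retrieved
    # context; this stub just makes the assembled prompt visible.
    return "Context: " + " | ".join(context) + "\nQuestion: " + user_query

ingest([
    "RAG stands for Retrieval-Augmented Generation.",
    "An embedding model transforms data into vectors.",
    "A vector database stores vectorized data.",
])
print(answer("What does RAG stand for?", Retriever()))
```

Swapping the toy embedding function for a real embedding model, replacing the stub with a chat model call, and letting a framework like LlamaIndex manage the index and prompts turns this sketch into the kind of app built in the rest of the course.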
