From the course: Advanced LLMs with Retrieval Augmented Generation (RAG): Practical Projects for AI Applications

Course introduction

- Welcome to Mastering Large Language Models (LLMs) with Advanced RAG, retrieval augmented generation. My name is Guy Ernst, also known as The ML Guy, and I will be teaching this class. Before we start, you might remember from your childhood the movie where Mickey Mouse is the wizard's apprentice: he tries to summon the magic that will do his work, but then he discovers that he cannot stop it, and the brooms are stepping all over him. The quote "Never summon a power you can't control" is from the book "Nexus" by Yuval Noah Harari, who tries to explain what AI is and how it is integrating into our lives. I like this depiction because I think it is quite similar to what is happening in enterprises today, where you demo your new shiny AI, but then you are crushed when you try to move it to production or scale it up. This paper, published about a year ago, claims that RAG does not work for enterprises, basically for the same reason I just mentioned: people don't know how to use it, and there are many concerns and risks. In multiple surveys of enterprise companies, they discovered two main domains of issues when trying to use AI, LLMs, or RAG specifically in enterprises. One is people: knowledge and experience, which is something teams don't have much of because these technologies are new. The other one is the technology aspect and the maturity, or rather immaturity, of the tools and frameworks. In this course, we're going to focus on the people aspect, and more specifically on insufficient experience with RAG concepts. We will cover many of the RAG concepts, especially the advanced ones that are needed to take a system into production. Before we jump into what RAG is, it's important to remember that RAG is not the only option for using LLMs in an organization. The choice depends on how much of the knowledge the application needs, to be able to reply to user queries, is internal, that is, not available from the outside. When we have a lot of internal knowledge, we have to build a RAG system. But sometimes our system can rely mostly on prompt engineering, for tasks like knowledge extraction and classification, because not much internal knowledge or data is needed. Of course, there are more advanced topics where we need to fine-tune the models, especially when the language used in our system, medical, legal, or otherwise, is very specific: it's not general, and most people in the world don't write papers in it. Therefore, we need to do some fine-tuning. In this course, we're not going to touch much on those other aspects. We're going to focus on RAG, but remember that RAG is basically only 70 or so percent of LLM systems. Okay, so let's look at a quick diagram of a simple retrieval augmented generation system. The user asks a query. The first step is to retrieve the information relevant to the query from some kind of document database, usually using a vector index to enable semantic search. Then we take the retrieved documents, rank them, and choose which of them to include in our prompt to the generation step, which uses a large language model. The reply will then be based on real data: fewer hallucinations, fewer mistakes, more specific answers. There is another aspect we usually don't talk much about: how do you index the data? In this course we spend most of the time improving retrieval by improving the indexing process.
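To make the flow concrete before we see it in the notebooks, here is a minimal, self-contained sketch of the retrieve-rank-generate loop just described. The toy trigram `embed` function and the stubbed generation step are stand-ins for illustration only; a real system would call an embedding model and an LLM in their place.

```python
# Minimal sketch of a simple RAG loop: index -> retrieve -> rank -> generate.
# The embedding and generation steps are toy stand-ins, not real model calls.
import math
from typing import List, Tuple

def embed(text: str, dim: int = 64) -> List[float]:
    """Toy embedding: hash character trigrams into a fixed-size, normalized
    vector. A real system would call an embedding model here."""
    vec = [0.0] * dim
    for i in range(len(text) - 2):
        vec[hash(text[i:i + 3]) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: List[float], b: List[float]) -> float:
    # Vectors are normalized, so the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))

# Indexing step: embed every document once, up front.
documents = [
    "Our refund policy allows returns within 30 days.",
    "Support is available 24/7 via chat and email.",
    "Enterprise plans include a dedicated account manager.",
]
index: List[Tuple[List[float], str]] = [(embed(d), d) for d in documents]

def retrieve(query: str, top_k: int = 2) -> List[str]:
    """Semantic search: rank all documents by similarity to the query
    and keep the top_k."""
    q = embed(query)
    scored = sorted(index, key=lambda item: cosine(q, item[0]), reverse=True)
    return [doc for _, doc in scored[:top_k]]

def answer(query: str) -> str:
    """Build a grounded prompt from the retrieved documents. A real system
    would send this prompt to an LLM instead of returning it."""
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(answer("Can I get my money back?"))
```

Even in this toy version you can see the two halves the course keeps returning to: the indexing step, which runs once per document, and the retrieval step, which runs on every query.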
This diagram adds the jigsaw-puzzle aspect, and I want you to think about building your system in a modular way. It's not so important to choose the LLM, the vector database, or the embedding model right now. In this course we're going to learn how to evaluate them and how to tweak them, but we know the technology is moving very, very quickly. The best LLM today might not be the best tomorrow in accuracy, cost, scale, speed, context size, and so on, and the same goes for the other parts, like the vector database. So when you build a system, it should be modular (see the interface sketch at the end of this introduction). As I said, in this course we're going to focus on the concepts that are common to whichever tools you select; each tool will offer these concepts in one way or another through its configuration. This is only the introduction, so we're not going to dive into all the concepts here, but we will touch on a few advanced topics: embedding, chunking, enrichment, hybrid search, and so on. At the end of the course you will be much more familiar with this diagram, and you'll be able to take it and build something as powerful and advanced as this for your organization. We have a repository that you can use for this course. You have the address here, and it includes a few of the diagrams we just saw and a few Jupyter Notebooks that go over each of the topics. We'll start with the first one, on simple RAG, but for each topic we're going to evaluate it, see how it works, and understand a bit of what is going on under the hood, so you will be able to use it in a much better way. Please join me as we build the first simple RAG application.
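To make the modularity point concrete, here is a minimal sketch of what the jigsaw pieces could look like as narrow interfaces. The names (`Embedder`, `VectorStore`, `Generator`, `RAGPipeline`) are illustrative, not from the course repository; any real vendor client would be wrapped behind one of these.

```python
# Sketch of a modular RAG design: the pipeline depends only on small
# interfaces, never on a specific vendor, so each piece can be swapped.
from typing import List, Protocol

class Embedder(Protocol):
    def embed(self, text: str) -> List[float]: ...

class VectorStore(Protocol):
    def add(self, doc: str, vector: List[float]) -> None: ...
    def search(self, vector: List[float], top_k: int) -> List[str]: ...

class Generator(Protocol):
    def generate(self, prompt: str) -> str: ...

class RAGPipeline:
    """Wires the pieces together through the interfaces above, so the
    concrete LLM, vector database, or embedding model can be replaced
    without touching this class."""
    def __init__(self, embedder: Embedder, store: VectorStore, llm: Generator):
        self.embedder, self.store, self.llm = embedder, store, llm

    def ask(self, query: str, top_k: int = 3) -> str:
        context = "\n".join(self.store.search(self.embedder.embed(query), top_k))
        return self.llm.generate(f"Context:\n{context}\n\nQuestion: {query}")
```

With this shape, replacing one vector database with another means implementing `add` and `search` for the new backend; the pipeline code itself does not change.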
