From the course: Hands-On AI: RAG using LlamaIndex


Prompt compression


- [Instructor] Suppose you're interacting with a RAG system, an AI system, or a large language model in general, and you're asking it complex questions that require it to draw upon a large amount of background information. Typically, this would require sending a very long prompt to the language model. This, of course, can be slow, it can be expensive, and you might even exceed the model's context window. This is where prompt compression comes in. What we're going to talk about in this lesson is a technique called LongLLMLingua. It uses a prompt compression method to drastically shorten the prompt while retaining the most relevant information needed to answer the question. That way, we get faster and more cost-effective generation while still producing high-quality answers. One key component of LongLLMLingua is question-aware, coarse-grained prompt compression, which means we're evaluating the relevance between the context and the question based…
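To make the idea concrete, here is a toy sketch of question-aware, coarse-grained compression. This is not the actual LongLLMLingua algorithm (which scores tokens and chunks using a small language model's perplexity); the function name, the word-overlap relevance score, and the word budget below are all illustrative assumptions. It only shows the core intuition: rank context chunks by relevance to the question, then keep the most relevant ones within a budget.

```python
# Illustrative sketch only -- NOT the real LongLLMLingua method, which
# uses a small LLM's perplexity to score tokens and chunks. Here we use
# simple word overlap as a stand-in relevance measure.

def compress_prompt(chunks, question, budget_words=50):
    """Keep the chunks most relevant to the question, within a word budget."""
    q_terms = set(question.lower().split())

    def relevance(chunk):
        terms = set(chunk.lower().split())
        # Fraction of the chunk's words that also appear in the question.
        return len(terms & q_terms) / max(len(terms), 1)

    # Coarse-grained step: score and rank whole chunks, most relevant first.
    ranked = sorted(chunks, key=relevance, reverse=True)

    kept, used = [], 0
    for chunk in ranked:
        n = len(chunk.split())
        if used + n <= budget_words:
            kept.append(chunk)
            used += n

    # Preserve the original chunk order in the compressed prompt.
    kept.sort(key=chunks.index)
    return "\n".join(kept)


chunks = [
    "The Eiffel Tower is located in Paris, France.",
    "Bananas are a good source of potassium.",
    "Paris is the capital of France and a major tourist destination.",
]
compressed = compress_prompt(chunks, "Where is the Eiffel Tower?", budget_words=20)
print(compressed)  # keeps the Eiffel Tower chunk, drops the irrelevant one
```

In LlamaIndex, this same idea is packaged as a node postprocessor that compresses retrieved context before it reaches the LLM, so the query pipeline itself doesn't change.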
