From the course: Hands-On AI: RAG using LlamaIndex


Prompt compression


- [Instructor] Suppose you're interacting with a RAG system, an AI system, or a large language model in general, and you're asking it complex questions that require it to draw upon a large amount of background information. Typically, this would require sending a very long prompt to the language model. This, of course, can be slow, it can be expensive, and you might even exceed the model's context window. This is where prompt compression comes in. What we're going to talk about in this lesson is a technique called LongLLMLingua. It uses a prompt compression method to drastically shorten the prompt while retaining the most relevant information needed to answer the question. That way, we get faster and more cost-effective generation while still producing high-quality answers. One key component of LongLLMLingua is question-aware, coarse-grained prompt compression, which means we're evaluating the relevance between the context and the question based…
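To make the idea concrete, here is a toy sketch of question-aware, coarse-grained compression. This is not the actual LongLLMLingua algorithm (which scores tokens and chunks using a small language model's perplexity); the function name, the word-overlap relevance score, and the word budget below are all illustrative assumptions. It only shows the core intuition: rank context chunks by relevance to the question, then keep the most relevant ones within a budget.

```python
# Illustrative sketch only -- NOT the real LongLLMLingua method, which
# uses a small LLM's perplexity to score tokens and chunks. Here we use
# simple word overlap as a stand-in relevance measure.

def compress_prompt(chunks, question, budget_words=50):
    """Keep the chunks most relevant to the question, within a word budget."""
    q_terms = set(question.lower().split())

    def relevance(chunk):
        terms = set(chunk.lower().split())
        # Fraction of the chunk's words that also appear in the question.
        return len(terms & q_terms) / max(len(terms), 1)

    # Coarse-grained step: score and rank whole chunks, most relevant first.
    ranked = sorted(chunks, key=relevance, reverse=True)

    kept, used = [], 0
    for chunk in ranked:
        n = len(chunk.split())
        if used + n <= budget_words:
            kept.append(chunk)
            used += n

    # Preserve the original chunk order in the compressed prompt.
    kept.sort(key=chunks.index)
    return "\n".join(kept)


chunks = [
    "The Eiffel Tower is located in Paris, France.",
    "Bananas are a good source of potassium.",
    "Paris is the capital of France and a major tourist destination.",
]
compressed = compress_prompt(chunks, "Where is the Eiffel Tower?", budget_words=20)
print(compressed)  # keeps the Eiffel Tower chunk, drops the irrelevant one
```

In LlamaIndex, this same idea is packaged as a node postprocessor that compresses retrieved context before it reaches the LLM, so the query pipeline itself doesn't change.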
