From the course: Advanced RAG Applications with Vector Databases
Embedding examples
- [Presenter] Let's look at some examples of how you can embed data. There are many ways to embed, and there are many things that you can embed. The three primary methods we'll cover in this section are basic embeddings, small to big, and big to small, and we'll also briefly discuss non-English examples.

The most basic method of embedding is to embed the chunk directly. Sometimes this works for your most basic tasks. However, when it comes to advanced RAG use cases and putting things into production, you're going to need something a little more involved.

Small to big is a term coined by former LlamaIndex head of TypeScript and Partnerships, Yi Ding, and he coined it at one of my first events in San Francisco. The idea behind small to big is that you embed a sentence, but you store the whole paragraph as text. Why would you do this? Well, it's good for increased context. Some texts have very short sentences, and it's helpful to retrieve not just the sentence, or the one sentence preceding or following it, but the entire paragraph in which that sentence was used. This is another way to help ensure semantic coherence, like we covered in chunking.

Big to small is the opposite of small to big. Instead of embedding a sentence and storing a paragraph, we embed a paragraph and store a sentence. Why would we do this? Sentences by themselves don't always make sense, and sentence-level chunking tactics may leave some sentences broken. For example, if we split on the period following "Mr.", we may end up with a broken sentence. Embedding a whole paragraph and retrieving all the sentences separately lets us do some post-processing before feeding the chunks to an LLM, to ensure that we get the right context.

Finally, we're looking at non-English embeddings. Here's a special case. If you're not working with English data, you'll need an embedding model that was trained on non-English data. You have a few options. One of the easiest, but perhaps not the most efficient or cost-effective, methods is to use an LLM that was trained on multilingual data. Examples include GPT models beyond 3.5, Mixtral, and Qwen. If you're looking for a more compute-friendly option, you can search the MTEB leaderboards for models in different languages such as French, Polish, Chinese, and more. The sketches below illustrate each of these approaches.
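First, basic embedding: a minimal sketch of embedding each chunk directly, assuming the sentence-transformers library; the model name (all-MiniLM-L6-v2) and the sample chunks are illustrative choices, not ones the course specifies.

```python
# Basic embedding: one vector per chunk, with the chunk itself stored
# as the retrievable text. Model choice here is just an example.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

chunks = [
    "Vector databases store embeddings for similarity search.",
    "RAG pipelines retrieve relevant chunks before calling an LLM.",
]

vectors = model.encode(chunks)
records = [{"vector": v, "text": c} for v, c in zip(vectors, chunks)]
```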
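Next, small to big: a minimal sketch of embedding each sentence while storing the enclosing paragraph as the retrievable text. It uses the same assumed library and model, and a deliberately naive period split; a production pipeline would use a proper sentence splitter such as nltk or spaCy.

```python
# Small to big: embed the small unit (sentence), store the big unit
# (paragraph) so retrieval returns the full surrounding context.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

paragraphs = [
    "Short sentences lose context. The paragraph restores it. "
    "Retrieval returns the paragraph, not just the matched sentence.",
]

records = []
for paragraph in paragraphs:
    # Naive split; real pipelines should use a dedicated sentence splitter.
    sentences = [s.strip() for s in paragraph.split(".") if s.strip()]
    for sentence in sentences:
        records.append({
            "vector": model.encode(sentence),  # search key: the sentence
            "text": paragraph,                 # returned text: the paragraph
        })
```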
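Then big to small, the inverse: a sketch of embedding the paragraph while storing its sentences individually so they can be post-processed before reaching the LLM. The fragment filter at the end is an assumed heuristic for illustration; it drops the broken "Mr" piece that the naive splitter produces.

```python
# Big to small: embed the big unit (paragraph), store the small units
# (sentences) for post-processing at retrieval time.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

paragraph = "Mr. Smith arrived late. The meeting had already started."
sentences = [s.strip() for s in paragraph.split(".") if s.strip()]

record = {
    "vector": model.encode(paragraph),  # search key: the paragraph
    "sentences": sentences,             # stored text: individual sentences
}

# Post-processing before building the LLM context: filter out fragments
# like "Mr" left behind by splitting on the abbreviation's period.
clean = [s for s in record["sentences"] if len(s.split()) > 1]
context = ". ".join(clean) + "."
```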
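Finally, non-English embeddings: a sketch using a multilingual sentence-transformers model. The model named here is one example of a compute-friendly multilingual option; check the MTEB leaderboards for models ranked on your target language.

```python
# Non-English embedding: a multilingual model embeds text from many
# languages into a shared vector space. Model choice is an example.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

chunks = [
    "Les bases de données vectorielles stockent des embeddings.",  # French
    "向量数据库存储嵌入。",                                          # Chinese
]
vectors = model.encode(chunks)
```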