From the course: Natural Language Processing for Speech and Text: From Beginner to Advanced

Text representation: Bag-of-words (BoW)

- [Instructor] In previous videos, we learned about one-hot encoding and n-grams for text representation. If you're already thinking, what about full documents? You are right. This is where bag-of-words, or BoW, comes in. Bag-of-words represents text data by considering the frequency of tokens in a document. So, in a corpus, which is a collection of documents, each document is represented as a vector of word counts, with each dimension corresponding to a specific word from the vocabulary. With bag-of-words, the focus is on token counts. Word order and grammatical structure are disregarded. Consider these three different sentences. Natural language processing for speech and text. Language processing for speech and text. Text and speech for natural language processing. They have exactly the same counts: one instance each of the words natural, language, processing, for, speech, and, and text. Even though they contain the same elements with the same counts, the order has changed their…
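As a minimal sketch of the counting described above, the following Python builds a vocabulary from a small corpus and turns each document into a vector of token counts. The helper names (`tokenize`, `build_vocabulary`, `bag_of_words`) and the simple lowercase, whitespace-based tokenization are assumptions made for illustration, not something specified in the course. The corpus uses the first and third example sentences, which contain the same words in a different order.

```python
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer (an assumption): lowercase, split on whitespace,
    # and strip surrounding periods and commas from each token.
    tokens = (tok.strip(".,").lower() for tok in text.split())
    return [tok for tok in tokens if tok]


def build_vocabulary(corpus):
    # The vocabulary is the sorted set of all distinct tokens in the corpus.
    return sorted({tok for doc in corpus for tok in tokenize(doc)})


def bag_of_words(doc, vocabulary):
    # One dimension per vocabulary word; the value is that word's count in doc.
    counts = Counter(tokenize(doc))
    return [counts[word] for word in vocabulary]


corpus = [
    "Natural language processing for speech and text.",
    "Text and speech for natural language processing.",
]

vocabulary = build_vocabulary(corpus)
vectors = [bag_of_words(doc, vocabulary) for doc in corpus]

print(vocabulary)
for vector in vectors:
    print(vector)
```

Both documents map to the same count vector, because the representation keeps only how often each token occurs and discards the order in which the tokens appear.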
