From the course: Natural Language Processing for Speech and Text: From Beginner to Advanced
N-grams representation using NLTK
- [Instructor] In the previous video, we discussed n-grams. Let's practice how to create them in Python using the NLTK library. First, go to colab.research.google.com or any Python IDE you prefer. The first thing we are going to do is import NLTK and, from nltk.util, import ngrams. Let's still use our earlier sentence, "natural language processing for speech and text data." The first thing we need to do before applying n-grams is to tokenize our sentence. So we have an error message, and I would like you to pay attention to reading error messages. For example, this error message is saying that it's trying to look up something called Punkt, which is not found, and it shows the way to download it. So let's follow this instruction by downloading Punkt. When we run the code now, you can see that it works. So we have imported ngrams from nltk.util. Let's apply it to our words. Because we would like our output as a list, let's wrap it in the list function. First we would like unigrams, so you can…
Contents
- Text preprocessing (3m 6s)
- Text preprocessing using NLTK (7m 10s)
- Text representation (2m 18s)
- Text representation: One-hot encoding (2m 6s)
- One-hot encoding using scikit-learn (3m 32s)
- Text representation: N-grams (2m 21s)
- N-grams representation using NLTK (3m 3s)
- Text representation: Bag-of-words (BoW) (2m 1s)
- Bag-of-words representation using scikit-learn (2m 29s)
- Text representation: Term frequency-inverse document frequency (TF-IDF) (1m 50s)
- TF-IDF representation using scikit-learn (2m 8s)
- Text representation: Word embeddings (2m 56s)
- Word2vec embedding using Gensim (9m 8s)
- Embedding with pretrained spaCy model (5m 7s)
- Sentence embedding using the Sentence Transformers library (3m 42s)
- Text representation: Pre-trained language models (PLMs) (2m 34s)
- Pre-trained language models using Transformers (5m 43s)