From the course: Natural Language Processing for Speech and Text: From Beginner to Advanced
Mel-frequency cepstral coefficients (MFCCs) using librosa - Python Tutorial
- [Instructor] In the previous video, we discussed MFCCs. Now we will implement them on a sample audio file using the Python audio analysis library called librosa. We'll be writing the code in a Colab notebook. To access that, go to colab.research.google.com. If you already have a Google account, it will open automatically. If you don't have one, it will prompt you to log in. Remember that you can work on this code in any IDE of your choice. The sample file that we will be working with is provided with the exercise files. Let's start by importing the necessary libraries: import librosa, import librosa.display, import matplotlib, import numpy as np. Now we are going to upload the provided sample file. That will take a couple of seconds to load into your workspace. Now that we have the audio file, let's copy the file path and load it, providing the file path that you just copied. Okay, now we have y, a NumPy array representing the audio data. And we…
Contents
- Speech representation: Mel-frequency cepstral coefficients (2m 10s)
- Mel-frequency cepstral coefficients (MFCCs) using librosa (3m 28s)
- Speech representation: Linear predictive cepstral coefficients (LPCCs) (1m 51s)
- Linear predictive coding (LPC) using librosa (3m 58s)
- Speech representation: Gammatone filterbank features (1m 21s)
- Gammatone filterbank features using librosa (3m 16s)
- Speech representation: Spectrograms (2m 25s)
- Spectrograms using fast Fourier transform (FFT) in librosa (3m 24s)
- Speech representation: Speech embeddings (1m 53s)
- Speech embeddings using wav2vec in Transformers (5m 13s)