From the course: MLOps Tools: MLflow and Hugging Face
Working with datasets - Hugging Face Tutorial
Working with datasets
- [Instructor] Here we're in the Hugging Face interface, and I'm going to talk a little bit about datasets. Now, datasets are one of the raw materials for working with the Hugging Face Hub. One of the best ways to find a dataset is to go to the datasets interface and sort by something like most likes. And here's a really useful one, Wikipedia. If we take a look, we can see that the Wikipedia dataset contains cleaned-up articles in all languages, built from the Wikipedia dump. So, super useful. Someone could come in here, grab the entire Wikipedia dataset from a particular dump date, and do whatever they needed to do with it. For example, they could make instances of the dataset and get a very large file. Now, if you wanted to grab something smaller, you can take a look at GLUE, which is a very popular dataset, and if we go to the dataset viewer, this really shows you how it's designed…
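What the instructor does in the web interface can also be done from Python. The sketch below is not from the course. It assumes the `datasets` library is installed and that the Hub is reachable. The dataset IDs (`nyu-mll/glue`, `wikimedia/wikipedia`), the `mrpc` subset, and the `20231101.en` dump are assumptions chosen for illustration, and the helper names are hypothetical. It loads a small GLUE subset, and it streams Wikipedia so the very large file never has to be downloaded in full.

```python
from itertools import islice


def preview_rows(rows, n=3):
    """Return the first n records of any iterable of dict-like rows."""
    return [dict(row) for row in islice(rows, n)]


def load_glue_subset(config="mrpc", split="train"):
    """Load one GLUE subset (hypothetical helper; IDs are assumptions)."""
    # Imported lazily so preview_rows works without the library installed.
    from datasets import load_dataset

    return load_dataset("nyu-mll/glue", config, split=split)


def stream_wikipedia(config="20231101.en"):
    """Stream one Wikipedia dump instead of downloading it all at once."""
    from datasets import load_dataset

    # streaming=True returns an iterable dataset that fetches rows on demand.
    return load_dataset(
        "wikimedia/wikipedia", config, split="train", streaming=True
    )


def main():
    # Small dataset: inspect the column schema, similar to the dataset viewer.
    glue = load_glue_subset()
    print(glue.features)
    for row in preview_rows(glue):
        print(row)

    # Large dataset: only the first few articles are pulled over the network.
    for article in preview_rows(stream_wikipedia(), n=2):
        print(article["title"])
```

Calling `main()` runs both steps. Streaming is used for Wikipedia because a full dump is many gigabytes, while a GLUE subset is small enough to download directly.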