From the course: TensorFlow: Working with NLP
Transfer learning
- [Instructor] Transfer learning is made up of two components: pre-training and fine-tuning. So what does pre-training involve? Well, we're training a model from scratch. This means the model's weights are randomly initialized, and the model is of no use at this point. The model is then trained on large amounts of data and becomes useful. Now, let's compare the pre-training for some of the larger models. So BERT was released in 2018. The number of parameters was 109 million. It took Google 12 days to train BERT, and I've put an asterisk by the 8 times V100s because BERT wasn't trained on GPUs, but rather on Google's equivalent, TPUs or tensor processing units. The size of the dataset used for training was 16 gigabytes, and the training tokens were 250 billion. The data sources that were used to train BERT were Wikipedia and the BookCorpus. RoBERTa was developed by Facebook in 2019. The number of parameters was 125…
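To make the pre-training versus fine-tuning split concrete, here is a minimal sketch of loading a pre-trained BERT checkpoint and fine-tuning it on a tiny downstream task. It assumes the Hugging Face transformers library and a made-up two-example sentiment dataset; neither is taken from the video, which may use a different setup.

```python
# Minimal sketch: fine-tuning a pre-trained BERT checkpoint with TensorFlow.
# Assumes the Hugging Face "transformers" library is installed (pip install transformers).
import tensorflow as tf
from transformers import BertTokenizer, TFBertForSequenceClassification

# Pre-training has already been done for us: these weights are NOT randomly
# initialized, they come from Google's large-scale pre-training run.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = TFBertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Hypothetical toy dataset standing in for a real downstream task.
texts = ["great movie", "terrible plot"]
labels = [1, 0]
encodings = tokenizer(texts, truncation=True, padding=True, return_tensors="tf")

# Fine-tuning: continue training the pre-trained weights on the small labeled dataset.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
model.fit(dict(encodings), tf.constant(labels), epochs=1, batch_size=2)
```

The key point of the sketch is that fine-tuning reuses the pre-trained weights and only needs a small task-specific dataset and a few epochs, whereas pre-training from scratch required days of TPU time and billions of tokens.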