From the course: TensorFlow: Working with NLP


Tokenizers

- [Tutor] Let's head over to the Colab notebook to confirm our understanding of tokenization in code. In the first couple of cells, we're installing TensorFlow Text and TensorFlow Models Official. We then import these Python packages, and then we load a BERT model from TensorFlow Hub. We're using a BERT model with the uncased weights, so you can see that we have a vocabulary size of about 30,000 tokens. You can also see that this is a standard BERT model with uncased weights and 12 layers. Our input sentence is going to be, "I like NLP." This is tokenized, and the tokens are then converted to IDs. So "I like NLP" is split into the tokens "i," "like," "nl," and "##p," and each token is mapped to its integer ID in the vocabulary. If I enter two sentences, "I like NLP." and "What about you?", and feed them into the BERT model, you can see that we get this output result. So let's look…
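To make the splitting step concrete, here is a minimal, self-contained sketch of the greedy longest-match-first WordPiece algorithm that BERT's tokenizer uses. The `VOCAB` below is a tiny hypothetical stand-in for the real ~30,000-entry uncased vocabulary loaded from TensorFlow Hub, and `tokenize` is a simplified helper, not the actual library API:

```python
# Hypothetical miniature vocabulary; BERT's real uncased vocab has ~30,000 entries.
# "##" marks a subword piece that continues a word (e.g. "nl" + "##p" = "nlp").
VOCAB = {"i", "like", "nl", "##p", "what", "about", "you", "?", ".", "[UNK]"}

def wordpiece(word, vocab=VOCAB):
    """Split one lowercased word using greedy longest-match-first WordPiece."""
    tokens, start = [], 0
    while start < len(word):
        end, current = len(word), None
        # Try the longest remaining substring first, shrinking until a match.
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # continuation pieces get the ## prefix
            if piece in vocab:
                current = piece
                break
            end -= 1
        if current is None:
            return ["[UNK]"]  # no piece matched: the whole word is unknown
        tokens.append(current)
        start = end
    return tokens

def tokenize(sentence):
    """Lowercase (uncased model), split off punctuation, then apply WordPiece."""
    tokens = []
    for word in sentence.lower().replace(".", " .").replace("?", " ?").split():
        tokens.extend(wordpiece(word))
    return tokens

print(tokenize("I like NLP."))  # ['i', 'like', 'nl', '##p', '.']
```

In the real pipeline each token is then looked up in the vocabulary to get its integer ID, which is what the BERT model actually consumes; the notebook does this lookup with the TensorFlow Text tokenizer rather than by hand.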
