From the course: Introduction to Attention-Based Neural Networks

Attention models for image captioning

- Now, so far we've discussed how attention models work with encoders and decoders for language translation. But how will we use attention models in image captioning? Well, the principle is the same as that of language translation models, but there are some interesting twists. The main thing is that if you're working with images, images are not really sequential input. Which means when you focus attention on parts of an image, you're not actually focusing attention on different time instances in an input sequence. You're actually focusing attention across a two-dimensional representation, the image. Also, we generate embeddings or representations of images using convolutional neural networks. So, we pass an image through a CNN, and we get a representation of the image at the output of the CNN. This image representation, which is the output of the convolutional neural network, can be thought of as the hidden…
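To make the idea of attending across a two-dimensional representation concrete, here is a minimal sketch in plain Python. It is not the course's implementation: the tiny `feature_map` stands in for a CNN output, `decoder_state` stands in for a decoder's current state, and the dot-product scoring is an assumption chosen for brevity. The function name `spatial_attention` is hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def spatial_attention(feature_map, decoder_state):
    """Soft attention over a grid of image features.

    feature_map:   H x W grid of D-dimensional vectors (stand-in for CNN output).
    decoder_state: D-dimensional vector (stand-in for the decoder's state).
    Returns the H x W grid of attention weights and the D-dimensional context.
    """
    height, width = len(feature_map), len(feature_map[0])
    # Flatten the 2D grid: each spatial location plays the role that a
    # time step plays in a translation model.
    locations = [feature_map[r][c] for r in range(height) for c in range(width)]
    # Assumed scoring function: dot product of each location with the state.
    scores = [sum(f * q for f, q in zip(loc, decoder_state)) for loc in locations]
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the location features.
    dim = len(decoder_state)
    context = [sum(w * loc[d] for w, loc in zip(weights, locations))
               for d in range(dim)]
    # Reshape the weights back into the image grid for inspection.
    weight_grid = [weights[r * width:(r + 1) * width] for r in range(height)]
    return weight_grid, context


# Hypothetical 2 x 2 feature map with 2-dimensional features.
feature_map = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [2.0, 0.0]],
]
decoder_state = [1.0, 0.0]
weight_grid, context = spatial_attention(feature_map, decoder_state)
```

In this sketch the bottom-right location gets the largest weight, because its feature vector aligns most strongly with the decoder state. The weights form a map over image positions rather than over time steps.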
