From the course: Introduction to Attention-Based Neural Networks
Attention models for image captioning
- Now, so far we've discussed how attention models work with encoders and decoders for language translation. But how do we use attention models in image captioning? Well, the principle is the same as in language translation models, but there are some interesting twists. The main difference is that images are not really sequential input, which means when you focus attention on parts of an image, you're not focusing attention at different time instances in an input sequence; you're focusing attention across a two-dimensional representation, the image. Also, we generate embeddings, or representations, of images using convolutional neural networks. So, we pass an image through a CNN, and we get a representation of the image at the output of the CNN. This image representation, which is the output of the convolutional neural network, can be thought of as the hidden…
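The idea above can be sketched in a few lines of NumPy. This is a minimal, hypothetical illustration, not the course's implementation: it assumes simple dot-product alignment scores, a flattened 7×7 CNN feature grid with 512-dimensional features, and random values standing in for real CNN outputs and a real decoder state. The point is only that attention is computed over image locations rather than over time steps.

```python
import numpy as np

# Hypothetical sketch: a CNN turns an image into a grid of feature vectors
# (here a 7x7 grid of locations, each 512-dimensional). At every decoding
# step, the caption decoder attends over those spatial locations instead of
# over time steps in an input sequence.

def spatial_attention(features, hidden):
    """features: (num_locations, dim) CNN feature map, flattened spatially.
    hidden: (dim,) decoder hidden state.
    Returns (attention weights over locations, context vector)."""
    scores = features @ hidden           # dot-product alignment scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()             # softmax over image locations
    context = weights @ features         # attention-weighted sum of features
    return weights, context

rng = np.random.default_rng(0)
feature_map = rng.standard_normal((7 * 7, 512))  # stand-in for CNN output
decoder_state = rng.standard_normal(512)         # stand-in for decoder state

weights, context = spatial_attention(feature_map, decoder_state)
print(weights.shape, context.shape)  # (49,) (512,)
```

Each of the 49 weights says how strongly the decoder attends to one spatial location of the image when generating the next caption word, and the context vector is the corresponding weighted summary of the image.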
Contents
- The role of attention in sequence to sequence models (4m 53s)
- Attention mechanism in sequence to sequence models (6m 21s)
- Alignment weights in attention models (2m 25s)
- Bahdanau attention (3m 28s)
- Attention models for image captioning (3m 49s)
- Encoder decoder structure for image captioning (3m 45s)