From the course: Generative AI: Introduction to Large Language Models


The attention mechanism


- [Instructor] Introduced in the 2017 paper titled "Attention Is All You Need," transformers are an autoregressive encoder-decoder neural network architecture that makes use of a mechanism known as self-attention. As we learned in the previous course video, the encoding component of a transformer is made up of a stack of identical encoders. Each encoder has two main sublayers: the self-attention layer and the feed-forward layer. As input is fed to an encoder, it first passes through the self-attention layer and then to the feed-forward layer, which further processes the data. The feed-forward layer is a feed-forward neural network, which we previously learned about in the deep learning course video. The self-attention layer captures the importance of different words in relation to each other using a mechanism known as self-attention. This mechanism enables words to interact with each other so they can figure out which other…
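To make the idea concrete, here is a minimal sketch of scaled dot-product self-attention, the mechanism described in "Attention Is All You Need." This is not code from the course; the function name, the weight matrices W_q, W_k, W_v, and the toy dimensions are all illustrative assumptions.

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention over a sequence of token embeddings.

    X:             (seq_len, d_model) matrix of input embeddings
    W_q, W_k, W_v: learned projection matrices of shape (d_model, d_k)
    """
    Q = X @ W_q                      # queries: what each word is looking for
    K = X @ W_k                      # keys: what each word offers
    V = X @ W_v                      # values: the content each word carries
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # how strongly each word attends to every other word
    # Softmax over each row turns scores into attention weights that sum to 1
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V               # each output row is a weighted mix of all values

# Toy example: 4 "words," each an 8-dimensional embedding (sizes are arbitrary)
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, W_q, W_k, W_v)
print(out.shape)  # (4, 8): one context-aware vector per word
```

The key point the sketch illustrates is that every word's output vector is a weighted combination of every other word's value vector, which is exactly how words "interact" with each other inside the self-attention layer.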
