The attention mechanism
From the course: Generative AI: Introduction to Large Language Models
- [Instructor] Introduced in the 2017 paper titled "Attention Is All You Need," the transformer is an autoregressive encoder-decoder neural network architecture that makes use of a mechanism known as self-attention. As we learned in the previous course video, the encoding component of a transformer is made up of a stack of identical encoders. Each encoder has two main sublayers: the self-attention layer and the feed-forward layer. As input is fed to an encoder, it first passes through the self-attention layer and then through the feed-forward layer, which further processes the data. The feed-forward layer is a feed-forward neural network, which we previously learned about in the deep learning course video. The self-attention layer captures the importance of different words in relation to each other. This mechanism enables words to interact with each other so they can figure out which other words in the sequence they should pay attention to.
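To make the flow through one encoder layer concrete, here is a minimal NumPy sketch: input word embeddings pass through scaled dot-product self-attention, then through a small feed-forward network. All variable names, dimensions, and random weights are illustrative assumptions, and the residual connections and layer normalization used in real transformers are omitted for brevity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention over one sequence.

    X: (seq_len, d_model) word embeddings
    W_q, W_k, W_v: (d_model, d_model) learned projections (assumed shapes)
    """
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # how strongly each word relates to every other word
    weights = softmax(scores, axis=-1)       # each row is one word's attention distribution
    return weights @ V                       # each word becomes a weighted mix of the values

def feed_forward(X, W1, b1, W2, b2):
    # Position-wise feed-forward network, applied to each word independently.
    return np.maximum(0, X @ W1 + b1) @ W2 + b2

# Toy example: a 4-word sequence with embedding size 8 (dimensions are arbitrary).
rng = np.random.default_rng(0)
seq_len, d_model, d_ff = 4, 8, 32
X = rng.normal(size=(seq_len, d_model))
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
W1, b1 = rng.normal(size=(d_model, d_ff)), np.zeros(d_ff)
W2, b2 = rng.normal(size=(d_ff, d_model)), np.zeros(d_model)

# One encoder layer's flow: self-attention first, then the feed-forward sublayer.
attended = self_attention(X, W_q, W_k, W_v)
out = feed_forward(attended, W1, b1, W2, b2)
print(out.shape)  # (4, 8): same shape as the input, ready for the next encoder in the stack
```

Each row of `weights` sums to 1 and shows how much that word attends to every other word in the sequence, which is exactly the word-to-word interaction the self-attention layer captures.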