Neural Machine Translation by Jointly Learning to Align and Translate, Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio, 2015. International Conference on Learning Representations (ICLR). DOI: 10.48550/arXiv.1409.0473 - This pioneering paper introduces the attention mechanism in sequence-to-sequence models, detailing how a context vector is computed as a weighted sum of encoder hidden states so the decoder can focus on the most relevant parts of the input at each step.
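To make the mechanism concrete, here is a minimal NumPy sketch of the additive attention the paper describes; the function name, parameter names, and shapes are illustrative assumptions, not the authors' code.

```python
import numpy as np

def additive_attention_context(decoder_state, encoder_states, W_dec, W_enc, v):
    """Bahdanau-style additive attention (illustrative sketch, not the paper's code).

    decoder_state:  (d_dec,)    previous decoder hidden state s_{i-1}
    encoder_states: (T, d_enc)  encoder hidden states h_1 .. h_T
    W_dec, W_enc, v: learned parameters of the alignment model (assumed shapes:
                     (d_dec, d_att), (d_enc, d_att), (d_att,))
    """
    # Alignment scores: e_ij = v^T tanh(W_dec s_{i-1} + W_enc h_j)
    scores = np.tanh(decoder_state @ W_dec + encoder_states @ W_enc) @ v  # (T,)
    # Softmax over the input positions gives the attention weights alpha_ij
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # The context vector c_i is the weighted sum of encoder hidden states
    context = weights @ encoder_states  # (d_enc,)
    return context, weights
```

The weights act as soft alignment probabilities over the source positions, and the returned context vector is what the decoder consumes when predicting each output token.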
Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, 2017. Advances in Neural Information Processing Systems 30 (NeurIPS 2017). DOI: 10.48550/arXiv.1706.03762 - This foundational paper introduces the Transformer architecture, which relies entirely on attention mechanisms and explicitly defines the computation of context vectors as weighted sums of Value vectors.
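The core computation the paper defines, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, can be sketched in a few lines of NumPy; the formula is the paper's, but the code, names, and shapes below are illustrative assumptions.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V.

    Q: (n_q, d_k) query vectors; K: (n_k, d_k) key vectors; V: (n_k, d_v) value
    vectors. Shapes are assumptions for this sketch (single head, no masking).
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # (n_q, n_k) similarity scores
    # Numerically stable row-wise softmax over the key positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                                # each output row is a weighted
                                                      # sum of Value vectors
```

The 1/sqrt(d_k) scaling is the paper's remedy for large dot products at high dimensionality, which would otherwise push the softmax into regions with very small gradients.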