Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin, 2017, Advances in Neural Information Processing Systems (NeurIPS) 30 (Curran Associates, Inc.), DOI: 10.5555/3295222.3295349 - The foundational paper that introduced the Transformer architecture and the encoder-decoder attention mechanism, providing the original formulation and detailed explanation.
Speech and Language Processing (3rd ed. draft), Daniel Jurafsky and James H. Martin, 2023 (Draft) - A comprehensive textbook explanation of Transformer networks, including a detailed pedagogical exposition of the encoder-decoder attention mechanism, suitable for in-depth study.
Transformers and Self-Attention (Lecture 11 notes from CS224N), John Hewitt, Christopher Manning, 2023 (Stanford University) - Clear, accessible lecture notes from a Stanford University course that explain the Transformer architecture and its attention mechanisms.