Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin, 2017, Advances in Neural Information Processing Systems (NeurIPS), Vol. 30 (Curran Associates, Inc.), DOI: 10.48550/arXiv.1706.03762 - This foundational paper introduces the Transformer architecture, detailing the decoder's components and their functions, including masked self-attention and encoder-decoder attention.
Speech and Language Processing (3rd ed. draft), Daniel Jurafsky, James H. Martin, 2025 - A comprehensive academic treatment of natural language processing, including a detailed discussion of the Transformer architecture and its decoder stack.
Natural Language Processing with Transformers, Lewis Tunstall, Leandro von Werra, and Thomas Wolf, 2022 (O'Reilly Media) - Offers practical guidance on using Transformer models, with clear explanations of their architecture, including the decoder's role and sub-layers.
The Illustrated Transformer, Jay Alammar, 2018 - Uses diagrams and visuals to explain the Transformer architecture, making the decoder components and their interactions easy to grasp.