Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, 2017 (Advances in Neural Information Processing Systems (NIPS) 30), DOI: 10.48550/arXiv.1706.03762 - This foundational paper introduces the Transformer architecture and the self-attention mechanism, which is the core subject of the section.
torch.nn.MultiheadAttention, PyTorch Documentation, 2024 (PyTorch Foundation) - Official documentation for the PyTorch layer used in the code example for extracting attention weights.
Speech and Language Processing (3rd ed. draft), Daniel Jurafsky, James H. Martin, 2025 (Stanford University) - A widely recognized textbook that offers a detailed account of Transformers, attention mechanisms, and their analysis in NLP, covering theory and practical aspects.