Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, 2017, Advances in Neural Information Processing Systems 30 (NIPS 2017), DOI: 10.48550/arXiv.1706.03762 - The foundational paper introducing the Transformer architecture, its encoder-decoder design, self-attention, and positional encoding.
Transformer API Reference, PyTorch Documentation, 2024 (PyTorch Foundation) - Official PyTorch documentation for the built-in Transformer module, providing implementation details and API usage.
Speech and Language Processing (3rd ed. draft), Daniel Jurafsky and James H. Martin, 2023 - A comprehensive textbook chapter covering the Transformer architecture in detail, including its components and applications in NLP.