Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, 2017 (Advances in Neural Information Processing Systems, Vol. 30), DOI: 10.48550/arXiv.1706.03762 - Introduces the Transformer architecture and the original sinusoidal positional encoding method.
Speech and Language Processing (3rd edition draft), Daniel Jurafsky, James H. Martin, 2025 (Stanford University) - A comprehensive textbook explanation of Transformer models, including a detailed discussion of positional encoding in Chapter 10.
Lecture 9: Transformers and Large Language Models (Winter 2023), John Hewitt, Anna Goldie, 2023 (Stanford University, CS224N) - Lecture notes from a leading university course, offering clear explanations and visualizations of Transformers and positional encoding.