Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin, 2017, Advances in Neural Information Processing Systems 30 (NIPS 2017), DOI: 10.48550/arXiv.1706.03762 - Introduces the Transformer architecture and the self-attention mechanism, forming the basis for many modern large language models.
The Illustrated Transformer, Jay Alammar, 2018 - A highly visual and intuitive explanation of the Transformer model, particularly helpful for understanding the self-attention mechanism.
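The self-attention mechanism that both references center on can be summarized by the formula Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V from the original paper. As a minimal sketch (using NumPy, with randomly generated toy matrices rather than learned projections):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention as defined in Vaswani et al. (2017):
    Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # pairwise similarity, scaled
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # weighted sum of value vectors

# Toy example: 3 positions, model dimension 4 (illustrative values only)
rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 4))
K = rng.standard_normal((3, 4))
V = rng.standard_normal((3, 4))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # each output row is a softmax-weighted mix of the rows of V
```

In the full Transformer, Q, K, and V are learned linear projections of the input, and several such attention heads run in parallel (multi-head attention); this sketch shows only the core weighting step.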