Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin, 2017. Advances in Neural Information Processing Systems, Vol. 30. DOI: 10.5591/978-0-9883408-0-3_2 - Introduced the Transformer architecture and the Scaled Dot-Product Attention mechanism.
The Annotated Transformer, Alexander Rush, 2018 - An in-depth, code-level explanation and implementation of the 'Attention Is All You Need' paper, offering practical insight into the attention mechanism.