Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, 2017, Advances in Neural Information Processing Systems (NeurIPS), DOI: 10.48550/arXiv.1706.03762 - The foundational paper that introduced the Transformer architecture and the self-attention mechanism. It gives the original description of how Queries, Keys, and Values interact and how they are used to calculate attention scores.
The Annotated Transformer, Alexander Rush, 2018 - A widely recognized interactive guide that explains the Transformer architecture by implementing it in PyTorch, offering clear explanations of the Query, Key, and Value attention mechanism and its score calculation.
Transformer Models: An Introduction, Llion Jones, Ashish Vaswani, Noam Shazeer, Jakob Uszkoreit, and Illia Polosukhin, 2023, O'Reilly Media - An authoritative book written by some of the original Transformer authors. It offers a comprehensive, updated introduction to Transformer models, including the attention mechanism and its scoring process.