Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin, 2017, Advances in Neural Information Processing Systems 30 (NIPS 2017), DOI: 10.55919/nips.2017.00078 - The foundational paper introducing the Transformer architecture and the self-attention mechanism, defining the Query, Key, and Value concepts.
Natural Language Processing with Transformers, Lewis Tunstall, Leandro von Werra, Thomas Wolf, 2022 (O'Reilly Media) - A practical guide explaining the QKV mechanism as part of Transformer models and their applications in natural language processing.
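To make the Query/Key/Value concepts from the Vaswani et al. paper concrete, here is a minimal NumPy sketch of scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. The function name and the toy shapes are illustrative, not taken from either reference.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, per Vaswani et al. (2017)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of each query to each key
    # Numerically stable row-wise softmax over the key dimension
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V  # each output row is a weighted sum of the value rows

# Toy example (shapes are arbitrary): 2 queries, 3 key/value pairs, model dim 4
rng = np.random.default_rng(0)
Q = rng.standard_normal((2, 4))
K = rng.standard_normal((3, 4))
V = rng.standard_normal((3, 4))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (2, 4): one output vector per query
```

Each query row is compared against every key row; the resulting softmax weights decide how much of each value row flows into that query's output.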