Neural Machine Translation by Jointly Learning to Align and Translate, Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio, 2014. International Conference on Learning Representations (ICLR 2015); arXiv preprint. DOI: 10.48550/arXiv.1409.0473 - Presents the original attention mechanism for sequence-to-sequence neural machine translation, removing the bottleneck of compressing the entire source sentence into a single fixed-size context vector.
Effective Approaches to Attention-based Neural Machine Translation, Minh-Thang Luong, Hieu Pham, and Christopher D. Manning, 2015. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP 2015). DOI: 10.48550/arXiv.1508.04025 - Compares attention variants, notably global and local attention, along with several attention scoring functions, building on the earlier attention-based NMT work.
Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin, 2017. Advances in Neural Information Processing Systems 30 (NIPS 2017). DOI: 10.48550/arXiv.1706.03762 - Introduces the Transformer architecture, which relies entirely on attention mechanisms (self-attention), replaces recurrent layers altogether, and has become a core component of modern deep learning for NLP.
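The common thread in these three papers is computing a weighted sum of value vectors, with weights derived from query-key similarity. Below is a minimal NumPy sketch of the scaled dot-product attention used in the Transformer paper (Vaswani et al., 2017); the function and variable names, shapes, and the toy data are illustrative assumptions, not code from any of the cited papers.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention (illustrative sketch).

    Q: (n_queries, d_k), K: (n_keys, d_k), V: (n_keys, d_v)
    Returns the attended output (n_queries, d_v) and the attention weights.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # similarity of each query to each key
    weights = softmax(scores, axis=-1)   # each row is a distribution over keys
    return weights @ V, weights

# Toy usage: 2 queries attending over 4 key/value pairs (hypothetical sizes).
rng = np.random.default_rng(0)
Q = rng.standard_normal((2, 8))
K = rng.standard_normal((4, 8))
V = rng.standard_normal((4, 16))
out, w = scaled_dot_product_attention(Q, K, V)
print(out.shape, w.shape)  # (2, 16) (2, 4)
```

Bahdanau et al. (2014) and Luong et al. (2015) differ mainly in how the scores are computed (an additive feed-forward score versus dot-product-style scores) and in what plays the role of queries and keys (decoder and encoder hidden states), but the weighted-sum structure above is the same.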