Sequence to Sequence Learning with Neural Networks, Ilya Sutskever, Oriol Vinyals, Quoc V. Le, 2014, Advances in Neural Information Processing Systems 27 (NIPS 2014) - Introduces the foundational encoder-decoder architecture for sequence-to-sequence learning with LSTMs, in which the encoder compresses the entire input sequence into a single fixed-length vector from which the decoder generates the output, illustrating the early approach to these challenges.
Neural Machine Translation by Jointly Learning to Align and Translate, Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio, 2014, International Conference on Learning Representations (ICLR 2015, poster) - Presents the attention mechanism, an important innovation that addresses the fixed-context bottleneck in sequence-to-sequence models by allowing the decoder to selectively focus on parts of the input (a minimal sketch of the attention computation follows this list).
Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin, 2017, Advances in Neural Information Processing Systems 30 (NIPS 2017) - Introduces the Transformer model, which relies exclusively on attention mechanisms to achieve state-of-the-art results in sequence-to-sequence tasks, effectively overcoming the limitations discussed.
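The attention idea that the second and third entries turn on can be summarized in a few lines. The sketch below is an illustrative example rather than either paper's reference implementation: it shows scaled dot-product attention as formulated by Vaswani et al., where each decoder query produces a softmax-weighted average over the encoded input positions (Bahdanau et al. use an additive scoring function, but the principle of selectively weighting the input is the same). The array shapes, sizes, and variable names are assumptions made for the example.

```python
# Minimal sketch of scaled dot-product attention (illustrative, not the papers' code).
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Q: (n_queries, d_k), K: (n_keys, d_k), V: (n_keys, d_v)
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # similarity of each query to each key
    weights = softmax(scores, axis=-1)   # attention distribution over input positions
    return weights @ V, weights          # weighted sum of values, plus the weights

# Toy usage (assumed dimensions): 2 decoder queries attending over 4 encoded inputs.
rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
context, attn = scaled_dot_product_attention(Q, K, V)
print(context.shape, attn.shape)  # (2, 8) (2, 4)
```

Because the weights are recomputed for every decoding step (or every query position), the model no longer has to squeeze the whole input into one fixed-length vector, which is precisely the bottleneck the first entry's architecture suffers from.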