Sequence to Sequence Learning with Neural Networks, Ilya Sutskever, Oriol Vinyals, Quoc V. Le, 2014. Advances in Neural Information Processing Systems 27 (NIPS 2014). DOI: 10.48550/arXiv.1409.3215 - This seminal paper introduced the sequence-to-sequence (Seq2Seq) architecture, using deep LSTMs for tasks such as machine translation and demonstrating its effectiveness at mapping input sequences to output sequences of different lengths (a minimal sketch of this encoder-decoder pattern follows the list).
Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation, Kyunghyun Cho, Bart van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, Yoshua Bengio, 2014. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics. DOI: 10.3115/v1/D14-1179 - This paper concurrently proposed an RNN Encoder-Decoder framework and introduced Gated Recurrent Units (GRUs), showing how to learn continuous representations for phrases in machine translation.
Deep Learning, Ian Goodfellow, Yoshua Bengio, Aaron Courville, 2016. MIT Press. - A comprehensive textbook covering the theoretical foundations and practical applications of deep learning, including detailed explanations of recurrent neural networks, LSTMs, GRUs, and the encoder-decoder architecture. Chapter 10, 'Sequence Modeling: Recurrent and Recursive Nets,' provides an in-depth discussion of these architectures.
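For readers who want the core idea in code, here is a minimal sketch, assuming PyTorch, of the encoder-decoder pattern described in the first two papers: an RNN encoder compresses a variable-length source sequence into a fixed-size state, and an RNN decoder unrolls that state into the target sequence. All class names, dimensions, and the toy data below are illustrative choices, not taken from the papers.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, emb_dim=32, hid_dim=64):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, tgt_vocab)

    def forward(self, src, tgt):
        # Encode: keep only the final (hidden, cell) state -- the
        # fixed-length summary of the variable-length source sequence.
        _, state = self.encoder(self.src_emb(src))
        # Decode: unroll the target sequence from the encoder's final
        # state (teacher forcing: ground-truth target tokens as input).
        dec_out, _ = self.decoder(self.tgt_emb(tgt), state)
        return self.out(dec_out)  # logits over the target vocabulary

# Toy usage: a batch of 2 source sequences of length 5 is mapped to
# target sequences of length 7 -- input and output lengths can differ.
model = Seq2Seq(src_vocab=100, tgt_vocab=120)
src = torch.randint(0, 100, (2, 5))
tgt = torch.randint(0, 120, (2, 7))
logits = model(src, tgt)
print(logits.shape)  # torch.Size([2, 7, 120])
```

Swapping nn.LSTM for nn.GRU (whose recurrent state is a single tensor rather than a (hidden, cell) pair) yields a variant closer to the GRU-based encoder-decoder of Cho et al.; the papers themselves add depth, reversed source reading, and other training details omitted here.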