Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation, Kyunghyun Cho, Bart van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, Yoshua Bengio, 2014, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (Association for Computational Linguistics), DOI: 10.3115/v1/D14-1179 - Introduces the Gated Recurrent Unit (GRU) architecture as a simpler alternative to LSTMs for sequence modeling (see the brief GRU sketch after this list).
Deep Learning, Ian Goodfellow, Yoshua Bengio, Aaron Courville, 2016 (MIT Press) - Provides a detailed theoretical foundation of recurrent neural networks, including LSTMs and GRUs, in the context of sequence modeling.
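To make the GRU reference above concrete, here is a minimal sketch of a single GRU update step in NumPy, following the gating scheme described in Cho et al. (2014). This is not the authors' code; the function and weight names (gru_step, Wz, Uz, etc.) are illustrative, and the exact formulation of the update/reset gates varies slightly across presentations.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, p):
    """One GRU step: compute h_t from input x_t and previous hidden state h_prev."""
    z = sigmoid(p["Wz"] @ x_t + p["Uz"] @ h_prev + p["bz"])          # update gate
    r = sigmoid(p["Wr"] @ x_t + p["Ur"] @ h_prev + p["br"])          # reset gate
    h_tilde = np.tanh(p["Wh"] @ x_t + p["Uh"] @ (r * h_prev) + p["bh"])  # candidate state
    # Interpolate between the old state and the candidate, as in Cho et al. (2014)
    return z * h_prev + (1.0 - z) * h_tilde

# Tiny usage example with random parameters (hidden size 4, input size 3).
rng = np.random.default_rng(0)
p = {k: rng.standard_normal((4, 3)) for k in ("Wz", "Wr", "Wh")}
p.update({k: rng.standard_normal((4, 4)) for k in ("Uz", "Ur", "Uh")})
p.update({k: np.zeros(4) for k in ("bz", "br", "bh")})

h = np.zeros(4)
for x in rng.standard_normal((5, 3)):  # run over a short input sequence
    h = gru_step(x, h, p)
```

Compared with the LSTM treated in Goodfellow et al. (2016), the GRU has no separate cell state or output gate, which is the sense in which it is the "simpler alternative" noted above.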