Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation, Kyunghyun Cho, Bart van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio, 2014, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (Association for Computational Linguistics), DOI: 10.3115/v1/D14-1179 - Introduces the Gated Recurrent Unit (GRU) architecture, detailing its update and reset gates and its application in sequence-to-sequence learning.
Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - Chapter 10 provides a comprehensive theoretical background on recurrent neural networks, including a detailed explanation of GRU architecture and its gating mechanisms.
CS230: Deep Learning - Lecture Notes for Recurrent Neural Networks, Afshine Amidi and Shervine Amidi, 2018 (Stanford University) - Provides a clear, pedagogical explanation of RNNs, LSTMs, and GRUs, including the mathematical formulation and an intuitive understanding of gating mechanisms.
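The gating mechanism these references describe can be sketched in a few lines. The following is a minimal NumPy illustration of a single GRU step, following the formulation in Cho et al. (2014): an update gate z and a reset gate r modulate how the previous hidden state is mixed with a candidate state. The parameter names (`Wz`, `Uz`, etc.) and dimensions are illustrative, not taken from any of the cited works.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x, h_prev, params):
    """One GRU time step (Cho et al., 2014 convention).

    x: input vector; h_prev: previous hidden state;
    params: dict of weight matrices and biases (names are illustrative).
    """
    Wz, Uz, bz = params["Wz"], params["Uz"], params["bz"]
    Wr, Ur, br = params["Wr"], params["Ur"], params["br"]
    Wh, Uh, bh = params["Wh"], params["Uh"], params["bh"]

    z = sigmoid(Wz @ x + Uz @ h_prev + bz)              # update gate
    r = sigmoid(Wr @ x + Ur @ h_prev + br)              # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h_prev) + bh)  # candidate state
    return z * h_prev + (1.0 - z) * h_tilde             # gated blend

# Tiny usage example with random weights (hypothetical sizes).
rng = np.random.default_rng(0)
n_in, n_h = 3, 4
params = {
    "Wz": rng.normal(size=(n_h, n_in)), "Uz": rng.normal(size=(n_h, n_h)), "bz": np.zeros(n_h),
    "Wr": rng.normal(size=(n_h, n_in)), "Ur": rng.normal(size=(n_h, n_h)), "br": np.zeros(n_h),
    "Wh": rng.normal(size=(n_h, n_in)), "Uh": rng.normal(size=(n_h, n_h)), "bh": np.zeros(n_h),
}
h = gru_cell(rng.normal(size=n_in), np.zeros(n_h), params)
```

Note that the sign convention for the update gate varies between sources: Cho et al. (2014) write the new state as z * h_prev + (1 - z) * h_tilde, while some later presentations swap the roles of z and 1 - z.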