Challenges with Simple RNNs (Vanishing/Exploding Gradients)
Learning Long-Term Dependencies with Gradient Descent Is Difficult, Yoshua Bengio, Patrice Simard, and Paolo Frasconi, 1994. IEEE Transactions on Neural Networks, Vol. 5 (IEEE). DOI: 10.1109/72.279181 - This seminal paper rigorously analyzes the fundamental difficulty of learning long-term dependencies in recurrent neural networks due to vanishing and exploding gradients.
Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - Chapter 10, "Sequence Modeling: Recurrent and Recursive Nets," offers a comprehensive overview of BPTT, vanishing/exploding gradients, and mitigation strategies such as gradient clipping (a minimal sketch follows this list).
Long Short-Term Memory, Sepp Hochreiter and Jürgen Schmidhuber, 1997. Neural Computation, Vol. 9 (MIT Press). DOI: 10.1162/neco.1997.9.8.1735 - This foundational paper introduces the Long Short-Term Memory (LSTM) architecture, which effectively addresses the vanishing gradient problem in recurrent neural networks.
Recurrent Neural Networks (RNNs), Andrej Karpathy and others (Stanford CS231n), 2019 - These course notes provide a clear and practical explanation of RNNs, including BPTT, the vanishing/exploding gradient problems, and gradient clipping.
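The vanishing/exploding gradient behavior and the gradient-clipping mitigation discussed in these references can be illustrated with a small sketch. The code below is a minimal NumPy illustration, not taken from any of the cited works: it repeatedly multiplies a gradient by a recurrent Jacobian to show how the norm shrinks or blows up over many timesteps, then rescales gradients by their global norm, as gradient clipping does. The function names (`simulate_bptt_norm`, `clip_by_global_norm`) and the `max_norm` value are illustrative choices, not a reference implementation.

```python
import numpy as np

def simulate_bptt_norm(W, steps=50):
    """Track the gradient norm as it is propagated back through `steps`
    timesteps of a linear recurrence h_t = W h_{t-1}.
    The norm scales roughly like (largest singular value of W) ** steps,
    which is the vanishing/exploding behavior analyzed by Bengio et al."""
    grad = np.ones(W.shape[0])
    norms = []
    for _ in range(steps):
        grad = W.T @ grad          # one step of backpropagation through time
        norms.append(np.linalg.norm(grad))
    return norms

def clip_by_global_norm(grads, max_norm=5.0):
    """Rescale a list of gradient arrays so their combined L2 norm does not
    exceed max_norm (the gradient-clipping mitigation mentioned above)."""
    global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if global_norm > max_norm:
        grads = [g * (max_norm / global_norm) for g in grads]
    return grads

W_small = 0.5 * np.eye(4)                 # spectral radius < 1: gradients vanish
W_large = 1.5 * np.eye(4)                 # spectral radius > 1: gradients explode
print(simulate_bptt_norm(W_small)[-1])    # ~1e-15 after 50 steps
print(simulate_bptt_norm(W_large)[-1])    # ~1e9 after 50 steps

exploded = [np.full(4, 1e4)]
print(np.linalg.norm(clip_by_global_norm(exploded)[0]))  # capped at 5.0
```

In practice, deep learning frameworks offer clipping by global norm as a built-in utility applied just before the optimizer step, but the arithmetic is the same rescaling shown here.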