Gradient Clipping and Exploding/Vanishing Gradients
Learning Long-Term Dependencies with Gradient Descent Is Difficult, Yoshua Bengio, Patrice Simard, Paolo Frasconi, 1994, IEEE Transactions on Neural Networks, 5(2) - Essential paper identifying the vanishing gradient problem in recurrent neural networks.
On the difficulty of training Recurrent Neural Networks, Razvan Pascanu, Tomas Mikolov, Yoshua Bengio, 2013, Proceedings of the 30th International Conference on Machine Learning (ICML), Vol. 28 - Introduces gradient clipping as a technique to stabilize training of recurrent neural networks by addressing exploding gradients (a minimal clipping sketch follows this list).
Deep Learning, Ian Goodfellow, Yoshua Bengio, Aaron Courville, 2016 (MIT Press) - Comprehensive textbook covering the foundations of deep learning, including detailed explanations of gradient issues and their remedies.
CS224n: Natural Language Processing with Deep Learning Lecture Notes, Diyi Yang, Tatsunori Hashimoto, 2024 (Stanford University) - Course materials offering practical and theoretical perspectives on training deep networks, with coverage of gradient stability in recurrent models.
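The readings above motivate gradient clipping as the standard remedy for exploding gradients. As a concrete illustration, here is a minimal sketch of global-norm clipping in a recurrent training loop, assuming PyTorch; the toy model, random data, and max_norm=1.0 threshold are illustrative assumptions, not values taken from the cited papers.

```python
# Minimal sketch of global-norm gradient clipping in a recurrent
# training loop, assuming PyTorch. The toy model, random data, and
# max_norm=1.0 threshold are illustrative assumptions.
import torch
import torch.nn as nn
from torch.nn.utils import clip_grad_norm_

torch.manual_seed(0)

model = nn.RNN(input_size=8, hidden_size=16, batch_first=True)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.MSELoss()

x = torch.randn(4, 10, 8)        # (batch, seq_len, input_size)
target = torch.randn(4, 10, 16)  # matches the RNN output shape

for step in range(5):
    optimizer.zero_grad()
    output, _ = model(x)
    loss = criterion(output, target)
    loss.backward()
    # If the global L2 norm of all gradients exceeds max_norm, rescale
    # them so the norm equals max_norm; this is the clipping scheme
    # analyzed in Pascanu et al. (2013).
    grad_norm = clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    print(f"step {step}: loss={loss.item():.4f}, "
          f"grad norm before clipping={grad_norm.item():.4f}")
```

Clipping by global norm rescales all gradients by a single factor, preserving the update direction while bounding its size, which is why it is generally preferred over clipping each component independently.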