On the Difficulty of Training Recurrent Neural Networks, Razvan Pascanu, Tomas Mikolov, Yoshua Bengio, 2013. Proceedings of the 30th International Conference on Machine Learning (ICML 2013). DOI: 10.48550/arXiv.1211.5063 - Analyzes the exploding and vanishing gradient problems in RNNs and proposes gradient norm clipping as a method to mitigate exploding gradients (a minimal code sketch of this rule follows the references).
Deep Learning, Ian Goodfellow, Yoshua Bengio, Aaron Courville, 2016. MIT Press. - Provides a comprehensive explanation of recurrent neural networks, their training challenges, and optimization techniques including gradient clipping; Chapters 8 and 10 are particularly relevant.
Sequence to Sequence Learning with Neural Networks, Ilya Sutskever, Oriol Vinyals, Quoc V. Le, 2014. Advances in Neural Information Processing Systems 27 (NeurIPS 2014). DOI: 10.48550/arXiv.1409.3215 - A landmark paper demonstrating the practical effectiveness of gradient clipping in the context of sequence-to-sequence models built from recurrent neural networks.
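All three references center on gradient norm clipping, so a minimal sketch may be useful. Assuming plain NumPy and an illustrative helper name `clip_gradient_norm` (not taken from any of the cited works' code), the snippet below implements the rescaling rule described in Pascanu et al. (2013): when the global L2 norm of the gradient exceeds a threshold, the gradient is scaled down so that its norm equals the threshold, preserving its direction. The default threshold of 5 matches the value Sutskever et al. (2014) report using.

```python
import numpy as np

def clip_gradient_norm(grads, threshold=5.0):
    """Rescale a list of gradient arrays so their global L2 norm
    does not exceed `threshold` (norm clipping, Pascanu et al., 2013).

    Illustrative sketch: if ||g|| > threshold, replace g with
    (threshold / ||g||) * g; otherwise leave g unchanged.
    """
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total_norm > threshold:
        scale = threshold / total_norm
        grads = [g * scale for g in grads]
    return grads

# Example: a gradient whose global norm (~32) exceeds the threshold
# is rescaled so its norm equals the threshold.
grads = [np.full((10,), 10.0), np.array([3.0, 4.0])]
clipped = clip_gradient_norm(grads)
print(np.sqrt(sum(np.sum(g ** 2) for g in clipped)))  # -> ~5.0
```

In practice, deep learning frameworks ship the same rule as a utility, e.g. PyTorch's torch.nn.utils.clip_grad_norm_, which is typically called between the backward pass and the optimizer step.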