On the difficulty of training Recurrent Neural Networks, Razvan Pascanu, Tomas Mikolov, Yoshua Bengio, 2013, Proceedings of the 30th International Conference on Machine Learning (ICML), Vol. 28 - A foundational paper that analyzes the exploding and vanishing gradient problems in recurrent neural networks and proposes gradient norm clipping as a mitigation strategy for exploding gradients.
Deep Learning, Ian Goodfellow, Yoshua Bengio, Aaron Courville, 2016 (MIT Press) - A comprehensive textbook providing a theoretical and practical overview of deep learning, with a specific section discussing gradient clipping in the context of optimization.
torch.nn.utils.clip_grad_norm_, PyTorch Developers, 2024, PyTorch Documentation (PyTorch Foundation) - Official documentation for the PyTorch utility function that implements gradient clipping by norm, detailing its usage, parameters, and return value; a brief usage sketch follows this list.
CS231n: Convolutional Neural Networks for Visual Recognition - Optimization section, Fei-Fei Li, Justin Johnson, Serena Yeung, 2017, Stanford University Course Material (Stanford University) - Educational material from a renowned university course covering practical aspects of training neural networks, including common optimization strategies such as gradient clipping.
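As a rough illustration of how the documented utility is typically used, the sketch below clips gradients by global norm inside a single training step. The toy model, optimizer, random data, and the max_norm value of 1.0 are assumptions chosen purely for the example; clip_grad_norm_ itself and its return value (the total gradient norm measured before clipping) are as described in the PyTorch documentation cited above.

    import torch
    import torch.nn as nn

    # Toy setup, assumed purely for illustration: a small linear model on random data.
    model = nn.Linear(10, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.MSELoss()

    inputs = torch.randn(32, 10)
    targets = torch.randn(32, 1)

    # One training step with gradient clipping by global L2 norm.
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()

    # Rescales gradients in place so their combined norm does not exceed max_norm;
    # returns the total norm computed before clipping (max_norm=1.0 is an assumed value).
    total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

    optimizer.step()
    print(f"gradient norm before clipping: {total_norm:.4f}")

Note that clipping is applied after backward() and before optimizer.step(), so the parameter update uses the rescaled gradients.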