Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - Provides a comprehensive introduction to optimization algorithms, including gradient descent, learning rates, and advanced optimizers, which are foundational to neural network training.
Adam: A Method for Stochastic Optimization, Diederik P. Kingma and Jimmy Ba, 2015 (3rd International Conference on Learning Representations), DOI: 10.48550/arXiv.1412.6980 - Introduces the Adam optimizer, a widely used adaptive learning rate optimization algorithm mentioned in the section as a good default choice for training neural networks.
CS231n: Deep Learning for Computer Vision Course Notes, Fei-Fei Li, Yunzhu Li, and Ruohan Gao, 2023 (Stanford University) - Offers detailed explanations of optimization algorithms, including gradient descent, learning rates, and various optimizers, in the context of training neural networks, with practical examples.