Review of Gradient Descent Variants (SGD, Momentum)
Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - Comprehensive textbook covering theoretical and practical aspects of deep learning, including detailed explanations of gradient descent, SGD, and momentum in Chapter 8.
On the Importance of Initialization and Momentum in Deep Learning, Ilya Sutskever, James Martens, George Dahl, and Geoffrey Hinton, 2013, Proceedings of the 30th International Conference on Machine Learning (ICML), Vol. 28 (PMLR) - Foundational paper demonstrating the effectiveness of momentum, particularly Nesterov's Accelerated Gradient, in training deep neural networks.
torch.optim.SGD, PyTorch Core Team, 2024 - Official documentation for the Stochastic Gradient Descent optimizer in PyTorch, detailing its usage and parameters, including momentum.
Lecture Notes: Optimization Algorithms, Stanford University, 2023 - High-quality educational resource providing an accessible explanation of gradient descent, SGD, and momentum within the context of deep learning.
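To make the momentum variant discussed in the references above concrete, here is a minimal, dependency-free sketch of the classical (heavy-ball) momentum update, v ← μv − η∇f(θ), θ ← θ + v. The function name `sgd_momentum_step` and the toy objective f(x) = x² are illustrative choices, not taken from any of the cited sources; PyTorch's `torch.optim.SGD` implements a slightly different parameterization of the same idea.

```python
def sgd_momentum_step(theta, velocity, grad, lr=0.1, mu=0.9):
    """One parameter update with classical (heavy-ball) momentum.

    Applies v <- mu * v - lr * g, then theta <- theta + v, elementwise.
    Parameters are plain lists of floats to keep the sketch self-contained.
    """
    velocity = [mu * v - lr * g for v, g in zip(velocity, grad)]
    theta = [t + v for t, v in zip(theta, velocity)]
    return theta, velocity


if __name__ == "__main__":
    # Toy example: minimize f(x) = x^2 (gradient 2x) starting from x = 1.0.
    theta, velocity = [1.0], [0.0]
    for _ in range(100):
        grad = [2.0 * t for t in theta]
        theta, velocity = sgd_momentum_step(theta, velocity, grad)
    print(theta)  # converges toward the minimum at 0
```

The accumulated velocity lets the iterate keep moving along directions of persistent gradient while damping oscillations, which is the behavior the Sutskever et al. paper analyzes for deep networks.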