Review of Gradient Descent Variants (SGD, Momentum)
Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - Comprehensive textbook covering theoretical and practical aspects of deep learning, including detailed explanations of gradient descent, SGD, and momentum in Chapter 8.
On the Importance of Initialization and Momentum in Deep Learning, Ilya Sutskever, James Martens, George Dahl, and Geoffrey Hinton, 2013, Proceedings of the 30th International Conference on Machine Learning (ICML), Vol. 28 (PMLR) - Foundational paper demonstrating the effectiveness of momentum, particularly Nesterov's Accelerated Gradient, in training deep neural networks.
torch.optim.SGD, PyTorch Core Team, 2024 - Official documentation for the Stochastic Gradient Descent optimizer in PyTorch, detailing its usage and parameters, including momentum.
Lecture Notes: Optimization Algorithms, Stanford University, 2023 - High-quality educational resource providing an accessible explanation of gradient descent, SGD, and momentum within the context of deep learning.
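To make the momentum variant discussed in the references above concrete, here is a minimal, dependency-free sketch of the classical (heavy-ball) momentum update, v ← μv − η∇f(θ), θ ← θ + v. The function name `sgd_momentum_step` and the toy objective f(x) = x² are illustrative choices, not taken from any of the cited sources; PyTorch's `torch.optim.SGD` implements a slightly different parameterization of the same idea.

```python
def sgd_momentum_step(theta, velocity, grad, lr=0.1, mu=0.9):
    """One parameter update with classical (heavy-ball) momentum.

    Applies v <- mu * v - lr * g, then theta <- theta + v, elementwise.
    Parameters are plain lists of floats to keep the sketch self-contained.
    """
    velocity = [mu * v - lr * g for v, g in zip(velocity, grad)]
    theta = [t + v for t, v in zip(theta, velocity)]
    return theta, velocity


if __name__ == "__main__":
    # Toy example: minimize f(x) = x^2 (gradient 2x) starting from x = 1.0.
    theta, velocity = [1.0], [0.0]
    for _ in range(100):
        grad = [2.0 * t for t in theta]
        theta, velocity = sgd_momentum_step(theta, velocity, grad)
    print(theta)  # converges toward the minimum at 0
```

The accumulated velocity lets the iterate keep moving along directions of persistent gradient while damping oscillations, which is the behavior the Sutskever et al. paper analyzes for deep networks.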