On the importance of initialization and momentum in deep learning, Ilya Sutskever, James Martens, George Dahl, Geoffrey Hinton, 2013 (Proceedings of the 30th International Conference on Machine Learning (ICML), Vol. 28) - A seminal paper that highlights the significant benefits of using momentum, along with proper initialization, for effectively training deep neural networks.
Deep Learning, Ian Goodfellow, Yoshua Bengio, Aaron Courville, 2016 (MIT Press) - A standard textbook in deep learning, offering a detailed explanation of optimization algorithms, including the mechanics and advantages of momentum.
torch.optim.SGD, PyTorch Developers, 2024 - The official PyTorch documentation for the Stochastic Gradient Descent optimizer, clearly outlining the momentum parameter and its direct use in implementation.