Lecture 6e: RMSprop, Neural Networks for Machine Learning, Geoffrey Hinton, 2012, University of Toronto (via Coursera) - Introduces the RMSprop algorithm as a method to adapt learning rates per-parameter based on a moving average of squared gradients.
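For orientation, the update the lecture introduces can be stated compactly. This is a standard formulation of RMSprop, with notation introduced here rather than taken from the slides: decay rate \(\gamma\), learning rate \(\eta\), and a small stabilizer \(\epsilon\) (implementations differ on whether \(\epsilon\) goes inside or outside the square root; the placement below matches PyTorch's):

```latex
% Exponentially decaying average of squared gradients:
E[g^2]_t = \gamma \, E[g^2]_{t-1} + (1 - \gamma) \, g_t^2
% Parameter step, rescaled by the running RMS of the gradient:
\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{E[g^2]_t} + \epsilon} \, g_t
```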
Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016, MIT Press - Provides a thorough theoretical explanation of optimization algorithms, including the formulation and intuition of RMSprop.
torch.optim.RMSprop, PyTorch Contributors, 2024 - Official documentation for the RMSprop optimizer in PyTorch, detailing its parameters and typical use.
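A minimal usage sketch against the documented API follows; the linear model and random tensors here are toy placeholders, and the hyperparameter values shown are PyTorch's documented defaults:

```python
import torch

# Toy model and data, for illustration only.
model = torch.nn.Linear(10, 1)
optimizer = torch.optim.RMSprop(
    model.parameters(),
    lr=0.01,     # learning rate (PyTorch default)
    alpha=0.99,  # smoothing constant for the squared-gradient average
    eps=1e-8,    # stabilizer added to the denominator
)

x, y = torch.randn(32, 10), torch.randn(32, 1)
loss = torch.nn.functional.mse_loss(model(x), y)

optimizer.zero_grad()
loss.backward()
optimizer.step()  # per-parameter step scaled by the running RMS of gradients
```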
Adam: A Method for Stochastic Optimization, Diederik P. Kingma and Jimmy Ba, 2015, International Conference on Learning Representations (ICLR), DOI: 10.48550/arXiv.1412.6980 - Presents Adam, an optimizer that builds on RMSprop's adaptive learning rates by incorporating momentum, making it a valuable comparison.
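To make the comparison concrete: Adam keeps RMSprop's running average of squared gradients (\(v_t\)) and pairs it with a momentum-style average of the gradients themselves (\(m_t\)), plus bias correction for both. The update as given in the paper, with step size \(\alpha\) and decay rates \(\beta_1, \beta_2\):

```latex
% First- and second-moment estimates of the gradient:
m_t = \beta_1 m_{t-1} + (1 - \beta_1) \, g_t, \qquad
v_t = \beta_2 v_{t-1} + (1 - \beta_2) \, g_t^2
% Bias-corrected estimates and the parameter update:
\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \qquad
\hat{v}_t = \frac{v_t}{1 - \beta_2^t}, \qquad
\theta_t = \theta_{t-1} - \alpha \, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
```

Setting \(\beta_1 = 0\) and dropping the bias correction recovers an RMSprop-like update, which is what makes the paper a useful side-by-side reference.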