torch.optim - PyTorch 2.3 documentation, PyTorch Developers, 2025 (PyTorch Foundation) - Official PyTorch documentation for the torch.optim package, detailing available optimizers and their usage.
Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - A comprehensive textbook covering the theoretical foundations of deep learning, including optimization algorithms.
Decoupled Weight Decay Regularization, Ilya Loshchilov and Frank Hutter, 2019 (International Conference on Learning Representations, ICLR), DOI: 10.48550/arXiv.1711.05101 - Proposes a method for decoupling weight decay from the gradient update, leading to algorithms like AdamW.