Adam: A Method for Stochastic Optimization, Diederik P. Kingma, Jimmy Ba, 2015 (International Conference on Learning Representations, ICLR 2015). DOI: 10.48550/arXiv.1412.6980 - Original paper introducing the Adam and Adamax optimizers, detailing their algorithms and theoretical foundations.
Incorporating Nesterov Momentum into Adam, Timothy Dozat, 2016 (ICLR 2016 Workshop, Stanford University) - This report introduces Nadam (Nesterov-accelerated Adaptive Moment Estimation) by combining Nesterov momentum with the Adam optimizer.
Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - A comprehensive textbook covering various optimization algorithms, including adaptive methods like Adam, providing theoretical background and practical insights.
torch.optim.Adamax and torch.optim.NAdam, PyTorch Authors, 2025 (PyTorch Foundation) - Official documentation for the Adamax and NAdam optimizers within the PyTorch deep learning framework, including parameter details and usage examples; a brief usage sketch follows below.
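As a minimal sketch of how the two optimizers referenced above are typically instantiated in PyTorch, the snippet below creates an Adamax and an NAdam optimizer for a small model and runs one training step; the model, data, and hyperparameter values (learning rate, betas) are illustrative assumptions, not recommendations from the cited documentation.

```python
import torch
import torch.nn as nn

# Toy model and batch, used only to demonstrate the optimizer API.
model = nn.Linear(10, 1)
x, y = torch.randn(32, 10), torch.randn(32, 1)

# Adamax: Adam variant based on the infinity norm (Kingma & Ba, 2015).
adamax = torch.optim.Adamax(model.parameters(), lr=2e-3, betas=(0.9, 0.999))

# NAdam: Adam with Nesterov momentum (Dozat, 2016).
nadam = torch.optim.NAdam(model.parameters(), lr=2e-3, betas=(0.9, 0.999))

# Standard training-step pattern shared by both optimizers.
for optimizer in (adamax, nadam):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()
```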