Adam: A Method for Stochastic Optimization, Diederik P. Kingma, Jimmy Ba, 2015, International Conference on Learning Representations (ICLR). DOI: 10.48550/arXiv.1412.6980 - Introduces the Adam algorithm, its mathematical formulation (summarized after this list), and experimental results; this is the foundational paper.
Deep Learning, Ian Goodfellow, Yoshua Bengio, Aaron Courville, 2016, MIT Press - Provides a comprehensive treatment of optimization algorithms in deep learning, including a detailed discussion of Adam.
Optimization: Stochastic Gradient Descent, Stanford University, 2024 - Offers an accessible overview of various optimization techniques used in neural networks, with a clear explanation of Adam.
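For quick reference, the update rule introduced in the Kingma and Ba paper can be sketched as follows, in the paper's own notation: g_t is the stochastic gradient at step t, \beta_1 and \beta_2 are exponential decay rates for the moment estimates, \alpha is the step size, and \epsilon is a small constant for numerical stability.

\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1 - \beta_1)\, g_t && \text{(first-moment estimate)} \\
v_t &= \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2 && \text{(second-moment estimate)} \\
\hat{m}_t &= m_t / (1 - \beta_1^t), \quad \hat{v}_t = v_t / (1 - \beta_2^t) && \text{(bias correction)} \\
\theta_t &= \theta_{t-1} - \alpha\, \hat{m}_t / \bigl(\sqrt{\hat{v}_t} + \epsilon\bigr) && \text{(parameter update)}
\end{aligned}

The paper recommends default values of \alpha = 0.001, \beta_1 = 0.9, \beta_2 = 0.999, and \epsilon = 10^{-8}.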