Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - This book is a comprehensive reference for deep learning, offering detailed explanations of loss functions, various optimization algorithms including gradient descent and its variants, and the theoretical underpinnings of neural network training.
Adam: A Method for Stochastic Optimization, Diederik P. Kingma and Jimmy Ba, 2014 (International Conference on Learning Representations, ICLR), DOI: 10.48550/arXiv.1412.6980 - The original paper introducing the Adam optimizer, providing a detailed explanation of its algorithm, adaptive learning rate mechanism, and empirical evaluation.
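As a quick companion to the paper above, the sketch below applies the Adam update rule it describes (exponential moving averages of the first and second moments with bias correction) to a toy one-dimensional quadratic; the objective, step count, and step size are illustrative assumptions, while the remaining hyperparameters follow the paper's suggested defaults.

```python
# A minimal sketch (not the paper's reference implementation) of the Adam
# update rule from Kingma & Ba (2014), applied to the toy objective
# f(w) = (w - 3)^2; the step size here is an arbitrary illustrative choice.
import math

alpha, beta1, beta2, eps = 0.1, 0.9, 0.999, 1e-8
w, m, v = 0.0, 0.0, 0.0

for t in range(1, 301):
    g = 2.0 * (w - 3.0)                  # gradient of (w - 3)^2
    m = beta1 * m + (1 - beta1) * g      # biased first-moment estimate
    v = beta2 * v + (1 - beta2) * g * g  # biased second-moment estimate
    m_hat = m / (1 - beta1 ** t)         # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)         # bias-corrected second moment
    w -= alpha * m_hat / (math.sqrt(v_hat) + eps)  # adaptive parameter update

print(f"w = {w:.3f}")  # should end close to the minimizer at 3.0
```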
PyTorch Official Documentation: Loss Functions and Optimizers, PyTorch Developers, 2024 - Official documentation for PyTorch's nn module, which covers the available loss functions, and its optim module, which details the available optimization algorithms and their parameters.
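For orientation, here is a minimal training-step sketch pairing a loss function from torch.nn with an optimizer from torch.optim; the linear model, random data, and learning rate are placeholder assumptions, not examples taken from the documentation.

```python
# A minimal sketch of one training loop using torch.nn and torch.optim;
# the toy model and dummy data are assumptions for illustration only.
import torch
import torch.nn as nn

model = nn.Linear(10, 1)                                   # toy regression model
criterion = nn.MSELoss()                                   # loss function (torch.nn)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # optimizer (torch.optim)

x = torch.randn(32, 10)  # dummy inputs
y = torch.randn(32, 1)   # dummy targets

for _ in range(100):
    optimizer.zero_grad()          # reset accumulated gradients
    loss = criterion(model(x), y)  # forward pass and loss computation
    loss.backward()                # backpropagation
    optimizer.step()               # parameter update
```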
Optimization: Stochastic Gradient Descent, Justin Johnson, Andrej Karpathy, and Fei-Fei Li, 2023 (Stanford University CS231n Lecture Notes) - Lecture notes from Stanford's CS231n course, offering clear explanations of optimization techniques, including different types of gradient descent, learning rate schedules, momentum, and adaptive optimizers like Adam and RMSprop.
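To make the momentum update concrete, the sketch below uses the velocity form presented in the CS231n notes (a velocity that accumulates a decaying sum of past gradients, followed by a step along that velocity) on the same toy quadratic as above; the objective and the particular constants are illustrative assumptions.

```python
# A minimal sketch of gradient descent with momentum in the velocity form
# from the CS231n notes (v = mu * v - learning_rate * dw; w += v), applied
# to the toy objective f(w) = (w - 3)^2; constants are illustrative only.
learning_rate, mu = 0.1, 0.9
w, v = 0.0, 0.0

for _ in range(300):
    dw = 2.0 * (w - 3.0)             # gradient of (w - 3)^2
    v = mu * v - learning_rate * dw  # velocity: decaying sum of past gradients
    w = w + v                        # step along the velocity

print(f"w = {w:.3f}")  # should end close to the minimizer at 3.0
```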