Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - This foundational textbook provides a comprehensive treatment of stochastic gradient descent, its theoretical basis, and practical considerations in deep learning.
Optimization Methods for Large-Scale Machine Learning, Léon Bottou, Frank E. Curtis, and Jorge Nocedal, 2018, SIAM Review, Vol. 60 (Society for Industrial and Applied Mathematics), DOI: 10.1137/16M1080173 - A survey article providing a rigorous academic overview of stochastic gradient descent and its variants, discussing their theoretical properties and performance in large-scale machine learning.
An overview of gradient descent optimization algorithms, Sebastian Ruder, 2016, arXiv:1609.04747 [cs.LG], DOI: 10.48550/arXiv.1609.04747 - This highly cited paper provides an accessible and practical overview of various gradient descent optimization algorithms, including SGD and mini-batch GD, making it excellent for understanding practical implementation.