Deep Learning, Ian Goodfellow, Yoshua Bengio, Aaron Courville, 2016 (MIT Press) - This textbook provides a detailed treatment of optimization challenges in deep learning, including local minima, saddle points, slow convergence in ravines, and the impact of the learning rate.
The Loss Surfaces of Multilayer Networks, Anna Choromanska, Mikael Henaff, Michael Mathieu, Gérard Ben Arous, Yann LeCun, 2015, Proceedings of Machine Learning Research, Vol. 38 (PMLR) - This paper provides theoretical insights into the geometry of loss surfaces of deep neural networks, arguing that in high dimensions most local minima are of similarly good quality and that the primary challenge comes from saddle points.
Identifying and attacking the saddle point problem in high-dimensional non-convex optimization, Yann N. Dauphin, Razvan Pascanu, Caglar Gulcehre, Kyunghyun Cho, Surya Ganguli, Yoshua Bengio, 2014, Advances in Neural Information Processing Systems (NIPS 27) (MIT Press) - This paper argues that saddle points, rather than local minima, dominate the critical points of high-dimensional non-convex objectives, and explains how they can impede gradient-based optimization algorithms (illustrated in the sketch after this list).
Optimization Methods for Large-Scale Machine Learning, Léon Bottou, Frank E. Curtis, Jorge Nocedal, 2018, SIAM Review, Vol. 60 (Society for Industrial and Applied Mathematics), DOI: 10.1137/16M1080173 - This review article offers a comprehensive survey of optimization algorithms used in large-scale machine learning, discussing the theoretical foundations and practical aspects relevant to deep learning.
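To make the saddle-point issue raised by the Choromanska et al. and Dauphin et al. papers concrete, here is a minimal illustrative Python sketch (not code from any of the cited works): plain gradient descent on f(x, y) = x^2 - y^2, whose only critical point is a saddle at the origin. An iterate started exactly on the x-axis converges to the saddle and stalls, while a tiny perturbation in y escapes along the negative-curvature direction.

```python
import numpy as np

def grad(p):
    # Gradient of f(x, y) = x^2 - y^2: positive curvature along x,
    # negative curvature along y, with a saddle point at the origin.
    x, y = p
    return np.array([2.0 * x, -2.0 * y])

def descend(p0, lr=0.1, steps=100):
    # Plain gradient descent; no momentum or curvature information.
    p = np.array(p0, dtype=float)
    for _ in range(steps):
        p -= lr * grad(p)
    return p

print(descend([1.0, 0.0]))   # x shrinks toward 0, y stays 0: stuck at the saddle
print(descend([1.0, 1e-6]))  # the tiny y-component grows each step and escapes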