On the Importance of Initialization and Momentum in Deep Learning, Ilya Sutskever, James Martens, George Dahl, and Geoffrey Hinton, 2013, Proceedings of the 30th International Conference on Machine Learning (ICML) - Discusses the effectiveness of momentum-based SGD, including Nesterov momentum, for deep learning.