Understanding the difficulty of training deep feedforward neural networks, Xavier Glorot and Yoshua Bengio, 2010, Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS), Vol. 9 (Proceedings of Machine Learning Research). DOI: 10.5555/3172186.3172237 - Introduces the Glorot (Xavier) initialization method to mitigate vanishing/exploding gradients in deep networks by analyzing how signal variance propagates forward and backward through the layers (a minimal sketch of the scheme follows this list).
Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - Provides comprehensive coverage of deep learning fundamentals, including detailed explanations of vanishing/exploding gradients and initialization strategies.
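For context on the first entry, here is a minimal sketch of the Glorot uniform scheme the paper introduces: weights are drawn from U(-a, a) with a = sqrt(6 / (fan_in + fan_out)), which keeps activation and gradient variances roughly constant across layers. The function name `glorot_uniform` and the 784-to-256 layer shape are illustrative choices, not taken from either source.

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, rng=None):
    """Glorot (Xavier) uniform initialization (Glorot & Bengio, 2010).

    Draws weights from U(-limit, limit) with
    limit = sqrt(6 / (fan_in + fan_out)), chosen so that the variance
    of activations (forward pass) and gradients (backward pass) stays
    roughly constant from layer to layer.
    """
    rng = rng or np.random.default_rng()
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

# Illustrative usage: initialize a 784 -> 256 fully connected layer.
W = glorot_uniform(784, 256)
# Empirical std should be near sqrt(2 / (784 + 256)) ~= 0.044.
print(W.std())
```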