Understanding the difficulty of training deep feedforward neural networks, Xavier Glorot and Yoshua Bengio, 2010, Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS), Vol. 9 (Proceedings of Machine Learning Research). DOI: 10.5555/3172186.3172237 - Introduces the Glorot (Xavier) initialization method to mitigate vanishing/exploding gradients in deep networks by analyzing how signal variance propagates forward and backward through the layers (a minimal sketch of the scheme follows this list).
Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - Provides comprehensive coverage of deep learning fundamentals, including detailed explanations of vanishing/exploding gradients and initialization strategies.
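For context on the first entry, here is a minimal sketch of the Glorot uniform scheme the paper introduces: weights are drawn from U(-a, a) with a = sqrt(6 / (fan_in + fan_out)), which keeps activation and gradient variances roughly constant across layers. The function name `glorot_uniform` and the 784-to-256 layer shape are illustrative choices, not taken from either source.

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, rng=None):
    """Glorot (Xavier) uniform initialization (Glorot & Bengio, 2010).

    Draws weights from U(-limit, limit) with
    limit = sqrt(6 / (fan_in + fan_out)), chosen so that the variance
    of activations (forward pass) and gradients (backward pass) stays
    roughly constant from layer to layer.
    """
    rng = rng or np.random.default_rng()
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

# Illustrative usage: initialize a 784 -> 256 fully connected layer.
W = glorot_uniform(784, 256)
# Empirical std should be near sqrt(2 / (784 + 256)) ~= 0.044.
print(W.std())
```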