Understanding the difficulty of training deep feedforward neural networks, Xavier Glorot, Yoshua Bengio, 2010, Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, Vol. 9 (PMLR) - Introduces Xavier (also known as Glorot) initialization, a method for initializing neural network weights to maintain signal variance across layers, particularly suitable for sigmoid and tanh activation functions.
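The initialization described above can be sketched as follows; this is a minimal NumPy illustration of the uniform variant (weights drawn from U(-a, a) with a = sqrt(6 / (fan_in + fan_out)), giving variance 2 / (fan_in + fan_out)), with the function name and layer sizes chosen here for illustration:

```python
import numpy as np

def xavier_uniform(fan_in: int, fan_out: int, seed: int = 0) -> np.ndarray:
    """Glorot/Xavier uniform init: weight variance ~ 2 / (fan_in + fan_out)."""
    rng = np.random.default_rng(seed)
    # Var of U(-a, a) is a^2 / 3, so a = sqrt(6 / (fan_in + fan_out))
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

# Example: a 256 -> 128 layer; empirical variance lands near 2/384 ≈ 0.0052
W = xavier_uniform(256, 128)
```

Keeping the variance at this scale helps forward activations and backward gradients stay comparably scaled across layers, which is the paper's core argument for why sigmoid/tanh networks train more reliably with it.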
Deep Learning, Ian Goodfellow, Yoshua Bengio, Aaron Courville, 2016 (MIT Press) - A foundational textbook covering a broad range of deep learning topics, including a detailed discussion on weight initialization strategies and their theoretical underpinnings.