Understanding the difficulty of training deep feedforward neural networks, Xavier Glorot, Yoshua Bengio, 2010, Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS), Vol. 9 (JMLR.org), DOI: 10.5555/2078696.2078720 - The foundational paper introducing Xavier initialization, explaining the problem of vanishing/exploding gradients and the variance preservation principle.
Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - A comprehensive textbook providing a broad explanation of deep learning concepts, including a detailed section on weight initialization techniques like Xavier.
torch.nn.init, PyTorch Development Team, 2022 (PyTorch) - Official documentation for PyTorch's initialization module, providing practical implementation details for Xavier (Glorot) initialization in PyTorch.
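The variance-preservation principle behind these references can be illustrated in plain Python. This is a minimal sketch of Xavier (Glorot) uniform initialization, not PyTorch's actual implementation; the function name `xavier_uniform` and the layer sizes are illustrative choices:

```python
import math
import random

def xavier_uniform(fan_in, fan_out, rng=random):
    """Sample a fan_in x fan_out weight matrix from U[-a, a] with
    a = sqrt(6 / (fan_in + fan_out)), the Glorot/Xavier bound.

    The bound is chosen so that Var(W) = a^2 / 3 = 2 / (fan_in + fan_out),
    which keeps the variance of forward activations and backward gradients
    roughly constant from layer to layer (Glorot & Bengio, 2010)."""
    a = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-a, a) for _ in range(fan_out)]
            for _ in range(fan_in)]

# Every sampled weight stays inside the Glorot bound.
W = xavier_uniform(256, 128)
bound = math.sqrt(6.0 / (256 + 128))
assert all(abs(w) <= bound for row in W for w in row)
```

In PyTorch the equivalent one-liner is `torch.nn.init.xavier_uniform_(tensor)` from the `torch.nn.init` module documented above.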