Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin, 2017. Advances in Neural Information Processing Systems, Vol. 30 (NeurIPS). DOI: 10.5555/3295222.3295252 - Introduces the Transformer architecture, which forms the basis for the discussion on component initialization.
Understanding the difficulty of training deep feedforward neural networks, Xavier Glorot, Yoshua Bengio, 2010. Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, Vol. 9 (PMLR) - Presents Xavier initialization, a method for stabilizing the training of deep networks by preserving signal variance across layers.
torch.nn.init, PyTorch Contributors, 2022 (PyTorch Foundation) - Official documentation for PyTorch's initialization functions, including normal_, kaiming_uniform_, zeros_, and ones_.