Dropout: A Simple Way to Prevent Neural Networks from Overfitting, Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, Ruslan Salakhutdinov, 2014, Journal of Machine Learning Research, Vol. 15 (JMLR) - Introduces the Dropout technique for regularizing neural networks.
Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, 2017, Advances in Neural Information Processing Systems, DOI: 10.48550/arXiv.1706.03762 - Presents the Transformer architecture and describes how Dropout is applied within it.
Deep Learning, Ian Goodfellow, Yoshua Bengio, Aaron Courville, 2016, MIT Press - Provides a comprehensive explanation of regularization techniques, including Dropout, Layer Normalization, and weight decay.