Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016, MIT Press - A comprehensive textbook covering the mathematical and conceptual aspects of deep learning, including regularization techniques such as L2 penalties, dropout, and early stopping.
Dropout: A Simple Way to Prevent Neural Networks from Overfitting, Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov, 2014, Journal of Machine Learning Research, Vol. 15 - The seminal paper introducing dropout as a regularization technique for neural networks, explaining its mechanism and benefits.
Decoupled Weight Decay Regularization, Ilya Loshchilov and Frank Hutter, 2019, International Conference on Learning Representations (ICLR), DOI: 10.48550/arXiv.1711.05101 - This paper proposes AdamW, which decouples weight decay from the adaptive learning rate mechanism in optimizers like Adam, leading to improved regularization.
Rethinking the Inception Architecture for Computer Vision, Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, and Zbigniew Wojna, 2016, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), DOI: 10.48550/arXiv.1512.00567 - Introduces label smoothing as a regularization component within the Inception-v2 and Inception-v3 architectures, demonstrating its benefits for model generalization.