Interaction Between Regularization and Optimization
Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - A comprehensive textbook covering fundamental concepts of deep learning, including regularization techniques (L1, L2, Dropout, Batch Normalization) and optimization algorithms, their theoretical basis, and practical considerations.
Decoupled Weight Decay Regularization, Ilya Loshchilov and Frank Hutter, 2019, International Conference on Learning Representations (ICLR), DOI: 10.48550/arXiv.1711.05101 - Introduces "AdamW", a decoupled weight decay formulation that separates the weight-decay step from Adam's adaptive learning-rate mechanism, showing improved performance and generalization for adaptive optimizers such as Adam (see the update-rule sketch after this list).
Dropout: A Simple Way to Prevent Neural Networks from Overfitting, Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov, 2014, Journal of Machine Learning Research (JMLR), Vol. 15 - The seminal paper introducing Dropout, detailing its mechanism as a regularization technique that prevents overfitting by randomly dropping units during training, and discussing its benefits (a dropout sketch also follows this list).
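The coupling issue that the AdamW entry describes can be summarized in a few lines. Below is a minimal NumPy sketch, not the paper's reference implementation; the function name adam_step and its default hyperparameters are illustrative. With l2 > 0 and weight_decay = 0 it behaves like Adam with L2 regularization folded into the gradient; with l2 = 0 and weight_decay > 0 it applies the decoupled decay that AdamW proposes.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-8, l2=0.0, weight_decay=0.0):
    """One Adam/AdamW-style update on a parameter array w (illustrative sketch)."""
    # Classic L2 regularization: the penalty gradient is folded into the loss
    # gradient, so it is later rescaled by Adam's per-parameter denominator.
    grad = grad + l2 * w

    m = beta1 * m + (1 - beta1) * grad        # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)              # bias-corrected moments
    v_hat = v / (1 - beta2 ** t)

    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)

    # Decoupled weight decay (the AdamW idea): shrink the weights directly,
    # outside the adaptive rescaling above.
    w = w - lr * weight_decay * w
    return w, m, v
```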
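The dropout mechanism described in the last entry also fits in a few lines. This is a minimal sketch of the commonly used "inverted" variant; the function name and defaults are illustrative, and the 2014 paper's own formulation rescales at test time rather than during training.

```python
import numpy as np

def dropout(activations, p=0.5, training=True, rng=None):
    """Zero each unit with probability p during training (inverted dropout)."""
    if not training or p == 0.0:
        return activations                       # use the full network at test time
    rng = rng or np.random.default_rng()
    keep = rng.random(activations.shape) >= p    # Bernoulli keep-mask per unit
    # Rescale the surviving units by 1/(1-p) so expected activations match
    # test time (the original paper instead scales weights by p at test time).
    return activations * keep / (1.0 - p)
```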