Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - Offers comprehensive coverage of regularization techniques in deep learning, including the mathematical formulation of L1 regularization and its impact on model weights and sparsity.
The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Trevor Hastie, Robert Tibshirani, and Jerome Friedman, 2009 (Springer) - Provides an in-depth statistical explanation of L1 regularization (Lasso), detailing its properties, the mechanism behind sparsity, and its comparison to L2 regularization.
L2 Regularization, Dropout, L1 Regularization, Stanford University CS231n course staff, 2023 (Stanford University) - Gives a clear, intuitive explanation of L1 regularization, covering its mathematical formulation, its effect on the gradient, and how the non-differentiability at zero is handled in practical deep learning contexts.