Deep Learning, Ian Goodfellow, Yoshua Bengio, Aaron Courville, 2016 (MIT Press) - Foundational textbook on deep learning with a detailed explanation of L2 regularization.
A simple weight decay can improve generalization, Anders Krogh, John A. Hertz, 1992, Advances in Neural Information Processing Systems, Vol. 4 (Morgan Kaufmann) - Seminal paper that analyzes weight decay and shows how it improves generalization.