Gradient-based learning applied to document recognition, Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner, 1998. Proceedings of the IEEE, Vol. 86 (IEEE). DOI: 10.1109/5.726791 - This paper introduced the LeNet-5 architecture, establishing convolutional layers, pooling, and parameter sharing as core components of CNNs.
Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - A comprehensive textbook providing detailed explanations of convolutional networks, their fundamental layers, and activation functions.
Rectified Linear Units Improve Restricted Boltzmann Machines, Vinod Nair and Geoffrey Hinton, 2010. Proceedings of the 27th International Conference on Machine Learning (ICML) - This paper showed that Rectified Linear Units (ReLU) improve the training of restricted Boltzmann machines and helped establish ReLU as a standard activation function in deep neural networks.