Deep Sparse Rectifier Neural Networks, Xavier Glorot, Antoine Bordes, and Yoshua Bengio, 2011. Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics (AISTATS), Vol. 15. DOI: 10.5555/3104322.3104364 - This paper demonstrates the advantages of the rectified linear unit (ReLU) activation function for deep neural networks, particularly in mitigating the vanishing gradient problem (a brief ReLU sketch follows these references).
Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - A foundational textbook on neural networks that covers activation functions such as ReLU in detail, including their properties, their role in introducing non-linearity, and how they relate to gradient problems.
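For readers unfamiliar with the rectifier discussed in these references, here is a minimal sketch of ReLU and its gradient (NumPy assumed; the function names are illustrative and not taken from either reference):

```python
import numpy as np

def relu(x):
    # Rectified linear unit: f(x) = max(0, x)
    return np.maximum(0.0, x)

def relu_grad(x):
    # The gradient is 1 for positive inputs and 0 otherwise, so it does not
    # shrink toward zero for large |x| the way saturating activations
    # (e.g. tanh) do, which helps mitigate vanishing gradients.
    return (x > 0).astype(x.dtype)

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(x))       # -> [0.  0.  0.  1.5 3. ]
print(relu_grad(x))  # -> [0. 0. 0. 1. 1.]
```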