Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - A standard textbook that details activation functions, including their mathematical properties and practical uses in neural network design.
Deep Sparse Rectifier Neural Networks, Xavier Glorot, Antoine Bordes, and Yoshua Bengio, 2011, Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics (AISTATS), Vol. 15 (PMLR) - Introduces the rectified linear unit (ReLU) as an activation function for deep networks, showing that it mitigates the vanishing gradient problem and accelerates training.