Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - A comprehensive textbook on deep learning that thoroughly covers feedforward networks, their practical limitations, and the architectural solutions provided by convolutional and recurrent networks.
Gradient-Based Learning Applied to Document Recognition, Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner, 1998, Proceedings of the IEEE, Vol. 86 (IEEE), DOI: 10.1109/5.726791 - A seminal paper introducing convolutional neural networks (LeNet-5), detailing their architectural benefits for image recognition by exploiting local correlations and translation invariance, thereby overcoming limitations of fully connected layers.
Long Short-Term Memory, Sepp Hochreiter and Jürgen Schmidhuber, 1997, Neural Computation, Vol. 9 (MIT Press), DOI: 10.1162/neco.1997.9.8.1735 - This foundational paper introduces Long Short-Term Memory (LSTM) networks, an architecture highly effective for enabling recurrent networks to learn and remember long-term dependencies in sequential data, addressing a major weakness of simpler feedforward or recurrent models.
Convolutional Neural Networks for Visual Recognition (CS231n), Stanford University, 2024 (Stanford University) - Official course notes (Spring 2023 version) offering a clear explanation of convolutional neural networks, including an analysis of why feedforward networks are insufficient for image data and how CNNs address these challenges through architectural design.