Deep Learning, Ian Goodfellow, Yoshua Bengio, Aaron Courville, 2016 (MIT Press) - A comprehensive textbook on deep learning, with a dedicated chapter on sequence modeling that explains the motivation for sequence models and the principles of recurrent neural networks.
Finding structure in time, Jeffrey L. Elman, 1990, Cognitive Science, Vol. 14 (Wiley), DOI: 10.1207/s15516709cog1402_1 - A foundational paper introducing the simple recurrent network (often called the Elman network), demonstrating its ability to learn and represent temporal dependencies in sequential data through an internal state mechanism.
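To make the internal-state mechanism concrete, here is a minimal sketch of an Elman-style recurrence in NumPy. The weight names (W_xh, W_hh, W_hy), layer sizes, and tanh activation are illustrative assumptions, not the paper's own notation:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 4, 8, 3          # illustrative sizes
W_xh = rng.normal(0, 0.1, (n_hidden, n_in))      # input -> hidden
W_hh = rng.normal(0, 0.1, (n_hidden, n_hidden))  # hidden -> hidden recurrence ("context" units)
W_hy = rng.normal(0, 0.1, (n_out, n_hidden))     # hidden -> output
b_h, b_y = np.zeros(n_hidden), np.zeros(n_out)

def elman_forward(xs):
    """Run a sequence of input vectors through the recurrence.

    The hidden state carries information forward in time:
    h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h).
    """
    h = np.zeros(n_hidden)  # context starts at zero
    outputs = []
    for x in xs:
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
        outputs.append(W_hy @ h + b_y)
    return outputs, h

# Example: a sequence of 5 random input vectors.
seq = [rng.normal(size=n_in) for _ in range(5)]
ys, final_h = elman_forward(seq)
print(len(ys), final_h.shape)  # 5 (8,)
```

Because the same weights are applied at every time step, the final hidden state is a function of the entire input history, which is what lets the network represent temporal structure.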
CS224n: Natural Language Processing with Deep Learning, Diyi Yang, Tatsunori Hashimoto, 2025 (Stanford University) - Online course materials from a leading university, offering lectures and readings that clearly explain the limitations of feedforward networks for natural language processing and introduce the architecture and motivation behind sequence models like RNNs.
Learning long-term dependencies with gradient descent is difficult, Yoshua Bengio, Patrice Simard, Paolo Frasconi, 1994, IEEE Transactions on Neural Networks, Vol. 5, No. 2 (IEEE), DOI: 10.1109/72.279181 - This paper identifies the vanishing and exploding gradient problems in recurrent neural networks, showing why gradient descent struggles to learn long-term dependencies and influencing the development of more robust sequence models.
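The gradient pathology this paper identifies can be demonstrated in a few lines: backpropagation through time repeatedly multiplies the gradient by the recurrent Jacobian, so its norm shrinks or grows geometrically with sequence length depending on the spectral radius of the recurrent weights. The sketch below is an assumption-laden illustration (scaled orthogonal weight matrices with radii 0.5 and 2.0, and the tanh derivative ignored, which would only make vanishing worse):

```python
import numpy as np

rng = np.random.default_rng(0)
n, steps = 8, 50
for scale, label in [(0.5, "vanishing"), (2.0, "exploding")]:
    # Orthogonal matrix times a scalar: every singular value equals `scale`.
    W = scale * np.linalg.qr(rng.normal(size=(n, n)))[0]
    grad = np.ones(n)
    for _ in range(steps):
        grad = W.T @ grad  # one backward step through the linear recurrence
    print(f"{label}: |grad| after {steps} steps = {np.linalg.norm(grad):.3e}")
```

With a radius of 0.5 the gradient norm decays by roughly 0.5^50 (to about 1e-15), and with 2.0 it grows by roughly 2^50 (to about 1e15), which is the geometric behavior the paper analyzes.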