Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - Provides a comprehensive theoretical and practical introduction to recurrent neural networks, sequence modeling, and generative models, covering the foundational aspects of iterative sequence generation and common architectures like LSTMs and GRUs.
The Curious Case of Neural Text Degeneration, Ari Holtzman, Jan Buys, Li Du, Maxwell Forbes, Yejin Choi, 2020, International Conference on Learning Representations (ICLR), DOI: 10.48550/arXiv.1904.09751 - Introduces and analyzes various sampling strategies, including top-p (nucleus) sampling, for improving the quality and diversity of text generated by neural language models, addressing common issues like repetition in greedy decoding (a minimal sketch of the nucleus sampling procedure follows this list).
Long Short-Term Memory, Sepp Hochreiter, Jürgen Schmidhuber, 1997, Neural Computation, Vol. 9 (The MIT Press), DOI: 10.1162/neco.1997.9.8.1735 - Presents the seminal architecture of Long Short-Term Memory (LSTM) networks, which are core to the success of recurrent neural networks in modeling and generating sequences with long-range dependencies, as highlighted in the section.
CS224N: Natural Language Processing with Deep Learning (Course Materials), Diyi Yang, Tatsunori Hashimoto, 2025 (Stanford University) - Offers comprehensive lecture notes and assignments that cover recurrent neural networks, including LSTMs and GRUs, and practical aspects of sequence generation such as sampling strategies and character-level versus word-level models in the context of natural language processing.
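As a companion to the Holtzman et al. entry above, the following is a minimal, illustrative sketch of top-p (nucleus) sampling over a single next-token distribution. It is not code from the cited paper; the function name, the example distribution, and the choice of NumPy are assumptions made purely for illustration.

```python
# Illustrative sketch of top-p (nucleus) sampling, assuming `probs` is a
# probability distribution over token ids. Names here are hypothetical,
# not drawn from the cited paper's implementation.
import numpy as np

def nucleus_sample(probs: np.ndarray, p: float = 0.9, rng=None) -> int:
    """Sample a token id from the smallest set of tokens whose
    cumulative probability exceeds p (the 'nucleus')."""
    rng = rng or np.random.default_rng()
    order = np.argsort(probs)[::-1]             # token ids sorted by probability, descending
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, p) + 1  # smallest prefix whose mass reaches p
    nucleus = order[:cutoff]
    nucleus_probs = probs[nucleus] / probs[nucleus].sum()  # renormalize within the nucleus
    return int(rng.choice(nucleus, p=nucleus_probs))

# Toy 5-token distribution: greedy decoding would always pick token 0,
# while nucleus sampling keeps some diversity among high-probability tokens.
probs = np.array([0.5, 0.2, 0.15, 0.1, 0.05])
print(nucleus_sample(probs, p=0.9))
```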