Long Short-Term Memory, Sepp Hochreiter and Jürgen Schmidhuber, 1997. Neural Computation, Vol. 9 (MIT Press). DOI: 10.1162/neco.1997.9.8.1735 - Introduces the Long Short-Term Memory (LSTM) architecture, fundamental for handling long-range dependencies in sequential data like text.
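For orientation, below is a minimal NumPy sketch of one LSTM cell step in its now-standard form (the widely used variant with a forget gate, added in follow-up work rather than in the 1997 paper itself); the weight shapes and variable names are illustrative assumptions, not the paper's notation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM cell step (modern variant with a forget gate).

    x      : input vector, shape (d_in,)
    h_prev : previous hidden state, shape (d_h,)
    c_prev : previous cell state, shape (d_h,)
    W, U   : input and recurrent weights, shapes (4*d_h, d_in) and (4*d_h, d_h)
    b      : bias, shape (4*d_h,)
    """
    z = W @ x + U @ h_prev + b
    d_h = h_prev.shape[0]
    i = sigmoid(z[0 * d_h:1 * d_h])   # input gate
    f = sigmoid(z[1 * d_h:2 * d_h])   # forget gate
    o = sigmoid(z[2 * d_h:3 * d_h])   # output gate
    g = np.tanh(z[3 * d_h:4 * d_h])   # candidate cell update
    c = f * c_prev + i * g            # additive cell-state update
    h = o * np.tanh(c)                # hidden state passed onward
    return h, c

# Example: one step with a 3-dim input and 5-dim hidden state.
rng = np.random.default_rng(0)
d_in, d_h = 3, 5
h, c = lstm_step(rng.standard_normal(d_in),
                 np.zeros(d_h), np.zeros(d_h),
                 rng.standard_normal((4 * d_h, d_in)),
                 rng.standard_normal((4 * d_h, d_h)),
                 np.zeros(4 * d_h))
```

The additive update of the cell state c is what lets gradients flow across many time steps, which is why the architecture handles long-range dependencies well.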
Sequence to Sequence Learning with Neural Networks, Ilya Sutskever, Oriol Vinyals, and Quoc V. Le, 2014. Advances in Neural Information Processing Systems 27 (Curran Associates). DOI: 10.48550/arXiv.1409.3215 - A seminal paper demonstrating the effectiveness of LSTMs for sequence-to-sequence learning, a core approach for many advanced sequence generation tasks.
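A compact PyTorch sketch of the encoder-decoder pattern the paper popularized: an encoder LSTM reads the source sequence and its final state conditions a decoder LSTM that produces the target. The sizes and module names here are illustrative assumptions; the paper itself used deep multi-layer LSTMs and reversed source sentences.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    # Illustrative sizes; the paper used 4-layer LSTMs with 1000 units.
    def __init__(self, src_vocab, tgt_vocab, emb=256, hidden=512):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.LSTM(emb, hidden, batch_first=True)
        self.decoder = nn.LSTM(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # Encode the source; keep only the final (h, c) as the summary vector.
        _, state = self.encoder(self.src_emb(src_ids))
        # Decode the target conditioned on that state (teacher forcing).
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), state)
        return self.out(dec_out)  # logits over the target vocabulary
```

Given batches of token ids, the returned logits can be trained with cross-entropy against the shifted target sequence, which is the standard training setup for this architecture.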
The Curious Case of Neural Text Degeneration, Ari Holtzman, Jan Buys, Li Du, Maxwell Forbes, and Yejin Choi, 2020. International Conference on Learning Representations (ICLR). DOI: 10.48550/arXiv.1904.09751 - Discusses the limitations of maximization-based decoding such as greedy and beam search, and introduces Nucleus Sampling (Top-p sampling) as an effective method for generating diverse and coherent text.
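A short NumPy sketch of the Top-p (nucleus) sampling rule the paper proposes: at each step, keep the smallest set of tokens whose cumulative probability reaches the threshold p, renormalize within that set, and sample from it. The default threshold and the function name here are illustrative assumptions.

```python
import numpy as np

def nucleus_sample(probs, p=0.9, rng=None):
    """Sample one token id from the smallest set of tokens whose
    cumulative probability mass reaches p (Top-p sampling)."""
    if rng is None:
        rng = np.random.default_rng()
    order = np.argsort(probs)[::-1]                    # token ids, most probable first
    sorted_probs = probs[order]
    cumulative = np.cumsum(sorted_probs)
    cutoff = int(np.searchsorted(cumulative, p)) + 1   # size of the nucleus
    nucleus_ids = order[:cutoff]
    nucleus_probs = sorted_probs[:cutoff] / sorted_probs[:cutoff].sum()
    return int(rng.choice(nucleus_ids, p=nucleus_probs))

# Example: a toy next-token distribution over a 5-word vocabulary.
probs = np.array([0.45, 0.30, 0.15, 0.07, 0.03])
print(nucleus_sample(probs, p=0.9))  # samples only from the top three tokens here
```

Because the nucleus adapts to the shape of the distribution, the rule truncates the unreliable low-probability tail while still allowing diverse choices when the model is uncertain.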
Speech and Language Processing, Daniel Jurafsky and James H. Martin, 2025 (Stanford University) - A comprehensive textbook on natural language processing, including discussions on recurrent neural networks, language modeling, and text generation techniques.