Speech and Language Processing, Daniel Jurafsky and James H. Martin, 2025 - A comprehensive, continuously updated textbook providing foundational knowledge of natural language processing, including detailed explanations of language models.
Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin, 2017, Advances in Neural Information Processing Systems 30 (Curran Associates, Inc.) - Introduces the Transformer architecture, a core innovation enabling the scale and capabilities of modern Large Language Models through its attention mechanism.
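For readers who want a concrete sense of the attention mechanism this entry refers to, the following is a minimal NumPy sketch of single-head scaled dot-product attention, the core operation defined in the paper; the shapes and random example data are purely illustrative and are not taken from the paper itself.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention (Vaswani et al., 2017):
    Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)            # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                         # weighted sum of value vectors

# Toy example: 3 query positions attending over 4 key/value positions, dimension 8
rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 8))
K = rng.standard_normal((4, 8))
V = rng.standard_normal((4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)  # (3, 8)
```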
Language Models are Few-Shot Learners, Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei, 2020, arXiv, DOI: 10.48550/arXiv.2005.14165 - Presents GPT-3, a landmark Large Language Model, demonstrating how extreme scale in parameters and training data leads to impressive few-shot learning and diverse language generation capabilities.
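To make "few-shot learning" concrete, the sketch below assembles an in-context prompt in the style the paper describes: a handful of worked examples followed by a new query, all supplied as plain text at inference time with no parameter updates. The translation task and the specific examples here are illustrative choices, not reproduced from the paper.

```python
# Hypothetical few-shot prompt: the model conditions on worked examples
# placed in its context window; nothing is fine-tuned.
examples = [
    ("cheese", "fromage"),
    ("house", "maison"),
    ("cat", "chat"),
]
query = "bread"

prompt = "Translate English to French.\n\n"
for en, fr in examples:
    prompt += f"English: {en}\nFrench: {fr}\n\n"
prompt += f"English: {query}\nFrench:"

print(prompt)  # this text would be passed verbatim to the language model
```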