Scaling Laws for Neural Language Models, Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei, 2020 (arXiv), DOI: 10.48550/arXiv.2001.08361 - This paper examines how model performance improves with increased parameters, dataset size, and compute, providing a framework for understanding model scale.
Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - A comprehensive textbook covering fundamental concepts of deep learning, including neural network parameters and model capacity.
CS224N: Natural Language Processing with Deep Learning, Diyi Yang and Tatsunori Hashimoto, 2025 (Stanford University) - An advanced course that covers modern NLP models, including their architectures and the importance of model size.