Computer Architecture: A Quantitative Approach, John L. Hennessy and David A. Patterson, 2017 (Morgan Kaufmann) - Standard textbook on computer architecture principles, including memory hierarchy, cache coherence, and performance metrics, which underpin the concepts of memory and compute bottlenecks.
Scaling Laws for Neural Language Models, Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, Dario Amodei, 2020 (arXiv preprint arXiv:2001.08361, DOI: 10.48550/arXiv.2001.08361) - The seminal paper characterizing the power-law relationships between model size, dataset size, training compute, and performance (test loss) for large language models, providing context for their immense resource demands.