Site Reliability Engineering: How Google Runs Production Systems, Betsy Beyer, Chris Jones, Jennifer Petoff, and Niall Richard Murphy (eds.), 2016 (O'Reilly Media) - This foundational text details Google's approach to keeping large-scale production systems reliable and scalable. It covers capacity planning, service level objectives (SLOs), and performance measurement, all of which bear directly on this section's themes.