Machine Learning Engineering, Andriy Burkov, 2020 (True Positive Inc.) - A practical guide to MLOps that covers monitoring, evaluation, and deployment strategies for machine learning systems; its principles also apply to LLM-based agents.
Distributed Systems Observability: A Guide to Building Robust Systems, Cindy Sridharan, 2018 (O'Reilly Media) - Provides foundational principles and practices for observability in complex software systems, including structured logging, metrics, tracing, and alerting, which carry over to monitoring agentic workflows.
MLOps Challenges and Solutions for Large Language Models, Alex Vasile, 2023 (O'Reilly Media) - Discusses the MLOps challenges specific to large language models, covering monitoring, evaluation, and iteration, with practical insights relevant to agentic systems.