LangChain Evaluation Overview, LangChain, 2024 - The official guide to LangChain's evaluation module, covering evaluators, datasets, and integration with LangSmith for building automated pipelines.
MLOps: Continuous delivery and automation pipelines in machine learning, Clemens Mewald and Evgeni Begel, 2021, Google Cloud Architecture Center (Google Cloud) - This resource covers the broader context of automating ML model lifecycles, including continuous evaluation, which is fundamental to automated evaluation pipelines.
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena, Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric P. Xing, Hao Zhang, Joseph E. Gonzalez, Ion Stoica, 2023, NeurIPS 2023 Datasets and Benchmarks Track, DOI: 10.48550/arXiv.2306.05685 - This paper introduces the methodology of using LLMs as evaluators for other LLMs, a type of evaluator discussed in the section.
Data Validation for Machine Learning: A Practical Guide, Robert Monarch, 2021 (O'Reilly Media) - This book addresses data quality and validation for machine learning models, which bear on the 'Evaluation Dataset' component and the 'Dataset Maintenance' challenge.