Designing Machine Learning Systems: An Iterative Process for Production-Ready Applications, Chip Huyen, 2022 (O'Reilly Media) - A guide to building, deploying, and maintaining machine learning systems, covering infrastructure, MLOps practices, and system design patterns that carry over to complex AI applications such as RAG.
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks, Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela, 2020, Advances in Neural Information Processing Systems, Vol. 33 (Curran Associates, Inc.), DOI: 10.48550/arXiv.2005.11401 - This foundational paper on Retrieval-Augmented Generation (RAG) discusses the components and flow of a RAG system, providing a theoretical basis for infrastructure decisions.
Google Cloud Vertex AI Documentation, Google Cloud, 2024 (Google) - Official documentation for Google Cloud's managed machine learning platform, offering insights into deploying and managing ML models, including those for RAG components like embedding services and LLM inference.