Amazon S3 Storage Classes, Amazon Web Services (AWS), 2024 (Amazon Web Services) - Describes the Amazon S3 storage classes and lifecycle management policies, which matter for cost-effective tiering and archiving of raw and processed text data.
Engineering MLOps: From Model to Production, Emmanuel Raj, Larysa Visengeriyeva, Arpit Shah, 2022 (O'Reilly Media) - Covers best practices for building and managing production machine learning systems, including efficient data ingestion pipelines, data versioning, and cost-aware design, all relevant to the backbone of a RAG system.
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks, Nils Reimers, Iryna Gurevych, 2019 (Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing, EMNLP), DOI: 10.48550/arXiv.1908.10084 - Introduces the Sentence-BERT framework, which is the foundation for the sentence-transformers library, offering insights into efficient embedding generation, model selection, and batch processing for RAG systems.