Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks, Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela, 2020, Advances in Neural Information Processing Systems (NeurIPS). DOI: 10.48550/arXiv.2005.11401 - Introduces the foundational Retrieval-Augmented Generation (RAG) architecture and details its two core components, the retriever and the generator, which underlie the main cost drivers of a RAG system.
Tips for working with large language models efficiently, OpenAI, 2023, OpenAI Blog (OpenAI) - Provides practical advice and strategies for optimizing LLM token usage, model choice, and API interactions to reduce operational costs in production RAG systems.