API Pricing, OpenAI, 2024 (OpenAI) - Provides official details on Large Language Model (LLM) tokenization, pricing models, and how token usage impacts operational costs.
Retrieval-Augmented Generation for Large Language Models: A Survey, Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yi Dai, Jiawei Sun, Meng Wang, Haofen Wang, 2023 (arXiv preprint arXiv:2312.10997), DOI: 10.48550/arXiv.2312.10997 - Provides a comprehensive overview of RAG systems, discussing various context optimization techniques such as re-ranking and summarization to improve efficiency and reduce token usage.
Prompt Engineering Guide, Google Cloud, 2025 (Google) - Offers practical strategies for crafting effective prompts, including methods for conciseness and clear instructions that help optimize LLM input and output token counts.
tiktoken (Official GitHub Repository), OpenAI, 2024 - Official GitHub repository and documentation for tiktoken, OpenAI's fast byte-pair encoding (BPE) tokenizer, used for accurate token counting and cost estimation for OpenAI models.