asyncio - Asynchronous I/O, Python Software Foundation, 2024 (Python Software Foundation) - Essential reference for understanding asynchronous programming in Python, especially the async/await patterns relevant to optimizing RAG pipeline I/O operations.
vLLM: Efficient LLM Serving with PagedAttention, Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph E. Gonzalez, Hao Zhang, Ion Stoica, 2023 (SOSP 2023), DOI: 10.48550/arXiv.2309.06180 - Presents a high-throughput and low-latency serving system for large language models, detailing architectural choices and optimizations that include managing GPU memory and computation efficiently, which directly benefits from intelligent batching.
Distributed Systems: Concepts and Design, George Coulouris, Jean Dollimore, Tim Kindberg, Gordon Blair, 2011 (Addison-Wesley) - A foundational textbook covering the principles of distributed computing, including concurrency, communication, and resource management, which are fundamental for designing scalable asynchronous and batched systems.
FastAPI Documentation, Sebastián Ramírez, 2024 - Official documentation for a popular Python web framework that inherently supports asynchronous request handling via async/await, valuable for building responsive RAG API endpoints.