ZeRO-Offload: Democratizing Billion-Scale Model Training. Jie Ren, Samyam Rajbhandari, Reza Yazdani Aminabadi, Olatunji Ruwase, Shuangyan Yang, Minjia Zhang, Dong Li, Yuxiong He. arXiv preprint arXiv:2101.06840, 2021. DOI: 10.48550/arXiv.2101.06840 - Describes a memory offloading technique for deep learning models, in which model states are moved to CPU memory to free up GPU memory, with principles applicable to LLM inference.