ZeRO: Memory Optimizations Toward Training Trillion Parameter Models, Samyam Rajbhandari, Jeff Rasley, Olatunji Ruwase, and Yuxiong He, 2020. SC '20: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (IEEE). DOI: 10.1109/SC44002.2020.9251842 - Introduces the ZeRO (Zero Redundancy Optimizer) family of techniques, a foundational DeepSpeed contribution that drastically reduces the memory footprint of large-model training by partitioning optimizer states, gradients, and parameters across data-parallel workers.
DeepSpeed Documentation, Microsoft DeepSpeed Team, 2024 (Microsoft) - The official, comprehensive documentation for DeepSpeed, providing detailed guides on installation, configuration, API usage, and practical examples for all of its features, including ZeRO, memory offloading, and pipeline parallelism.
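To make the ZeRO entry concrete, the sketch below estimates per-GPU memory for model states under each ZeRO stage, following the accounting used in the paper for mixed-precision Adam training: 2 bytes/param for fp16 weights, 2 bytes/param for fp16 gradients, and 12 bytes/param for fp32 optimizer states (master weights, momentum, variance). The function name and structure are illustrative, not part of the DeepSpeed API.

```python
def zero_memory_per_gpu(num_params: float, num_gpus: int, stage: int) -> float:
    """Approximate per-GPU memory (bytes) for model states under ZeRO.

    Assumes mixed-precision Adam, as in the ZeRO paper:
      fp16 parameters: 2 bytes/param
      fp16 gradients:  2 bytes/param
      fp32 optimizer states (weights + momentum + variance): 12 bytes/param
    """
    params_mem = 2.0 * num_params
    grads_mem = 2.0 * num_params
    optim_mem = 12.0 * num_params
    if stage >= 1:          # ZeRO-1: partition optimizer states
        optim_mem /= num_gpus
    if stage >= 2:          # ZeRO-2: also partition gradients
        grads_mem /= num_gpus
    if stage >= 3:          # ZeRO-3: also partition parameters
        params_mem /= num_gpus
    return params_mem + grads_mem + optim_mem


# Example: a 7.5B-parameter model on 64 GPUs.
# Baseline (stage 0): 16 bytes/param -> 120 GB per GPU.
# ZeRO-3: all three states partitioned -> 120 GB / 64 = 1.875 GB per GPU.
baseline = zero_memory_per_gpu(7.5e9, 64, stage=0)   # 1.2e11 bytes
zero3 = zero_memory_per_gpu(7.5e9, 64, stage=3)      # 1.875e9 bytes
```

These numbers match the paper's headline result that ZeRO-3 reduces model-state memory linearly with the data-parallel degree.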