ZeRO: Memory Optimizations Toward Training Trillion Parameter Models, Samyam Rajbhandari, Jeff Rasley, Olatunji Ruwase, Yuxiong He, 2020. SC '20: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (ACM). DOI: 10.1145/3418579.3441319 - Introduces ZeRO and its three stages, explaining how partitioning optimizer states, gradients, and parameters across data-parallel workers saves memory.
DeepSpeed ZeRO-powered Data Parallelism, DeepSpeed Team, 2024 - Official resource for configuring and using ZeRO optimizations within the DeepSpeed framework.
DeepSpeed: Extreme-Scale Model Training for Everyone, Jeff Rasley, Samyam Rajbhandari, Olatunji Ruwase, Yuxiong He, 2020. KDD '20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (ACM). DOI: 10.1145/3394486.3403154 - Presents the DeepSpeed framework, with ZeRO as a core component, and its capabilities for training large-scale models.