DeepSpeed: System Optimizations for Large-Scale Model Training, Jie Ren, Hao Li, Samyam Rajbhandari, Conglong Li, Di He, Zhicheng Cui, Xuanli Chen, Junchao Li, Sholto Scruton, Minjia Zhang, 2021. ACM SIGOPS Operating Systems Review, Vol. 55 (ACM). DOI: 10.1145/3452044.3483742 - Describes DeepSpeed, a framework of system optimizations for distributed training, including several forms of model parallelism that complement tensor parallelism.
ZeRO: Memory Optimizations Toward Training Trillion Parameter Models, Samyam Rajbhandari, Jeff Rasley, Olatunji Ruwase, Yuxiong He, 2020. SC '20: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (IEEE Computer Society). DOI: 10.1109/SC45903.2020.00078 - Focuses on reducing memory by partitioning optimizer states, gradients, and parameters across data-parallel workers; ZeRO is crucial for enabling the training of models large enough to necessitate tensor parallelism.