GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism, Yanping Huang, Youlong Cheng, Ankur Bapna, Orhan Firat, Mia Xu Chen, Dehao Chen, HyoukJoong Lee, Jiquan Ngiam, Quoc V. Le, Yonghui Wu, Zhifeng Chen, 2019. Advances in Neural Information Processing Systems, Vol. 32 (NeurIPS). arXiv:1811.06965 - A foundational paper introducing pipeline parallelism for training very large neural networks: each mini-batch is split into micro-batches that flow through sequential model partitions, with all forward passes scheduled before the backward passes and gradients accumulated for a single synchronous update.
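As a rough illustration of GPipe's schedule, the sketch below (a hypothetical simulation, not code from the paper) shows micro-batches flowing through pipeline stages: every forward pass completes before any backward pass starts, and the parameter update happens only after all micro-batch gradients are accumulated. The stage count, micro-batch count, and unit-time cost per operation are simplifying assumptions.

```python
# Minimal sketch of a GPipe-style schedule (illustrative assumptions:
# every forward/backward op takes one clock tick, no communication cost).
def gpipe_schedule(num_stages: int, num_microbatches: int):
    """Return (tick, stage, micro_batch, phase) events for one mini-batch."""
    events = []
    # Forward wave: micro-batch m reaches stage s at tick s + m.
    for m in range(num_microbatches):
        for s in range(num_stages):
            events.append((s + m, s, m, "F"))
    # Backward wave: starts on the tick after the last forward finishes
    # and sweeps the stages in reverse order.
    start = num_stages + num_microbatches - 1
    for m in range(num_microbatches):
        for s in reversed(range(num_stages)):
            events.append((start + (num_stages - 1 - s) + m, s, m, "B"))
    return sorted(events)

for tick, stage, mb, phase in gpipe_schedule(num_stages=3, num_microbatches=4):
    print(f"t={tick:2d}: stage {stage} runs {phase}{mb}")
```

The idle ticks between the forward and backward waves are the pipeline "bubble"; GPipe shrinks its relative cost by raising the number of micro-batches per mini-batch.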
PipeDream: Generalized Pipeline Parallelism for DNN Training, Deepak Narayanan, Aaron Harlap, Amar Phanishayee, Vivek Seshadri, Nikhil R. Devanur, Gregory R. Ganger, Phillip B. Gibbons, Matei Zaharia, 2019. Proceedings of the 27th ACM Symposium on Operating Systems Principles (SOSP '19), ACM. DOI: 10.1145/3341301.3359646 - This paper presents PipeDream, a system that significantly improves pipeline utilization by employing the 1F1B (one-forward-one-backward) scheduling strategy, which interleaves forward and backward passes so that every stage stays busy in the steady state.
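To make 1F1B concrete, here is a minimal sketch (an illustration, not PipeDream's implementation) that generates the per-stage operation order: each stage runs a few warm-up forwards, then strictly alternates one forward with one backward, then drains the remaining backwards. The warm-up count `num_stages - stage - 1` and all names are assumptions of this sketch.

```python
# Minimal sketch of the 1F1B (one-forward-one-backward) pipeline schedule.
def one_f_one_b(stage: int, num_stages: int, num_microbatches: int):
    """Return the ordered op list ('F'/'B' plus micro-batch id) for one stage."""
    warmup = min(num_stages - stage - 1, num_microbatches)
    ops, next_f, next_b = [], 0, 0
    for _ in range(warmup):              # warm-up: forward passes only
        ops.append(f"F{next_f}")
        next_f += 1
    while next_f < num_microbatches:     # steady state: alternate 1F, 1B
        ops.append(f"F{next_f}")
        next_f += 1
        ops.append(f"B{next_b}")
        next_b += 1
    while next_b < num_microbatches:     # cooldown: drain remaining backwards
        ops.append(f"B{next_b}")
        next_b += 1
    return ops

for s in range(4):
    print(f"stage {s}:", " ".join(one_f_one_b(s, num_stages=4, num_microbatches=8)))
```

Compared with running all forwards before all backwards, this interleaving caps the number of in-flight activations each stage must hold, which is the main memory advantage of 1F1B.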
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism, Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper, Bryan Catanzaro, 2019. arXiv:1909.08053 - This publication details Megatron-LM's intra-layer (tensor) model parallelism for training multi-billion-parameter language models, an approach that later Megatron-LM work combines with pipeline parallelism to scale training further.
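Because the entry above centers on Megatron-LM's intra-layer (tensor) parallelism, a tiny NumPy sketch may help: a linear layer's weight matrix is split column-wise across devices, each shard produces a partial output independently, and the outputs are concatenated (standing in for the gather step). The shapes and the two-way split are illustrative assumptions, not the paper's configuration.

```python
# Minimal sketch of column-parallel (intra-layer) model parallelism.
import numpy as np

def column_parallel_linear(x, w, num_devices=2):
    """Compute y = x @ w with w split column-wise across `num_devices` shards."""
    shards = np.split(w, num_devices, axis=1)   # each device holds one shard
    partials = [x @ shard for shard in shards]  # computed independently per device
    return np.concatenate(partials, axis=1)     # gather the partial outputs

x = np.random.randn(4, 8)    # (batch, hidden) activations, replicated on all devices
w = np.random.randn(8, 16)   # full (hidden_in, hidden_out) weight matrix
assert np.allclose(column_parallel_linear(x, w), x @ w)  # matches the unsplit result
```

Splitting by columns keeps the shard outputs disjoint, so no reduction is needed until a subsequent row-parallel layer combines them.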