ZeRO: Memory Optimizations Toward Training Trillion Parameter Models, Samyam Rajbhandari, Jeff Rasley, Olatunji Ruwase, Yuxiong He, 2020. SC '20: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (Association for Computing Machinery (ACM)). DOI: 10.1145/3410464.3410729 - Details memory optimization techniques for large models that significantly affect communication patterns and overhead.
The Future of High-Performance Deep Learning, Tal Ben-Nun, Torsten Hoefler, 2019. Journal of Machine Learning Research, Vol. 20 (Journal of Machine Learning Research) - A comprehensive review of distributed deep learning systems, covering communication primitives, architectures, and performance modeling.