NVIDIA Megatron-LM GitHub Repository, NVIDIA, 2024 - Official source code and practical examples for implementing model parallelism with Megatron-LM.
ZeRO: Memory Optimizations Toward Training Trillion Parameter Models, Samyam Rajbhandari, Jeff Rasley, Olatunji Ruwase, Yuxiong He, 2020, SC '20: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (ACM), DOI: 10.1145/3416909.3417006 - Introduces ZeRO, a memory optimization strategy often integrated with Megatron-LM to enable more efficient data-parallel training of extremely large models.