Fully Sharded Data Parallel (FSDP), PyTorch Contributors, 2024 (PyTorch) - Official documentation for PyTorch's native Fully Sharded Data Parallel (FSDP) implementation, detailing its use for memory-efficient large model training.
Hugging Face Accelerate Documentation, Hugging Face Inc., 2024 (Hugging Face) - Official documentation for the Hugging Face Accelerate library, providing a high-level API to run PyTorch training scripts across various distributed setups.