torch.distributed.fsdp
- PyTorch 2.3 documentation, PyTorch Development Team, 2024 - Official documentation for PyTorch's Fully Sharded Data Parallel (FSDP) API, which enables memory-efficient training of large models by sharding model parameters, gradients, and optimizer states across workers.
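For orientation, a minimal sketch of how the FSDP wrapper from this documentation is typically used is shown below. It assumes the script is launched with torchrun on GPUs with the NCCL backend; the toy model, sizes, and hyperparameters are illustrative only, and real deployments would add an auto-wrap policy, mixed precision, and checkpointing as described in the referenced docs.

```python
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# Assumes launch via: torchrun --nproc_per_node=<num_gpus> fsdp_sketch.py
dist.init_process_group(backend="nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

# Toy model; in practice this would be a large network whose parameters,
# gradients, and optimizer states FSDP shards across the process group.
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.ReLU(),
    torch.nn.Linear(4096, 1024),
).cuda()

# Wrapping with FSDP shards the model's state across ranks; each rank holds
# only its shard between forward/backward passes, reducing peak memory.
fsdp_model = FSDP(model)
optimizer = torch.optim.AdamW(fsdp_model.parameters(), lr=1e-4)

# One illustrative training step on random data.
inputs = torch.randn(8, 1024, device="cuda")
loss = fsdp_model(inputs).sum()
loss.backward()
optimizer.step()
optimizer.zero_grad()

dist.destroy_process_group()
```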