Distributed JAX, The JAX authors, 2024 (The JAX authors) - This official documentation details JAX's approach to distributed computation, covering global device management, pmap across hosts, and collective operations.
Cloud TPU system architecture, Google Cloud Documentation, 2024 (Google Cloud) - Provides a technical description of Google Cloud TPUs, including their high-speed interconnects and how those interconnects support efficient distributed training.
Using MPI: Portable Parallel Programming with the Message-Passing Interface, William Gropp, Ewing Lusk, and Anthony Skjellum, 1994 (The MIT Press) - A widely recognized book explaining the Message Passing Interface, whose collective communication concepts are fundamental to efficient data exchange in multi-host JAX programs.