Mixed-Precision Training for Deep Neural Networks, Paulius Micikevicius, Sharan Narang, Jonah Alben, Gregory Diamos, Erich Elsen, David Garcia, Boris Ginsburg, Michael Houston, Oleksii Kuchaiev, Ganesh Venkatesh, Hao Wu, International Conference on Learning Representations (ICLR), 2018. DOI: 10.48550/arXiv.1710.03740 - Explains the benefits of mixed-precision training (FP16/FP32) for reducing memory footprint and improving training speed in deep neural networks.
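As a companion to this reference, the following is a minimal sketch of FP16/FP32 mixed-precision training using PyTorch's automatic mixed precision (AMP) utilities, which implement the loss-scaling and FP32-master-weight ideas the paper describes. The model, batch sizes, and hyperparameters below are placeholder assumptions for illustration, not drawn from the paper itself.

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(512, 10).to(device)                      # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)   # placeholder hyperparameters
loss_fn = nn.CrossEntropyLoss()
# GradScaler applies dynamic loss scaling to keep FP16 gradients from underflowing.
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

for step in range(10):                                     # dummy training loop
    x = torch.randn(32, 512, device=device)                # synthetic batch
    y = torch.randint(0, 10, (32,), device=device)

    optimizer.zero_grad(set_to_none=True)
    # autocast runs eligible ops in FP16 while the model weights stay in FP32.
    with torch.autocast(device_type=device, enabled=(device == "cuda")):
        loss = loss_fn(model(x), y)

    scaler.scale(loss).backward()   # backward pass on the scaled loss
    scaler.step(optimizer)          # unscales gradients, then takes the optimizer step
    scaler.update()                 # adjusts the loss-scale factor for the next step
```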
NVIDIA A100 Tensor Core GPU Architecture, NVIDIA Corporation, 2020 - Provides detailed insights into the A100 GPU architecture, including the role and specifications of HBM2e in supporting high-performance AI workloads.