CUDA C++ Programming Guide, NVIDIA Corporation, 2023 (NVIDIA Corporation) - Provides detailed explanations of the CUDA programming model, GPU architecture, memory hierarchy, and performance optimization techniques for NVIDIA GPUs.
ROCm Programming Guide, AMD, 2023 (AMD) - Explains the ROCm platform, HIP programming model, AMD GPU hardware concepts, and guidelines for developing high-performance applications on AMD GPUs.
Programming Massively Parallel Processors: A Hands-on Approach, David B. Kirk, Wen-mei W. Hwu, 2016 (Morgan Kaufmann) - A foundational textbook covering GPU architecture, the CUDA programming model, parallel programming patterns, and optimization strategies.
TVM: An Automatic End-to-End Optimizing Compiler for Deep Learning, Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Haichen Shen, Meghan Cowan, Leyuan Wang, Yuwei Hu, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy, 2018, 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18) (USENIX Association), DOI: 10.5555/3342192.3342240 - Introduces TVM, a compiler stack for deep learning that automatically optimizes and generates code for diverse hardware, including GPUs, through a TVM IR and scheduling.
LLVM Language Reference Manual - NVPTX Target, The LLVM Project, 2023 - Detailed reference on the LLVM intermediate representation (IR) specific to the NVIDIA PTX backend, showing how LLVM IR is translated to PTX assembly for NVIDIA GPUs.