XLA: Accelerated Linear Algebra, Google, 2024 (Google Developers) - Official documentation for XLA, detailing its architecture, optimizations like operator fusion, and how it accelerates computations for machine learning.
JAX Documentation, Google, 2024 (Google) - The official JAX documentation, explaining JAX's tracing and compilation process and its integration with XLA for performance, including aspects of operator optimization such as fusion (a minimal illustrative sketch follows this list).
TVM: An Automatic End-to-End Optimizing Compiler for Deep Learning, Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Haichen Shen, Meghan Cowan, Leyuan Wang, Yuwei Hu, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy, 2018, Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI '18) (USENIX Association) - This paper describes general deep learning compiler optimizations, including operator fusion, within the TVM compiler framework, and gives a thorough treatment of these techniques.
NVIDIA CUDA C++ Programming Guide, NVIDIA, 2023 (NVIDIA) - The comprehensive guide to NVIDIA CUDA programming, explaining GPU architecture, memory models, and performance considerations that underpin the benefits of optimizations like operator fusion.
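To make the operator fusion discussed in the XLA and JAX references above concrete, here is a minimal, illustrative JAX sketch. It is not taken from any of the cited documents; the function name, shapes, and constants are arbitrary. The point is only that `jax.jit` hands the traced computation to XLA, which may fuse the chained elementwise operations into a single kernel rather than materializing each intermediate array.

```python
# Illustrative sketch of JAX -> XLA compilation and operator fusion.
# Assumptions: function name, shapes, and scale value are made up for this example.
import jax
import jax.numpy as jnp

def scaled_gelu(x, scale):
    # Several elementwise ops in a row; XLA is free to fuse them into one kernel
    # instead of writing each intermediate result back to device memory.
    return scale * jax.nn.gelu(x) + 1.0

fused = jax.jit(scaled_gelu)  # compile the whole function through XLA

x = jnp.ones((1024, 1024), dtype=jnp.float32)
y = fused(x, 2.0).block_until_ready()  # first call triggers compilation

# Inspecting the lowered IR shows the computation handed to XLA as one module.
print(jax.jit(scaled_gelu).lower(x, 2.0).as_text()[:400])
```

Whether and how XLA fuses these operations depends on the backend and compiler version; the XLA documentation cited above is the authoritative description of its fusion passes.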