Polyhedral Optimizations for GPGPUs, Uday Bondhugula, Albert Hartono, Jagannath Kannan, R. M. Ramanujam, Jay Hoeflinger, and Paul H. J. Kelly, 2012. ACM Transactions on Architecture and Code Optimization (TACO), Vol. 8 (ACM). DOI: 10.1145/2132896.2132902 - Explores the application of the polyhedral model to optimize loop nests, particularly for GPGPU architectures, demonstrating transformations such as tiling and fusion for improved performance in compute-intensive kernels. A minimal Python sketch of the tiling transformation appears below.
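To make the tiling transformation concrete, here is a minimal Python sketch of a tiled matrix-multiply loop nest. The tile size and function names are illustrative; the paper's compiler derives such transformations automatically in the polyhedral framework and emits CUDA, not Python.

```python
import numpy as np

def matmul_naive(A, B, C):
    """Reference loop nest: poor locality once the matrices exceed cache."""
    n = A.shape[0]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i, j] += A[i, k] * B[k, j]

def matmul_tiled(A, B, C, T=32):
    """Tiled loop nest: each T x T block of A, B, and C is reused while
    it is still resident in cache (or, on a GPU, in shared memory)."""
    n = A.shape[0]
    for ii in range(0, n, T):              # tile loops step over blocks
        for jj in range(0, n, T):
            for kk in range(0, n, T):
                for i in range(ii, min(ii + T, n)):    # intra-tile loops
                    for j in range(jj, min(jj + T, n)):
                        for k in range(kk, min(kk + T, n)):
                            C[i, j] += A[i, k] * B[k, j]

# Both variants compute the same product; only the iteration order differs.
n = 64
A, B = np.random.rand(n, n), np.random.rand(n, n)
C1, C2 = np.zeros((n, n)), np.zeros((n, n))
matmul_naive(A, B, C1)
matmul_tiled(A, B, C2)
assert np.allclose(C1, C2)
```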
Compilers for Deep Learning, Christophe Dubach, Oleksiy Telyatnikov, and Hugh Leather, 2020. Synthesis Lectures on Computer Architecture (Morgan & Claypool Publishers). DOI: 10.2200/S01021ED1V01Y202006CAV016 - Provides an overview of the compiler techniques and challenges specific to deep learning workloads, covering tensor operation lowering, graph-level optimizations, and hardware-specific code generation; see the fusion sketch below for one such graph optimization.
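As a toy illustration of one graph optimization such surveys cover, operator fusion, the sketch below contrasts an unfused matmul-bias-ReLU sequence with a version that avoids materializing intermediate tensors. The function names are hypothetical; a real DL compiler performs this rewrite on the operator graph and generates a single fused kernel rather than in-place NumPy calls.

```python
import numpy as np

def forward_unfused(x, w, b):
    """Each operator writes a full intermediate tensor to memory."""
    t1 = x @ w                   # matmul
    t2 = t1 + b                  # bias add: allocates a temporary
    return np.maximum(t2, 0.0)   # ReLU: reads the temporary back

def forward_fused(x, w, b):
    """Fused form: bias add and ReLU are applied in place on the
    matmul output, eliminating intermediate allocations and traffic."""
    out = x @ w
    np.add(out, b, out=out)        # bias, in place
    np.maximum(out, 0.0, out=out)  # ReLU, in place
    return out

x, w, b = np.random.rand(8, 16), np.random.rand(16, 4), np.random.rand(4)
assert np.allclose(forward_unfused(x, w, b), forward_fused(x, w, b))
```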
BLIS: A Framework for Rapidly Instantiating BLAS Functionality, Field G. Van Zee and Robert A. van de Geijn, 2015. ACM Transactions on Mathematical Software (TOMS), Vol. 41 (ACM). DOI: 10.1145/2765131 - Presents BLIS, a framework for systematically developing high-performance implementations of BLAS operations, particularly matrix multiplication, by structuring optimizations around the memory hierarchy and parallelization; the sketch below mimics that layered structure.
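BLIS organizes GEMM as a fixed set of loops around a small microkernel operating on packed panels. The Python sketch below mimics that layering under illustrative block sizes; in actual BLIS the block sizes are tuned per architecture and the microkernel is hand-written assembly that keeps its output tile in registers, not a NumPy call.

```python
import numpy as np

MC, KC, NC = 64, 64, 128   # cache block sizes (illustrative)
MR, NR = 4, 4              # microkernel register tile (illustrative)

def microkernel(a_panel, b_panel, c_block):
    """Innermost MR x NR rank-KC update; hand-optimized in real BLIS."""
    c_block += a_panel @ b_panel

def gemm_blis_style(A, B, C):
    """C += A @ B, structured as BLIS-style loops around the microkernel."""
    m, k = A.shape
    _, n = B.shape
    for jc in range(0, n, NC):                     # loop over column panels of B/C
        for pc in range(0, k, KC):                 # loop over the k dimension
            Bp = B[pc:pc+KC, jc:jc+NC].copy()      # pack a panel of B contiguously
            for ic in range(0, m, MC):             # loop over row blocks of A/C
                Ap = A[ic:ic+MC, pc:pc+KC].copy()  # pack a block of A
                for jr in range(0, Bp.shape[1], NR):
                    for ir in range(0, Ap.shape[0], MR):
                        microkernel(Ap[ir:ir+MR, :],
                                    Bp[:, jr:jr+NR],
                                    C[ic+ir:ic+ir+MR, jc+jr:jc+jr+NR])

A, B = np.random.rand(100, 70), np.random.rand(70, 90)
C = np.zeros((100, 90))
gemm_blis_style(A, B, C)
assert np.allclose(C, A @ B)
```

The point of the packing steps is that the microkernel then streams through contiguous memory, which is what makes the innermost update amenable to vectorization.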
TVM: An Automated End-to-End Optimizing Compiler for Deep Learning, Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Haichen Shen, Meghan Cowan, Leyuan Wang, Yuwei Hu, Luis Ceze, Carlos Guestrin, and Arvind Krishnamurthy, 2018. 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18) (USENIX Association). DOI: 10.5555/3304128.3304170 - Introduces TVM, a compiler stack for deep learning that automatically optimizes tensor computations, addressing challenges such as loop nest transformation, memory management, and data layout for diverse hardware backends. A minimal schedule example follows.
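A minimal example of TVM's separation of algorithm from schedule, using its classic tensor-expression (te) API. This assumes a TVM version in which te.create_schedule is still available (newer releases migrate toward TensorIR); the matrix size and tile factors are arbitrary.

```python
import tvm
from tvm import te

# Algorithm: declare C = A @ B as a tensor expression.
n = 1024
A = te.placeholder((n, n), name="A")
B = te.placeholder((n, n), name="B")
k = te.reduce_axis((0, n), name="k")
C = te.compute((n, n),
               lambda i, j: te.sum(A[i, k] * B[k, j], axis=k),
               name="C")

# Schedule: apply a loop transformation (tiling) without touching
# the algorithm above; this is the separation the paper builds on.
s = te.create_schedule(C.op)
io, jo, ii, ji = s[C].tile(C.op.axis[0], C.op.axis[1],
                           x_factor=32, y_factor=32)

# Inspect the transformed loop nest, then compile for a CPU target.
print(tvm.lower(s, [A, B, C], simple_mode=True))
f = tvm.build(s, [A, B, C], target="llvm")
```

Swapping the schedule (different tile factors, vectorization, a GPU target) leaves the compute definition untouched, which is what lets TVM search over optimization choices automatically.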