Mixed Precision Training, Paulius Micikevicius, Sharan Narang, Jonah Alben, Gregory Diamos, Erich Elsen, David Garcia, Boris Ginsburg, Michael Houston, Oleksii Kuchaiev, Ganesh Venkatesh, Hao Wu, 2018, International Conference on Learning Representations (ICLR). DOI: 10.48550/arXiv.1710.03740 - Introduced mixed-precision training with FP16 and FP32, outlining techniques such as loss scaling and keeping precision-sensitive operations and accumulators in higher precision.
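For orientation, a minimal PyTorch sketch of the loss-scaling recipe this paper describes. PyTorch and its torch.cuda.amp utilities are an assumption here, not the paper's code; GradScaler implements dynamic loss scaling, and autocast keeps precision-sensitive ops in FP32 while the optimizer updates FP32 master weights:

```python
import torch

# Hypothetical model and data; any FP32 model works the same way.
model = torch.nn.Linear(1024, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()  # dynamic loss scaling

for step in range(100):
    x = torch.randn(32, 1024, device="cuda")
    y = torch.randint(0, 10, (32,), device="cuda")
    optimizer.zero_grad(set_to_none=True)
    # Autocast runs matmuls/convs in FP16 but keeps
    # precision-sensitive ops (e.g. reductions) in FP32.
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = loss_fn(model(x), y)
    # Scale the loss so small FP16 gradients do not flush to zero,
    # then unscale before the FP32 master-weight update.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()  # grow/shrink the scale based on inf/NaN checks
```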
TVM: An Automated End-to-End Optimizing Compiler for Deep Learning, Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Haichen Shen, Meghan Cowan, Leyuan Wang, Yuwei Hu, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy, 2018, 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18), USENIX Association - Describes TVM, an end-to-end optimizing compiler that supports mixed precision and quantization through its intermediate representation and scheduling, relevant for compiler-level mixed-precision strategies.
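As an illustration only, a sketch of lowering a model to mixed precision in TVM. This assumes a recent TVM release with the Relay ToMixedPrecision pass, which postdates the OSDI paper; the ResNet workload from relay.testing is a hypothetical stand-in for a real model:

```python
import tvm
from tvm import relay
from tvm.relay.testing import resnet

# Hypothetical FP32 workload from TVM's bundled test models.
mod, params = resnet.get_workload(num_layers=18, batch_size=1)

# Rewrite eligible ops (e.g. conv/matmul) to FP16 while leaving
# numerically sensitive ops in FP32, per the pass's op lists.
mod = relay.transform.InferType()(mod)
mod = relay.transform.ToMixedPrecision("float16")(mod)

# Compile as usual; scheduling and codegen then operate on the
# mixed-precision intermediate representation.
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="llvm", params=params)
```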