MLIR: A Compiler Infrastructure for the End of Moore's Law, Chris Lattner, Mehdi Amini, Uday Bondhugula, Albert Cohen, Andy Davis, Jacques Pienaar, River Riddle, Tatiana Shpeisman, Nicolas Vasilache, Oleksandr Zinenko, 2020. arXiv preprint arXiv:2002.11054. - Describes MLIR's design principles, explaining the advantages of multi-level IR and an extensible type system for domain-specific optimizations, including quantization.
Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference, Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew Howard, Hartwig Adam, Dmitry Kalenichenko, 2018. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE. DOI: 10.1109/CVPR.2018.00287. - The foundational paper introducing the affine quantization scheme with scale and zero point for neural networks, including the mathematical underpinnings of the quantize, dequantize, and requantize operations; the core relations are sketched after this entry.
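As a reading aid, a minimal sketch of the affine scheme from this paper, in its notation of real value r, integer value q, scale S, and zero point Z. The requantize line is our composition of the first two for a single fixed-point multiplier M; the clamp bounds q_min and q_max are determined by the storage integer type.

```latex
% Affine (asymmetric) quantization: real r, integer q, scale S, zero point Z.
r \approx S\,(q - Z)                                % dequantize
q = \mathrm{clamp}\!\left(\mathrm{round}\!\left(\tfrac{r}{S}\right) + Z,\; q_{\min},\; q_{\max}\right)   % quantize
% Requantize from (S_1, Z_1) to (S_2, Z_2) with multiplier M = S_1 / S_2:
q_2 = \mathrm{clamp}\!\left(\mathrm{round}\!\left(M\,(q_1 - Z_1)\right) + Z_2,\; q_{\min},\; q_{\max}\right)
```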
MLIR Quantization Dialect Guide, LLVM Project Developers, 2024. LLVM Foundation. - Official documentation for the MLIR quantization ('quant') dialect, with specific examples of dedicated quantized types and operations for representing quantization explicitly in the IR; a small illustrative snippet follows this entry.
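For orientation, a minimal sketch of what such IR can look like, assuming the upstream quant dialect's `!quant.uniform` type and its `quant.qcast`/`quant.dcast` cast operations; the scale and zero-point values here are illustrative, not taken from the guide.

```mlir
// Quantize f32 to a uniform quantized type (storage type i8, expressed
// type f32, scale 0.0039, zero point -128), then dequantize back to f32.
func.func @roundtrip(%arg0: tensor<4xf32>) -> tensor<4xf32> {
  %q = quant.qcast %arg0
      : tensor<4xf32> to tensor<4x!quant.uniform<i8:f32, 0.0039:-128>>
  %r = quant.dcast %q
      : tensor<4x!quant.uniform<i8:f32, 0.0039:-128>> to tensor<4xf32>
  return %r : tensor<4xf32>
}
```

Keeping the quantization parameters in the type, rather than as attributes on each op, is what lets ordinary MLIR passes propagate and verify them without special-casing quantized tensors.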
TVM: An Automated End-to-End Optimizing Compiler for Deep Learning, Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Haichen Shen, Meghan Cowan, Leyuan Wang, Yuwei Hu, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy, 2018. Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI '18), USENIX Association. DOI: 10.5555/3342335.3342371. - Introduces TVM, a deep learning compiler framework whose IR is designed to facilitate a range of optimizations, including low-precision and quantized execution, providing context for compiler design.