QLoRA: Efficient Finetuning of Quantized LLMs, Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, Luke Zettlemoyer, arXiv preprint arXiv:2305.14314, 2023. DOI: 10.48550/arXiv.2305.14314 - Introduces QLoRA and the 4-bit NormalFloat (NF4) data type, a significant method for 4-bit quantization in large language models, enabling efficient fine-tuning on consumer hardware.
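The core idea behind NF4 is information-theoretically motivated: quantization levels are placed at quantiles of a standard normal distribution, matching the empirical distribution of neural network weights. The sketch below is a simplified illustration of quantile-based 4-bit block quantization in that spirit; it is not the paper's exact level construction (real NF4 uses an asymmetric scheme that includes an exact zero), and the block size and helper names are illustrative only.

```python
from statistics import NormalDist

# Simplified, quantile-based 4-bit levels in the spirit of NF4
# (Dettmers et al., 2023). We take 16 quantile midpoints of N(0, 1),
# avoiding the infinite tails, and normalize them into [-1, 1].
# The actual NF4 construction differs in detail.
nd = NormalDist()
levels = [nd.inv_cdf((i + 0.5) / 16) for i in range(16)]
max_abs = max(abs(v) for v in levels)
levels = [v / max_abs for v in levels]

def quantize_block(weights):
    """Absmax-scale a weight block into [-1, 1], then map each value
    to the nearest of the 16 levels, keeping a 4-bit index per weight
    plus one float scale per block."""
    scale = max(abs(w) for w in weights) or 1.0
    idx = [min(range(16), key=lambda k: abs(w / scale - levels[k]))
           for w in weights]
    return scale, idx

def dequantize_block(scale, idx):
    """Reconstruct approximate weights from indices and the block scale."""
    return [scale * levels[k] for k in idx]

w = [0.4, -1.2, 0.05, 0.9]
scale, idx = quantize_block(w)
approx = dequantize_block(scale, idx)
```

Storing one 4-bit index per weight plus a per-block scale is what brings memory down to roughly 4 bits per parameter; QLoRA additionally quantizes the scales themselves ("double quantization"), which is omitted here.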
Quantization for Model Optimization, PyTorch Documentation, 2019 - Official PyTorch documentation providing practical guidance and APIs for implementing post-training quantization and quantization-aware training.
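Of the workflows covered in the PyTorch documentation, post-training dynamic quantization is the simplest entry point: weights are converted to int8 ahead of time, while activations are quantized on the fly at inference. A minimal sketch, assuming a small hypothetical model (the layer sizes here are arbitrary):

```python
import torch
import torch.nn as nn

# Hypothetical float model for illustration.
model = nn.Sequential(
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Linear(64, 10),
)

# Post-training dynamic quantization: Linear weights become int8,
# activations are quantized dynamically at inference time (CPU only).
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 128)
out = quantized(x)  # output is a regular float tensor of shape (1, 10)
```

Static quantization and quantization-aware training, also described in the documentation, add calibration or training steps but follow a similar prepare/convert pattern.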