QLoRA: Efficient Finetuning of Quantized LLMs, Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, Luke Zettlemoyer, 2023, arXiv preprint arXiv:2305.14314, DOI: 10.48550/arXiv.2305.14314 - Introduces QLoRA, a key technique that fine-tunes LoRA adapters on top of a frozen 4-bit quantized base model, directly relevant to combining PEFT with quantization for efficient LLM fine-tuning (a minimal usage sketch follows after these entries).
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf, 2019, 5th Workshop on Energy Efficient Machine Learning and Cognitive Computing - NeurIPS 2019, DOI: 10.48550/arXiv.1910.01108 - Presents DistilBERT, a prominent example of using knowledge distillation to create a smaller, more efficient version of a large language model while retaining most of its performance, illustrating distillation's role in model compression (a distillation-loss sketch also follows below).
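A minimal sketch of how the QLoRA combination is typically used in practice, assuming the Hugging Face transformers, peft, and bitsandbytes libraries; the model name and all hyperparameters below are illustrative placeholders, not values taken from the paper.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder base model, not from the paper

# 4-bit NF4 quantization with double quantization, the storage format QLoRA introduces.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# LoRA adapters: only these low-rank matrices are trained; the 4-bit base
# weights stay frozen, which is what makes the combination memory-efficient.
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of total parameters
```

After this setup, the wrapped model can be passed to an ordinary training loop or trainer; gradients flow through the quantized base weights into the adapters only.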
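A minimal sketch of a knowledge-distillation training loss in the spirit of DistilBERT: the student is trained to match the teacher's softened output distribution (KL divergence at a temperature) alongside the usual hard-label loss. The temperature and mixing weight here are illustrative assumptions, not the paper's exact objective, which also includes a cosine embedding term.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft-target loss: KL divergence between teacher and student distributions
    # at temperature T, scaled by T^2 to keep gradient magnitudes comparable.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-label loss: ordinary cross-entropy against the true labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss

# Usage: the teacher runs in eval mode with no gradients; only the student trains.
# student_logits = student(input_ids).logits
# with torch.no_grad():
#     teacher_logits = teacher(input_ids).logits
# loss = distillation_loss(student_logits, teacher_logits, labels)
```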