Training large language models from scratch is resource-intensive. Adapting a pre-trained model to a specific downstream task often involves fine-tuning all of its parameters, which is also computationally expensive and leads to large storage requirements when maintaining multiple task-specific models.
Parameter-Efficient Fine-Tuning (PEFT) techniques adapt LLMs by modifying only a small subset of their parameters, or by adding a small number of new ones. This approach significantly reduces the computational cost and memory footprint associated with fine-tuning, making it feasible to customize large models for various applications without retraining them entirely or storing numerous full-sized copies.
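To make this idea concrete, here is a minimal sketch of the core pattern behind most PEFT methods: freeze the pre-trained weights and train only a small added component. The `LoRALinear` class and the toy layer below are illustrative examples written for this chapter, not the API of any particular library; they assume PyTorch is available.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Illustrative low-rank adapter: output = W x + (alpha/r) * B A x."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # pre-trained weights stay frozen
        # Only these two small low-rank factors are trained.
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * (x @ self.lora_a.T @ self.lora_b.T)

# A toy "pre-trained" layer: only the adapter parameters receive gradients.
layer = LoRALinear(nn.Linear(1024, 1024))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable: {trainable:,} / total: {total:,}")
# roughly 16K trainable parameters out of about 1.07M in this single layer
```

The same pattern scales to full models: the frozen base weights can be shared across tasks, while each task only needs its own small set of adapter parameters. The chapter's later sections develop this in detail.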
This chapter examines several prominent PEFT strategies. We will look into the mechanics of Adapter modules, various prompt-based tuning methods like Prefix Tuning and Prompt Tuning, and the widely used Low-Rank Adaptation (LoRA). We will also cover Quantized LoRA (QLoRA), which further reduces memory usage by integrating quantization. You will learn how these methods work, analyze their performance trade-offs, and gain practical experience implementing them.
5.1 Motivation for PEFT
5.2 Adapter Modules
5.3 Prefix Tuning, Prompt Tuning, and P-Tuning
5.4 Low-Rank Adaptation (LoRA)
5.5 Quantized LoRA (QLoRA)
5.6 Combining PEFT Methods
5.7 Performance Analysis of PEFT Techniques
5.8 Practice: Fine-tuning with LoRA and QLoRA