As established, modifying every parameter in a massive language model during fine-tuning presents significant computational hurdles. Parameter-Efficient Fine-Tuning (PEFT) methods offer a suite of techniques designed to overcome these limitations by drastically reducing the number of trainable parameters, often by orders of magnitude, while preserving or closely matching the performance of full fine-tuning.
Instead of a single monolithic approach, PEFT encompasses several distinct strategies. Understanding these strategies helps in selecting the appropriate technique for a given task, model, and set of resource constraints. We can broadly classify most PEFT methods into three main families: Additive, Selective, and Reparameterization methods.
Additive methods operate on a simple principle: keep the original pre-trained model weights frozen and introduce a small number of new trainable parameters. These new parameters are inserted at specific points in the architecture, for example as small adapter modules between existing layers or as trainable prefix vectors prepended to the attention inputs, and they steer the model's behavior during fine-tuning.
The primary advantage of additive methods is the clear separation between the pre-trained knowledge (frozen weights) and the task-specific adaptation (new parameters). This modularity simplifies multi-task learning and deployment, as different adapters or prefixes can be loaded on demand without altering the large base model.
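As a concrete illustration, a bottleneck adapter (in the style of Houlsby et al.) can be sketched in a few lines of PyTorch. The class and parameter names here are illustrative assumptions, not any particular library's API:

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """A small trainable module inserted alongside a frozen sublayer.

    Projects activations down to a narrow bottleneck, applies a
    nonlinearity, projects back up, and adds a residual connection.
    Only these projections train; the base model stays frozen.
    """
    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down_proj = nn.Linear(hidden_dim, bottleneck_dim)
        self.activation = nn.ReLU()
        self.up_proj = nn.Linear(bottleneck_dim, hidden_dim)
        # Zero-init the up-projection so the adapter starts as an
        # identity mapping and does not disturb pre-trained behavior.
        nn.init.zeros_(self.up_proj.weight)
        nn.init.zeros_(self.up_proj.bias)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        return hidden_states + self.up_proj(
            self.activation(self.down_proj(hidden_states))
        )
```

Because the adapter's parameters live in a separate module, saving or swapping a task adaptation means storing only these few weights rather than a full copy of the model.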
Selective methods take a more direct approach by unfreezing and fine-tuning only a small, carefully chosen subset of the original pre-trained model parameters. The rest of the parameters remain frozen.
Examples include:

- Tuning only the bias terms of the model, as in BitFit.
- Tuning only the final few layers, or only the task-specific head.
- Tuning only normalization parameters, such as LayerNorm gains and biases.
While straightforward, the effectiveness of selective methods heavily depends on identifying the correct subset of parameters to tune. This identification can be non-trivial. While potentially less memory-intensive during training than full fine-tuning, they might require tuning more parameters than additive or reparameterization methods to achieve comparable performance. Furthermore, managing different task adaptations requires storing separate copies of the modified parameters or applying complex patching mechanisms.
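As a minimal sketch of the selective approach, bias-only tuning in the style of BitFit can be expressed in PyTorch by toggling `requires_grad`, assuming (as is conventional) that bias parameter names contain "bias":

```python
import torch.nn as nn

def freeze_all_but_biases(model: nn.Module) -> None:
    """BitFit-style selection: train only bias terms, freeze the rest."""
    for name, param in model.named_parameters():
        param.requires_grad = "bias" in name

# Pass only the unfrozen parameters to the optimizer, e.g.:
# optimizer = torch.optim.AdamW(
#     (p for p in model.parameters() if p.requires_grad), lr=1e-4
# )
```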
Reparameterization methods modify how weight updates are represented or applied, rather than directly adding parameters or selecting subsets. The most prominent technique in this category leverages low-rank approximations.
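Concretely, in the standard LoRA formulation, a full update $\Delta W$ to a frozen weight matrix $W_0 \in \mathbb{R}^{d \times k}$ is constrained to a product of two much smaller matrices:

$$
\Delta W = BA, \qquad B \in \mathbb{R}^{d \times r}, \quad A \in \mathbb{R}^{r \times k}, \quad r \ll \min(d, k)
$$

This reduces the number of trainable parameters for that matrix from $dk$ to $r(d+k)$. For example, with $d = k = 4096$ and $r = 8$, a full update has about 16.8M parameters, while the low-rank factors have only $8 \times (4096 + 4096) = 65{,}536$, roughly a 256× reduction.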
LoRA offers a compelling balance between parameter efficiency and performance. The low-rank update matrices $B$ and $A$ are compact. Importantly, once training is complete, the update $BA$ can be merged back into the original weights ($W = W_0 + BA$), eliminating any inference latency overhead compared to the original model. This property is particularly attractive for deployment scenarios.
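The sketch below shows one way such a layer might look in PyTorch. The rank `r` and scaling factor `alpha` are illustrative hyperparameters, and the common $\alpha/r$ scaling is an addition beyond the simplified $W = W_0 + BA$ above:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer augmented with a trainable low-rank update."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad = False  # freeze W0
        if self.base.bias is not None:
            self.base.bias.requires_grad = False
        d, k = base.out_features, base.in_features
        # B starts at zero so BA = 0: training begins exactly at the
        # pre-trained model's behavior.
        self.A = nn.Parameter(torch.randn(r, k) * 0.01)
        self.B = nn.Parameter(torch.zeros(d, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scaling

    @torch.no_grad()
    def merge(self) -> nn.Linear:
        """Fold the update into the base weights: W = W0 + scaling * B A."""
        self.base.weight += self.scaling * (self.B @ self.A)
        return self.base
```

After `merge()`, the layer is an ordinary `nn.Linear`, so serving the fine-tuned model costs exactly as much as serving the original.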
The following diagram illustrates the differences between these PEFT families:
Figure: Overview of PEFT method families. Additive methods introduce new trainable modules, selective methods tune a subset of the original model's parameters, and reparameterization methods (like LoRA) apply low-rank updates ($BA$) to frozen weights ($W_0$).
Understanding this taxonomy provides a framework for navigating the diverse landscape of PEFT. The following chapters will provide deeper technical analysis and practical implementation details for key methods, starting with an in-depth examination of LoRA.