Fine-tuning LLMs for RAG-Specific Generation Tasks
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks, Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela, 2020. Advances in Neural Information Processing Systems (NeurIPS 2020). DOI: 10.48550/arXiv.2005.11401 - This foundational paper introduces the Retrieval-Augmented Generation (RAG) framework, showing how parametric memory (the generator's weights) and non-parametric memory (a retrieved document index) can be combined to improve performance on knowledge-intensive NLP tasks. It establishes the architectural basis for the generation component discussed in this section.
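For orientation, the sketch below queries the paper's model through the Hugging Face transformers reference implementation. It is a minimal illustration, not code from the paper; the `facebook/rag-sequence-nq` checkpoint, the dummy retrieval index, and the example question are assumptions chosen so the snippet runs without downloading the full Wikipedia index.

```python
# Minimal sketch: run a pretrained RAG model via the Hugging Face transformers
# implementation of the architecture. Assumes transformers, datasets, and faiss
# are installed; the dummy dataset stands in for the full retrieval index.
from transformers import RagTokenizer, RagRetriever, RagSequenceForGeneration

tokenizer = RagTokenizer.from_pretrained("facebook/rag-sequence-nq")
retriever = RagRetriever.from_pretrained(
    "facebook/rag-sequence-nq", index_name="exact", use_dummy_dataset=True
)
model = RagSequenceForGeneration.from_pretrained(
    "facebook/rag-sequence-nq", retriever=retriever
)

# The retriever supplies non-parametric memory (retrieved passages); the
# seq2seq generator supplies parametric memory, and generation marginalizes
# over the retrieved documents.
inputs = tokenizer("who wrote the origin of species?", return_tensors="pt")
generated_ids = model.generate(input_ids=inputs["input_ids"])
print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True))
```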
LoRA: Low-Rank Adaptation of Large Language Models, Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, 2021. International Conference on Learning Representations (ICLR 2022). DOI: 10.48550/arXiv.2106.09685 - This paper introduces LoRA, a parameter-efficient fine-tuning technique that freezes the pretrained weights and trains small low-rank update matrices instead, drastically reducing the number of trainable parameters and making fine-tuning of large language models more accessible and efficient.
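As a concrete illustration, here is a minimal sketch of attaching LoRA adapters to a causal language model with the peft library. The base model name, rank, and target modules are illustrative assumptions, not values prescribed by the paper.

```python
# Minimal sketch: wrap a causal LM with LoRA adapters using peft. Only the
# injected low-rank matrices are trainable; the base weights stay frozen.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

base_model_name = "meta-llama/Llama-2-7b-hf"  # hypothetical base model
model = AutoModelForCausalLM.from_pretrained(base_model_name)
tokenizer = AutoTokenizer.from_pretrained(base_model_name)

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,                                  # rank of the low-rank update matrices
    lora_alpha=32,                         # scaling factor for the update
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections receiving adapters
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of total parameters
```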
QLoRA: Efficient Finetuning of Quantized LLMs, Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, Luke Zettlemoyer, 2023. Advances in Neural Information Processing Systems (NeurIPS 2023). DOI: 10.48550/arXiv.2305.14314 - This work presents QLoRA, an extension of LoRA that trains adapters on top of a 4-bit quantized base model, further reducing memory usage and making very large models fine-tunable on consumer GPUs.
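The sketch below illustrates the QLoRA recipe with transformers, bitsandbytes, and peft: load the base model with 4-bit NF4 quantization, then train LoRA adapters on top. The base model and hyperparameters are illustrative assumptions; a CUDA GPU with the bitsandbytes package installed is assumed.

```python
# Minimal sketch: 4-bit quantized base model plus trainable LoRA adapters.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NormalFloat4 data type from the QLoRA paper
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 while weights stay 4-bit
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # hypothetical base model
    quantization_config=bnb_config,
    device_map="auto",
)

# Prepare the quantized model for training, then attach LoRA adapters;
# gradients flow only through the adapters, not the frozen 4-bit weights.
model = prepare_model_for_kbit_training(model)
model = get_peft_model(
    model,
    LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32, lora_dropout=0.05),
)
model.print_trainable_parameters()
```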