LoRA: Low-Rank Adaptation of Large Language Models, Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, 2021. arXiv preprint arXiv:2106.09685. DOI: 10.48550/arXiv.2106.09685 - Introduces Low-Rank Adaptation (LoRA), a parameter-efficient fine-tuning method that reduces the number of trainable parameters for fine-tuning large language models, helping to mitigate catastrophic forgetting (a brief sketch of the low-rank update follows this list).
Continual Lifelong Learning with Neural Networks: A Review, German I. Parisi, Ronald Kemker, Jose L. Part, Christopher Kanan, Stefan Wermter, 2019. Neural Networks, Vol. 113 (Elsevier). DOI: 10.1016/j.neunet.2019.01.012 - A review of continual learning techniques in neural networks, including strategies for mitigating catastrophic forgetting, which is relevant to continual fine-tuning of LLMs.
Scaling Instruction-Finetuned Language Models, Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Yunxuan Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Alex Castro-Ros, Marie Pellat, Kevin Robinson, Dasha Valter, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, Jason Wei, 2022. arXiv preprint arXiv:2210.11416. DOI: 10.48550/arXiv.2210.11416 - Examines the scaling properties of instruction fine-tuning, demonstrating that instruction-tuned models generalize better to unseen tasks and show improved alignment, providing a basis for continual SFT.
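
As a brief illustration of the low-rank update described in the LoRA entry above (a sketch using that paper's notation, where $W_0$ is a frozen pretrained weight matrix, $x$ its input, and $r$ the chosen adapter rank):

$$
h = W_0 x + \Delta W\, x = W_0 x + B A\, x, \qquad B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k),
$$

so only $A$ and $B$ are updated during fine-tuning, reducing the trainable parameters of that layer from $d \cdot k$ to $r\,(d + k)$.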