EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks, Jason Wei, Kai Zou, 2019. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) (Association for Computational Linguistics). DOI: 10.18653/v1/D19-1670 - Introduces four simple text augmentation operations: synonym replacement, random insertion, random swap, and random deletion.
Self-Instruct: Aligning Language Models with Self-Generated Instructions, Yizhong Wang, Yeganeh Kordi, Swaroop Mishra, Alisa Liu, Noah A. Smith, Daniel Khashabi, Hannaneh Hajishirzi, 2022. ACL 2023. DOI: 10.48550/arXiv.2212.10560 - Uses a large language model to generate new instruction-following data from a small seed set, a form of LLM-based data augmentation relevant to instruction tuning.