GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding, Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman, 2019 (International Conference on Learning Representations (ICLR); first posted 2018), DOI: 10.48550/arXiv.1804.07461 - The original paper introducing the General Language Understanding Evaluation (GLUE) benchmark, detailing its tasks and methodology.
Fine-tuning a pretrained model, Hugging Face, 2024 (Hugging Face NLP Course) - A chapter from the Hugging Face NLP Course that explains the practical process of fine-tuning pretrained transformer models for specific NLP tasks, including code examples relevant to GLUE/SuperGLUE evaluation; see the sketch below for the general shape of that workflow.
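
For orientation, here is a minimal sketch of the kind of workflow the chapter above covers: fine-tuning a pretrained checkpoint on a GLUE task with the `transformers` Trainer API. The task (MRPC), checkpoint (`bert-base-uncased`), and output directory are illustrative choices, not taken from the chapter verbatim, and the sketch assumes the `transformers`, `datasets`, and `evaluate` libraries are installed.

```python
# Minimal sketch: fine-tune a pretrained model on GLUE's MRPC task.
# Assumptions (not from the cited chapter verbatim): the checkpoint, task,
# and output directory are illustrative; requires transformers, datasets, evaluate.
import numpy as np
import evaluate
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorWithPadding,
    Trainer,
    TrainingArguments,
)

checkpoint = "bert-base-uncased"
raw_datasets = load_dataset("glue", "mrpc")  # paraphrase-detection sentence pairs
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

def tokenize_fn(batch):
    # MRPC examples are sentence pairs; pad dynamically via the collator below.
    return tokenizer(batch["sentence1"], batch["sentence2"], truncation=True)

tokenized = raw_datasets.map(tokenize_fn, batched=True)
metric = evaluate.load("glue", "mrpc")  # reports accuracy and F1 for MRPC

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    return metric.compute(predictions=np.argmax(logits, axis=-1), references=labels)

model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="mrpc-finetune"),  # hypothetical output dir
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    data_collator=DataCollatorWithPadding(tokenizer),
    compute_metrics=compute_metrics,
)
trainer.train()
print(trainer.evaluate())  # GLUE validation metrics for the fine-tuned model
```

The same pattern transfers to the other GLUE tasks by swapping the dataset configuration name, the tokenized column names, and `num_labels`; SuperGLUE tasks follow the same Trainer-based recipe but with their own dataset and metric configurations.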