GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding, Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman, 2019, International Conference on Learning Representations (ICLR), DOI: 10.48550/arXiv.1804.07461 - Introduces the General Language Understanding Evaluation (GLUE) benchmark, a collection of diverse NLP tasks for evaluating general-purpose language understanding models on downstream applications.
Speech and Language Processing (3rd Edition Draft), Daniel Jurafsky and James H. Martin, 2025 - A widely recognized textbook providing comprehensive coverage of NLP tasks, evaluation metrics, and foundational models, highly relevant for understanding downstream applications and their assessment.