BOLD: Dataset and Metrics for Measuring Biases in Open-Ended Language Generation, Jwala Dhamala, Tony Sun, Varun Kumar, Satyapriya Krishna, Yada Pruksachatkun, Kai-Wei Chang, Rahul Gupta, 2021. Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (Association for Computing Machinery). DOI: 10.1145/3442188.3445924 - Introduces a dataset and metrics for evaluating fairness and bias in open-ended language generation, directly relevant to the evaluation strategies mentioned.