Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings, Tolga Bolukbasi, Kai-Wei Chang, James Y. Zou, Venkatesh Saligrama, Adam Tauman Kalai, 2016, Advances in Neural Information Processing Systems, Vol. 29 (Curran Associates, Inc.) - Introduces a geometric method for quantifying and mitigating social biases, particularly gender stereotypes, encoded in word embeddings by identifying a gender subspace and projecting it out; a foundational work for intrinsic bias assessment. (The Word Embedding Association Test, WEAT, is due to Caliskan et al., 2017, not this paper.)
Ethical and Social Risks of Harm from Language Models, Laura Weidinger, John Mellor, Maribeth Rauh, et al., 2021, arXiv preprint arXiv:2112.04359 - Provides a comprehensive overview of the challenges of bias and harm in large language models and discusses various mitigation approaches and assessment techniques.
A Survey on Bias and Fairness in Machine Learning, Ninareh Mehrabi, Fred Morstatter, Nripsuta Saxena, Kristina Lerman, Aram Galstyan, 2021, ACM Computing Surveys, Vol. 54 (ACM), DOI: 10.1145/3457607 - Offers a broad survey of fairness definitions, bias types, and mitigation techniques in machine learning, providing foundational context relevant to LLMs.