Training Language Models to Follow Instructions with Human Feedback, Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan Lowe, 2022. arXiv preprint arXiv:2203.02155. DOI: 10.48550/arXiv.2203.02155 - This paper introduces InstructGPT, demonstrating how Reinforcement Learning from Human Feedback (RLHF) can align language models with user intent and preferences, making them more helpful and less harmful.
Beyond Accuracy: Behavioral Testing of NLP Models with CheckList, Marco Tulio Ribeiro, Tongshuang Wu, Carlos Guestrin, Sameer Singh, 2020. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (Association for Computational Linguistics). DOI: 10.18653/v1/2020.acl-main.440 - This paper proposes a methodology for evaluating NLP models beyond traditional metrics, using behavioral tests of human-interpretable capabilities; it is relevant to understanding the limitations of automated LLM evaluation.
How to log feedback, LangChain, 2024 - Official documentation detailing how to programmatically log human feedback to LangSmith for tracking and analysis of LangChain application runs.
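For context, the workflow this documentation covers can be sketched with the langsmith Python SDK's `Client.create_feedback` method; the run ID and feedback key below are illustrative placeholders, not values from the documentation.

```python
# Minimal sketch of logging human feedback to LangSmith via the langsmith
# Python SDK. The run_id and feedback key are illustrative placeholders.
from langsmith import Client

# The client reads its API key from the LANGSMITH_API_KEY (or legacy
# LANGCHAIN_API_KEY) environment variable.
client = Client()

# In practice, run_id is the UUID of a traced LangChain application run.
run_id = "00000000-0000-0000-0000-000000000000"  # placeholder

# Attach a feedback record (a key plus optional score and comment) to the run.
client.create_feedback(
    run_id,
    key="user_rating",                # assumed feedback dimension name
    score=1.0,                        # e.g. a thumbs-up mapped to 1.0
    comment="Helpful and accurate.",  # optional free-text note
)
```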