Training language models to follow instructions with human feedback, Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan Lowe, 2022, Advances in Neural Information Processing Systems, Vol. 35 (Neural Information Processing Systems) - Presents Reinforcement Learning from Human Feedback (RLHF) for aligning language models with user preferences, a core mechanism for generator adaptation.
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection, Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, Hannaneh Hajishirzi, 2023, arXiv preprint arXiv:2310.11511, DOI: 10.48550/arXiv.2310.11511 - Details a RAG framework that improves generation quality by having the LLM critique its own outputs and retrieve additional documents for self-correction, aligning with system-internal feedback.