The Curious Case of Neural Text Degeneration, Ari Holtzman, Jan Buys, Li Du, Maxwell Forbes, and Yejin Choi, 2020. International Conference on Learning Representations (ICLR). DOI: 10.48550/arXiv.1904.09751 - This paper introduced nucleus sampling (top-p) and analyzes the trade-offs between decoding strategies, directly explaining how temperature and top-p influence output variability.
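For illustration, a minimal sketch of nucleus (top-p) sampling with temperature, in the spirit of the decoding strategy this paper introduces; the function name, NumPy implementation, and toy logits are my own, not taken from the paper.

```python
import numpy as np

def nucleus_sample(logits, p=0.9, temperature=1.0, rng=None):
    """Sample a token id from the smallest set of tokens whose
    cumulative probability exceeds p (nucleus / top-p sampling)."""
    rng = rng or np.random.default_rng()
    # Temperature rescales logits before softmax:
    # <1.0 sharpens the distribution, >1.0 flattens it.
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    # Sort tokens by probability, descending.
    order = np.argsort(probs)[::-1]
    # Keep the smallest prefix whose cumulative mass exceeds p.
    cutoff = np.searchsorted(np.cumsum(probs[order]), p) + 1
    nucleus = order[:cutoff]
    nucleus_probs = probs[nucleus] / probs[nucleus].sum()
    return rng.choice(nucleus, p=nucleus_probs)

# Toy 5-token vocabulary: lowering p or temperature shrinks the
# candidate set, which reduces output variability.
logits = np.array([2.0, 1.0, 0.5, 0.1, -1.0])
print(nucleus_sample(logits, p=0.9, temperature=0.7))
```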
Language Models are Few-Shot Learners, Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei, 2020. Advances in Neural Information Processing Systems, Vol. 33 (NeurIPS). DOI: 10.48550/arXiv.2005.14165 - This paper demonstrates how strongly LLM performance and output depend on prompt design, including the number and choice of in-context examples, and highlights the non-deterministic nature of generation, making it directly relevant to input sensitivity and content variability.
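As a concrete illustration of the few-shot (in-context) prompting this paper studies, a minimal prompt sketch; the translation task and examples are invented for illustration and are not drawn from the paper.

```python
# A few-shot prompt conditions the model on input -> output pairs
# before the query, so it performs the task without weight updates.
few_shot_prompt = """Translate English to French.

English: cheese
French: fromage

English: house
French: maison

English: book
French:"""

# Sending this prompt to a completion endpoint with non-zero
# temperature can yield different completions across calls,
# which is the output variability the annotation refers to.
print(few_shot_prompt)
```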