Scaling Laws for Neural Language Models, Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, Dario Amodei, 2020. arXiv preprint arXiv:2001.08361. DOI: 10.48550/arXiv.2001.08361 - Presents empirical scaling laws relating language-model performance to model size, dataset size, and training compute.
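As a pointer to the form of the paper's results: when the other factors are not bottlenecks, test loss is fit by simple power laws in parameter count N, dataset size D, and compute C (the fitted exponents are roughly in the 0.05–0.1 range; exact constants are in the paper):

```latex
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad
L(D) \approx \left(\frac{D_c}{D}\right)^{\alpha_D}, \qquad
L(C_{\min}) \approx \left(\frac{C_c}{C_{\min}}\right)^{\alpha_C}
```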
Emergent Abilities of Large Language Models, Jason Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, Sebastian Borgeaud, Dani Yogatama, Maarten Bosma, Denny Zhou, Donald Metzler, Ed H. Chi, Tatsunori Hashimoto, Oriol Vinyals, Percy Liang, Jeff Dean, William Fedus, 2022. Transactions on Machine Learning Research (JMLR.org). DOI: 10.48550/arXiv.2206.07682 - Defines emergent abilities as capabilities absent in smaller models that appear qualitatively once models reach sufficient scale, and surveys examples across benchmarks.
Training Compute-Optimal Large Language Models, Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, Tom Hennigan, Eric Noland, Katie Millican, George van den Driessche, Bogdan Damoc, Aurelia Guy, Simon Osindero, Karen Simonyan, Erich Elsen, Jack W. Rae, Oriol Vinyals, Laurent Sifre, 2022. arXiv preprint arXiv:2203.15556. DOI: 10.48550/arXiv.2203.15556 - Investigates the optimal balance between model size and training-data size for a fixed compute budget.
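The paper's analysis rests on a parametric loss fit in parameters N and training tokens D; minimizing it under a compute constraint of roughly C ≈ 6ND gives the headline result that N and D should be scaled in about equal proportion (fitted constants omitted here; see the paper):

```latex
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}},
\qquad \text{minimize } L \text{ subject to } C \approx 6ND,
\qquad \Rightarrow \; N_{\mathrm{opt}} \propto D_{\mathrm{opt}}
```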
Language Models are Few-Shot Learners, Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei, 2020. Advances in Neural Information Processing Systems, Vol. 33 (NeurIPS) - Demonstrates that scaled-up language models (GPT-3) can perform many tasks via few-shot, in-context learning from prompts alone, without gradient updates.
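The few-shot setting studied in this paper amounts to conditioning the model on K labeled demonstrations concatenated ahead of the query; a minimal sketch of that prompt construction (the helper name and exact template are illustrative assumptions, though the translation example appears in the paper):

```python
# Minimal sketch of few-shot ("in-context") prompting as studied in the
# GPT-3 paper: K demonstrations are concatenated before the query, and
# the model is asked to continue the text. The function name and the
# "=>" template are illustrative, not the paper's code.
def build_few_shot_prompt(demos, query, task="Translate English to French"):
    lines = [f"{task}:"]
    for src, tgt in demos:           # K demonstrations, K = len(demos)
        lines.append(f"{src} => {tgt}")
    lines.append(f"{query} =>")      # the model completes after "=>"
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    demos=[("sea otter", "loutre de mer"), ("cheese", "fromage")],
    query="peppermint",
)
print(prompt)
```

The same string would then be sent to the model as-is; zero-shot is the K = 0 case, where only the task description and query remain.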