Thermal Physics, Charles Kittel, Herbert Kroemer, 1980 (W. H. Freeman and Company) - A classic textbook that introduces the fundamental concepts of statistical mechanics, including the Boltzmann distribution, which provides the theoretical basis for temperature-scaled probability distributions.
Language Models are Unsupervised Multitask Learners, Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, 2019 (OpenAI Blog / Technical Report) - This foundational paper on GPT-2 discusses temperature as a decoding parameter for controlling the randomness and diversity of generated text, illustrating the inspiration for its use in data sampling strategies.
torch.nn.functional.softmax, PyTorch Core Team, 2024 (PyTorch) - Official documentation for the PyTorch softmax function, essential for understanding and implementing the core calculation of temperature-based sampling probabilities in a deep learning framework.