Saving and Loading Models, Matthew Inkawhich, 2024 (PyTorch Foundation) - Provides official guidelines and examples for checkpointing models, optimizers, and schedulers in PyTorch.
Adam: A Method for Stochastic Optimization, Diederik P. Kingma and Jimmy Ba, 2015 (International Conference on Learning Representations, ICLR), DOI: 10.48550/arXiv.1412.6980 - Introduces the Adam optimizer and details its adaptive learning rate mechanism, highlighting the internal moment estimates that necessitate saving its state.
Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - A foundational text for deep learning that discusses training strategies, optimization algorithms, and the importance of hyperparameter tuning and reproducibility.