Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin, 2017, Advances in Neural Information Processing Systems (NeurIPS), Vol. 30, DOI: 10.5555/3295222.3295232 - Introduces the Transformer architecture and its learning rate schedule, a linear warmup phase followed by inverse-square-root decay; foundational for LLMs.
How to adjust learning rate, PyTorch Developers, 2025, PyTorch Documentation - Official documentation for PyTorch's learning rate schedulers (torch.optim.lr_scheduler), with usage details and examples for various scheduling strategies.