Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - A standard textbook that establishes mathematical notation for linear algebra, calculus, and probability within the context of machine learning and deep learning.
Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, 2017 (Neural Information Processing Systems, NeurIPS), DOI: 10.48550/arXiv.1706.03762 - Introduces the Transformer architecture and defines much of the specific notation used for its components, such as the Q, K, V matrices and the self-attention mechanism.
PyTorch Tensors, PyTorch Authors, 2024 - Official documentation explaining PyTorch tensors, their creation, shapes, and basic operations, directly relevant to understanding the code mapping of mathematical notation.
CS224n: Natural Language Processing with Deep Learning, Diyi Yang, Tatsunori Hashimoto, 2025 (Stanford University) - A leading university course that consistently applies standard notation for deep learning, especially within the context of natural language processing and transformer models.
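To illustrate the notation-to-code mapping these references cover, here is a minimal sketch of scaled dot-product attention from Vaswani et al. (2017) written with PyTorch tensors. The symbols Q, K, V, and d_k follow the paper; the batch size and dimension values are arbitrary illustration choices, not anything prescribed by the sources.

```python
import torch

torch.manual_seed(0)

# Illustrative shapes: (batch, sequence length, key/query dimension d_k).
batch, seq_len, d_k = 2, 4, 8
Q = torch.randn(batch, seq_len, d_k)  # queries
K = torch.randn(batch, seq_len, d_k)  # keys
V = torch.randn(batch, seq_len, d_k)  # values

# Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
scores = Q @ K.transpose(-2, -1) / (d_k ** 0.5)  # (batch, seq_len, seq_len)
weights = torch.softmax(scores, dim=-1)          # each row sums to 1
output = weights @ V                             # (batch, seq_len, d_k)

print(output.shape)  # torch.Size([2, 4, 8])
```

Note how each mathematical object maps onto a tensor whose trailing dimensions mirror the matrix shapes in the paper, which is exactly the kind of correspondence the PyTorch tensor documentation above makes concrete.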