Mixed-Precision Training, Paulius Micikevicius, Sharan Narang, Jonah Alben, Gregory Diamos, Erich Elsen, David Garcia, Boris Ginsburg, Michael Houston, Oleksii Kuchaiev, Ganesh Venkatesh, Hao Wu, 2018 (International Conference on Learning Representations, ICLR), DOI: 10.48550/arXiv.1710.03740 - Introduces mixed-precision training, detailing FP16 range issues (underflow/overflow) and the proposed solution of loss scaling.
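A minimal sketch of the loss-scaling step described in the paper, using a hypothetical linear model and a fixed scale factor (the paper additionally keeps FP32 master copies of the weights, omitted here for brevity):

```python
import torch

# Hypothetical model, optimizer, and loss; any standard setup works for the sketch.
model = torch.nn.Linear(128, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.CrossEntropyLoss()

scale = 1024.0  # fixed loss-scale constant, the simplest scheme in the paper

def train_step(inputs, targets):
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    # Scale the loss so small gradient values stay representable in FP16
    # instead of underflowing to zero during the backward pass.
    (loss * scale).backward()
    # Unscale the gradients before the update so the effective step is unchanged.
    for p in model.parameters():
        if p.grad is not None:
            p.grad.div_(scale)
    optimizer.step()
    return loss.item()
```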
A New Standard for Mixed-Precision Training: bfloat16, Karen Young, David Patterson, Cliff Young, 2019 (Google AI Blog) - Explains the BFloat16 format, highlighting its wider dynamic range compared to FP16, which directly addresses underflow and overflow concerns in mixed-precision training.
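The range difference can be seen by querying the format limits directly; the snippet below is a small illustration using PyTorch's torch.finfo, not code from the blog post:

```python
import torch

for dtype in (torch.float16, torch.bfloat16, torch.float32):
    info = torch.finfo(dtype)
    # bfloat16 keeps FP32's 8 exponent bits, so its max and smallest-normal values
    # match FP32's range, while FP16's 5 exponent bits cap it at ~65504 and push
    # underflow to around 6e-5; bfloat16 trades this range for fewer fraction bits (larger eps).
    print(f"{str(dtype):16s} max={info.max:.3e}  smallest_normal={info.tiny:.3e}  eps={info.eps:.3e}")
```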
Computer Architecture: A Quantitative Approach, John L. Hennessy, David A. Patterson, 2017 (Morgan Kaufmann) - A foundational textbook with a thorough treatment of floating-point arithmetic, including the IEEE 754 standard, bit allocations, and the numerical properties of formats such as FP16 and FP32.
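As a small, self-contained illustration of the bit allocations covered in that material (not code from the book): FP32 splits its 32 bits into 1 sign, 8 exponent, and 23 fraction bits; FP16 uses 1/5/10; BF16 uses 1/8/7. The sketch below decodes an FP32 value into those fields with Python's standard library:

```python
import struct

def fp32_fields(x: float):
    # Reinterpret the IEEE 754 single-precision encoding as a 32-bit integer.
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF   # 8-bit exponent field, biased by 127
    fraction = bits & 0x7FFFFF       # 23-bit fraction field (implicit leading 1)
    return sign, exponent, fraction

print(fp32_fields(1.0))    # (0, 127, 0): exponent bias 127, fraction all zero
print(fp32_fields(-0.5))   # (1, 126, 0): negative sign, exponent one below the bias
```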
Automatic Mixed Precision for Deep Learning, PyTorch Documentation, 2024 - Official PyTorch guide explaining mixed-precision training, including how framework-level solutions like torch.cuda.amp address the numerical stability challenges posed by FP16.
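A typical training-loop pattern following the documented torch.cuda.amp API, with a placeholder model, optimizer, and loss: autocast runs eligible ops in FP16, and GradScaler applies dynamic loss scaling, skipping the optimizer step when FP16 gradients overflow.

```python
import torch

# Placeholder CUDA model and training objects for the sketch.
model = torch.nn.Linear(128, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()  # dynamic loss scaling

def train_step(inputs, targets):
    optimizer.zero_grad()
    # Forward pass in mixed precision: eligible ops are cast to FP16 automatically.
    with torch.cuda.amp.autocast():
        loss = loss_fn(model(inputs), targets)
    # Scale the loss, backprop, then unscale and step; the scaler skips the step
    # and reduces the scale if inf/NaN gradients are detected.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return loss.item()
```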