Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - Provides core theory on neural network training, including optimization, loss functions, and issues like exploding/vanishing gradients.
CS224n: Natural Language Processing with Deep Learning, Diyi Yang and Tatsunori Hashimoto, 2023 (Stanford University) - Course materials for deep learning in NLP, covering training practices, optimization, and monitoring for language models.
Scaling Laws for Neural Language Models, Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei, 2020 (arXiv preprint arXiv:2001.08361, DOI: 10.48550/arXiv.2001.08361) - Research paper showing that training loss in large language models follows power laws in model size, dataset size, and compute, setting expectations for achievable performance.
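For readers who want the last entry's headline result in closed form, the paper reports that test loss follows a simple power law in the number of non-embedding parameters $N$ (constants as quoted in the paper; dataset size and compute obey analogous laws):

$$L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad \alpha_N \approx 0.076, \quad N_c \approx 8.8 \times 10^{13}$$

In practice this means that, in the regime the paper studies, each order-of-magnitude increase in parameter count buys a predictable multiplicative reduction in loss, which is what makes the law useful for setting performance expectations before a training run.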