While large language models (LLMs) offer remarkable capabilities for generation in Retrieval-Augmented Generation (RAG) systems, their size and computational demands can present significant hurdles in production. High inference latency, substantial memory footprints, and considerable operational costs are common challenges. Two powerful techniques address these challenges by creating more efficient LLMs: knowledge distillation and quantization. Both aim to reduce model size and speed up inference, making LLMs practical to deploy at scale without a drastic loss in generation quality.

## Knowledge Distillation: Learning from a Larger Teacher

Knowledge distillation is a model compression technique in which a smaller "student" model is trained to mimic the behavior of a larger, more complex "teacher" model. The fundamental idea is that the teacher, having learned a rich representation of the data, can transfer this "knowledge" to the student. For RAG systems, this means a compact student LLM can learn to generate high-quality, context-aware responses from a state-of-the-art, but resource-intensive, teacher LLM.

### The Distillation Process

The core of distillation is training the student model on the outputs of the teacher model. Instead of relying solely on hard labels (e.g., the "correct" next word), the student often learns from the softened probability distribution produced by the teacher's softmax layer. This is achieved by using a higher "temperature" ($T$) in the softmax function for both teacher and student during distillation:

$$ \text{softmax}(z_i, T) = \frac{\exp(z_i / T)}{\sum_j \exp(z_j / T)} $$

A higher temperature smooths the probability distribution, exposing more of the relationships the teacher has learned between different possible outputs. The student is then trained to minimize a loss function that typically combines two components:

1. **Distillation Loss (Soft Loss):** Measures the difference between the teacher's softened outputs and the student's softened outputs. Kullback-Leibler (KL) divergence is commonly used:
   $$ L_{KD} = \text{KL}(\sigma(z_t / T) \,\|\, \sigma(z_s / T)) $$
   where $z_t$ are the teacher's logits, $z_s$ are the student's logits, and $\sigma$ is the softmax function with temperature $T$.
2. **Student Loss (Hard Loss):** If ground-truth labels are available (e.g., for a specific downstream task like summarization in RAG), a standard cross-entropy loss is computed between the student's predictions and the true labels, typically with temperature $T=1$:
   $$ L_{Student} = \text{CrossEntropy}(y, \sigma(z_s)) $$

The total loss is a weighted sum:

$$ L_{total} = \alpha \cdot L_{Student} + (1-\alpha) \cdot L_{KD} $$

The hyperparameter $\alpha$ balances fitting the hard labels against matching the teacher's soft targets.

For RAG, the "input" to the distillation process is the combination of the user query and the retrieved documents; the "output" the student learns to emulate is the teacher's generated response to that input.
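To make the combined objective concrete, here is a minimal PyTorch sketch of the loss described above. The function name, the default values for $T$ and $\alpha$, and the $T^2$ scaling on the soft term (a common convention for keeping its gradient magnitude comparable to the hard loss) are illustrative choices, not requirements.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Weighted sum of hard cross-entropy and softened KL distillation loss (sketch)."""
    # Soft loss: KL(teacher || student) over temperature-scaled distributions.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale so its gradients stay on the same order as the hard loss
    # Hard loss: standard cross-entropy against ground-truth labels at T = 1.
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * hard_loss + (1 - alpha) * soft_loss
```

In a RAG setting, `student_logits` and `teacher_logits` would both be computed from the same (query + retrieved context) input, as the diagram below illustrates.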
The "output" the student learns to emulate is the teacher's generated response based on this input.digraph G { rankdir=LR; node [shape=box, style="rounded,filled", fontname="Arial", color="#495057", fillcolor="#e9ecef"]; edge [fontname="Arial", color="#495057"]; subgraph cluster_teacher { label="Teacher Model (Large)"; labelloc="t"; style="filled"; fillcolor="#d0bfff"; T_Input [label="Input\n(Query + Retrieved Context)", fillcolor="#eebefa"]; Teacher [label="Large LLM", fillcolor="#9775fa"]; T_Softmax [label="Softmax (Temp T)", shape=ellipse, fillcolor="#eebefa"]; T_Output [label="Teacher Soft Probabilities", fillcolor="#eebefa"]; } subgraph cluster_student { label="Student Model (Compact)"; labelloc="t"; style="filled"; fillcolor="#a5d8ff"; S_Input [label="Input\n(Query + Retrieved Context)", fillcolor="#99e9f2"]; Student [label="Compact LLM", fillcolor="#4dabf7"]; S_Softmax [label="Softmax (Temp T)", shape=ellipse, fillcolor="#99e9f2"]; S_Output [label="Student Soft Probabilities", fillcolor="#99e9f2"]; S_Hard_Output [label="Student Hard Predictions\n(Temp T=1)", fillcolor="#99e9f2"]; } GroundTruth [label="Ground Truth Labels\n(Optional)", shape=cylinder, fillcolor="#ced4da"]; Loss [label="Combined Loss", shape=hexagon, fillcolor="#ffc9c9"]; T_Input -> Teacher; Teacher -> T_Softmax; T_Softmax -> T_Output; S_Input -> Student; Student -> S_Softmax; S_Softmax -> S_Output; Student -> S_Hard_Output [style=dashed, arrowhead=open, color="#868e96"]; T_Output -> Loss [label="Distillation Loss\n(e.g., KL Divergence)"]; S_Output -> Loss; S_Hard_Output -> Loss [label="Student Loss\n(e.g., Cross-Entropy)", style=dashed, color="#868e96"]; GroundTruth -> Loss [style=dashed, color="#868e96"]; Loss -> Student [label="Gradient Update", color="#f03e3e", style=dotted, constraint=false]; }Knowledge distillation process: a smaller student model learns from the softened outputs of a larger teacher model and optionally from ground truth labels. The combined loss guides the student's training.Types of Knowledge TransferredWhile response-based distillation (matching output probabilities) is common, other forms of knowledge can be transferred:Feature-based distillation: The student tries to mimic intermediate layer representations (activations) of the teacher model. This can be more challenging but potentially more powerful as it captures richer internal "reasoning" of the teacher.Relation-based distillation: Focuses on transferring relationships between different layers or parts of the teacher model.Benefits for RAGReduced Latency and Cost: Smaller student models lead to faster inference and lower computational requirements for the generation step in RAG.Deployment Flexibility: Compact models are easier to deploy, especially in environments with limited resources.However, consider:Performance Trade-off: The student model might not perfectly replicate the teacher's performance. The extent of this gap depends on the student's capacity, the distillation strategy, and the task's complexity.Teacher Selection: The quality of the teacher model is important.Distillation Data: A sufficiently large and representative dataset of (query, context) pairs is needed for effective training. 
Distillation lets you create specialized, efficient LLMs tailored to your RAG system's generation task, balancing performance with operational efficiency.

## Quantization: Reducing Numerical Precision

Quantization is another widely used technique for model compression and acceleration. It reduces the number of bits used to represent the model's weights and, in some cases, its activations. LLMs are typically trained using 32-bit floating-point numbers (FP32); quantization converts these to lower-precision formats such as 16-bit floating point (FP16 or BF16), 8-bit integers (INT8), or even 4-bit integers (INT4).

### How Quantization Works

The core idea is to map the continuous range of high-precision values (e.g., FP32 weights) to a smaller, discrete set of low-precision values. For integer quantization, this typically involves a linear transformation:

$$ X_q = \text{round}(X / S + Z) $$

where:

- $X$ is the original high-precision value (e.g., an FP32 weight).
- $X_q$ is the quantized low-precision value (e.g., an INT8 weight).
- $S$ is the scale factor, a positive float that maps the floating-point range onto the integer range.
- $Z$ is the zero-point, an integer that ensures zero in the original precision maps exactly to a quantized value.

The scale and zero-point are key parameters determined during the quantization process, often through calibration on a representative dataset.

### Types of Quantization

**Post-Training Quantization (PTQ):** The simpler approach, in which a pre-trained FP32 model is converted to lower precision without re-training.

- **Static PTQ:** Requires a calibration step on a small, representative dataset to determine the optimal scale and zero-point for activations. Weights are quantized offline.
- **Dynamic PTQ:** Weights are quantized offline, but activations are quantized on the fly during inference. This avoids the need for an activation calibration dataset but can add latency because the quantization parameters are computed dynamically.

PTQ is attractive for its ease of implementation, but at very low bit-depths (e.g., INT4) it can cause a noticeable drop in model accuracy.

**Quantization-Aware Training (QAT):** QAT simulates the effects of quantization during training or fine-tuning. "Fake quantization" operations inserted into the model graph mimic the information loss of quantization in the forward pass, while weights are updated in full precision in the backward pass. This lets the model learn weights that tolerate quantization well. QAT generally yields better accuracy than PTQ, especially for aggressive quantization, but it requires access to the training pipeline and additional compute for fine-tuning.
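The scale/zero-point mapping above can be made concrete in a few lines of PyTorch. This is a hand-rolled, per-tensor asymmetric INT8 example for illustration only; the function names are arbitrary and in practice the tooling mentioned below derives these parameters per layer or per channel and supplies optimized low-precision kernels.

```python
import torch

def int8_quantize(x: torch.Tensor):
    """Per-tensor asymmetric INT8 quantization: X_q = round(X / S + Z) (sketch)."""
    qmin, qmax = -128, 127
    x_min, x_max = x.min().item(), x.max().item()
    # Scale maps the observed float range onto the 256-value integer range.
    scale = (x_max - x_min) / (qmax - qmin)
    # Zero-point is the integer that float 0.0 maps onto.
    zero_point = int(round(qmin - x_min / scale))
    x_q = torch.clamp(torch.round(x / scale + zero_point), qmin, qmax).to(torch.int8)
    return x_q, scale, zero_point

def dequantize(x_q: torch.Tensor, scale: float, zero_point: int) -> torch.Tensor:
    """Approximate reconstruction of the original float values."""
    return scale * (x_q.float() - zero_point)

w = torch.randn(4, 4)                            # stand-in for an FP32 weight tensor
w_q, s, z = int8_quantize(w)
print((w - dequantize(w_q, s, z)).abs().max())   # worst-case quantization error
```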
Illustrative impact of quantization (representative values):

| Model precision | Model size (MB) | Inference latency (ms) |
|---|---|---|
| FP32 | 500 | 200 |
| FP16 | 250 | 120 |
| INT8 | 125 | 70 |

*Reduction in model size and inference latency typically observed when moving from FP32 to lower-precision formats like FP16 and INT8. Actual gains depend on the model architecture and hardware.*

### Benefits for RAG

- **Reduced Model Size:** Quantized models have a significantly smaller memory footprint, making them easier to store and deploy. For instance, INT8 quantization reduces model size by roughly 4x compared to FP32.
- **Faster Inference:** Operations on lower-precision data (especially integers) can be much faster on compatible hardware (e.g., CPUs with AVX extensions, GPUs with Tensor Cores supporting INT8). This directly reduces the latency of the generation component in RAG.
- **Lower Power Consumption:** Less memory traffic and faster computation usually translate into reduced energy usage.

Considerations include:

- **Accuracy Impact:** Aggressive quantization (e.g., INT4, or per-tensor INT8 without careful calibration) can degrade model accuracy. Sensitivity varies across models and layers.
- **Hardware Support:** The largest gains come when the deployment hardware has specialized support for low-precision arithmetic.
- **Software Ecosystem:** Quantization tooling (e.g., PyTorch's torch.quantization, TensorFlow Lite, Hugging Face Optimum, ONNX Runtime, NVIDIA TensorRT) is evolving quickly; compatibility and ease of use vary.

For RAG systems, quantizing the generator LLM can substantially improve response times and deployment costs, especially when handling a large volume of requests.

## Combining Distillation and Quantization

Distillation and quantization are not mutually exclusive; they can be combined for even greater efficiency. A common strategy is to first distill a large teacher into a smaller, task-specific student, and then quantize that student. This two-step process can produce highly compact, fast LLMs that retain much of the original teacher's capability, making them well suited to production RAG systems.

## Evaluation is Non-Negotiable

After applying distillation, quantization, or both, rigorously evaluate the resulting model. The evaluation should cover not only standard NLP metrics (such as perplexity, BLEU, and ROUGE) but also the RAG-specific metrics discussed in other chapters: faithfulness to the retrieved context, reduction in hallucinations, and overall answer quality. The goal is to find the best trade-off between efficiency gains and the performance requirements of your production RAG application. Your evaluation framework should confirm that the optimized generator still meets the quality bar for user-facing interactions.
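A lightweight harness along the following lines can make the before/after comparison explicit. It assumes held-out RAG examples with reference answers, a `generate_answer(query, context)` callable for each model variant, and the Hugging Face `evaluate` package for ROUGE; all of these are stand-ins for your own evaluation stack, which should also cover RAG-specific checks such as faithfulness.

```python
# Sketch: compare the original and optimized generators on held-out RAG examples,
# tracking one quality metric (ROUGE-L) and mean per-request latency.
import time
import evaluate  # Hugging Face `evaluate` library (assumed installed)

rouge = evaluate.load("rouge")

def compare_generators(eval_set, baseline_generate, optimized_generate):
    """eval_set: list of dicts with "query", "context", and "reference" keys (assumed format)."""
    results = {}
    for name, generate_answer in [("baseline", baseline_generate),
                                  ("optimized", optimized_generate)]:
        predictions, latencies = [], []
        for example in eval_set:
            start = time.perf_counter()
            predictions.append(generate_answer(example["query"], example["context"]))
            latencies.append(time.perf_counter() - start)
        scores = rouge.compute(
            predictions=predictions,
            references=[ex["reference"] for ex in eval_set],
        )
        results[name] = {
            "rougeL": scores["rougeL"],
            "mean_latency_s": sum(latencies) / len(latencies),
        }
    return results
```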
By strategically applying distillation and quantization, you can significantly enhance the efficiency of the generation component in your RAG system, leading to faster, more cost-effective, and scalable deployments.