Let's put the rules of thumb we've discussed into practice. Remember, these are estimates primarily focused on the memory needed to hold the model weights. Actual usage will be higher due to activations, operating system overhead, and the specific software you use.
Our main estimation tool will be the relationship:
Required VRAM ≈ Parameter Count × Bytes per Parameter
We'll focus on FP16 (16-bit floating point) precision, which is very common for inference. In FP16, each parameter requires 2 bytes of storage.
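To make this relationship easy to reuse, here is a minimal Python sketch of the calculation. The function name and its docstring are our own illustration, not part of any library; it simply multiplies the parameter count by the bytes per parameter and converts to gigabytes.

```python
def estimate_weight_vram_gb(param_count: float, bytes_per_param: float) -> float:
    """Estimate the VRAM (in GB) needed just to hold the model weights.

    param_count     -- number of parameters, e.g. 7e9 for a "7B" model
    bytes_per_param -- 2 for FP16, 1 for INT8, 4 for FP32
    """
    total_bytes = param_count * bytes_per_param
    return total_bytes / (1024 ** 3)  # convert bytes to GB (binary gigabytes)
```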
Consider a model often referred to as a "7B" model, meaning it has approximately 7 billion parameters.
At FP16, the weights alone take up:
7,000,000,000 parameters × 2 bytes per parameter = 14,000,000,000 bytes
To convert bytes to gigabytes (GB), we divide by 1024³ (or roughly by one billion for a quick estimate).
14,000,000,000 bytes ÷ (1024 × 1024 × 1024) ≈ 13.04 GB
Result: A 7B parameter model running at FP16 precision requires roughly 13-14 GB of VRAM just to store the model weights.
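The same arithmetic can be checked directly in Python; the numbers below are just the worked example above, not measured values:

```python
fp16_7b_bytes = 7_000_000_000 * 2   # 7B parameters x 2 bytes each (FP16)
print(fp16_7b_bytes / 1024**3)      # ~13.04 GB for the weights alone
```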
Now let's look at a slightly larger "13B" model.
At FP16, that means 13,000,000,000 parameters × 2 bytes per parameter = 26,000,000,000 bytes. Converting to GB:
26,000,000,000 bytes ÷ 1024³ ≈ 24.21 GB
Result: A 13B parameter model at FP16 needs approximately 24-25 GB of VRAM for its weights. This already exceeds the VRAM available on many consumer GPUs.
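The 13B case follows the same pattern:

```python
fp16_13b_bytes = 13_000_000_000 * 2  # 13B parameters x 2 bytes each (FP16)
print(fp16_13b_bytes / 1024**3)      # ~24.21 GB for the weights alone
```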
Let's revisit the 7B model but consider using INT8 (8-bit integer) precision through quantization. In INT8, each parameter requires only 1 byte.
With only 1 byte per parameter, the weights occupy 7,000,000,000 parameters × 1 byte = 7,000,000,000 bytes. Converting to GB:
7,000,000,000 bytes ÷ 1024³ ≈ 6.52 GB
Result: By using INT8 quantization, the VRAM requirement for the 7B model's weights drops to roughly 6.5-7 GB. This makes it feasible to run on GPUs with less VRAM, like those with 8 GB or 12 GB, although you still need headroom for activations and overhead.
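To put the three scenarios side by side, the short sketch below recomputes each estimate in one loop. The scenario labels and structure are our own; the numbers are the same worked examples as above.

```python
# Weight-only VRAM estimates for the scenarios discussed above.
scenarios = [
    ("7B  FP16", 7_000_000_000, 2),
    ("13B FP16", 13_000_000_000, 2),
    ("7B  INT8", 7_000_000_000, 1),
]
for name, params, bytes_per_param in scenarios:
    gb = params * bytes_per_param / 1024**3
    print(f"{name}: {gb:.2f} GB")
```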
Estimated VRAM required just for the weights of a 7 billion parameter model using FP16 (16-bit floating point) and INT8 (8-bit integer) precision.
Remember, these calculations provide a baseline for the model weights only. You need additional VRAM for:
- Activations produced during inference (including the key-value cache), which grow with context length.
- Operating system and GPU driver overhead.
- The inference framework or software you use.
A safer rule of thumb is often to add a buffer of 20-40% on top of the weight VRAM estimate, or simply ensure your GPU has significantly more VRAM than the calculated weight requirement. For instance, to comfortably run the 7B FP16 model (estimated 13-14 GB for weights), a GPU with 16 GB or ideally 24 GB of VRAM would be preferable, especially for longer contexts.
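Applying that buffer rule is a single multiplication; the 1.2 and 1.4 factors below correspond to the 20-40% headroom mentioned above and are a rough planning aid, not a precise measurement:

```python
weight_gb = 13.04                              # 7B model at FP16, weights only
low, high = weight_gb * 1.2, weight_gb * 1.4   # add 20-40% for activations and overhead
print(f"Plan for roughly {low:.1f}-{high:.1f} GB of VRAM")  # ~15.6-18.3 GB
```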
Compare these estimates to the specifications of your hardware (as discussed in the "Checking Hardware Specifications" section) to gauge whether running a specific model is feasible on your system.