Having established the connection between LLM size and the necessary hardware components, this chapter provides methods for estimating those requirements. You will learn how to approximate the amount of memory, particularly Video RAM (VRAM), needed to run a given LLM.
We will introduce a common rule of thumb relating model parameters to VRAM usage, taking into account data types such as FP16. A simplified calculation is often used:

$$\text{Required VRAM} \approx \text{Parameter Count} \times \text{Bytes per Parameter}$$

Beyond this initial estimate, we will discuss other factors that influence memory needs, such as the memory used for activations during processing, context length, and batch size. You will also learn how to check your own system's hardware specifications and apply these estimation techniques through practice examples. This chapter equips you with practical tools for assessing hardware needs before running large language models.
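To make the rule of thumb concrete, here is a minimal Python sketch of the calculation. The bytes-per-parameter values reflect common data types (FP32 uses 4 bytes, FP16 uses 2, and so on), and the 7-billion-parameter model in the example is an illustrative assumption, not a specific model.

```python
# Bytes per parameter for common data types.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}

def estimate_vram_gib(param_count: float, dtype: str = "fp16") -> float:
    """Approximate VRAM needed to hold the model weights, in GiB."""
    total_bytes = param_count * BYTES_PER_PARAM[dtype]
    return total_bytes / (1024 ** 3)  # bytes -> GiB

# Example: a hypothetical 7-billion-parameter model loaded in FP16.
print(f"{estimate_vram_gib(7e9, 'fp16'):.1f} GiB")  # ~13.0 GiB
```

Keep in mind that this estimate covers the model weights only; as the sections below discuss, activation memory, context length, and batch size add to the total.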
5.1 Rule of Thumb: Parameters to VRAM
5.2 Accounting for Activation Memory
5.3 Factors Influencing Actual Usage
5.4 Checking Hardware Specifications
5.5 Practice: Simple VRAM Estimations