Quantization: Making Models Smaller

References

GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers, Elias Frantar, Saleh Ashkboos, Torsten Hoefler, Dan Alistarh, 2022 (ICLR 2023), DOI: 10.48550/arXiv.2210.17323 - Details GPTQ, a method for accurate post-training 4-bit quantization specifically designed for large language models, addressing precision trade-offs.
llama.cpp repository, Georgi Gerganov and the llama.cpp Community Contributors, 2024 - The project's repository and associated documentation, detailing the GGUF file format and its specific quantization schemes (e.g., Q_K variants) used for local LLM inference.

© 2025 ApX Machine Learning