Having explored the theoretical steps involved in building and training neural networks, let's put this knowledge into practice. We will construct, train, and evaluate a simple feedforward neural network to classify handwritten digits from the well-known MNIST dataset. This exercise synthesizes the concepts covered in this chapter, including data preparation, model definition using a framework, selecting a loss function and optimizer, the training loop, and performance evaluation.
We will use PyTorch, a popular deep learning library, for this task.
First, ensure you have PyTorch and Torchvision installed; if not, you can typically install them with pip (pip install torch torchvision) or conda. Then, import the necessary modules:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
# Check if GPU is available, otherwise use CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")
The MNIST dataset consists of 70,000 grayscale images of handwritten digits (0-9), each 28x28 pixels: 60,000 for training and 10,000 for testing. torchvision provides convenient access to this dataset. We'll apply transformations to convert the images to PyTorch tensors and normalize their pixel values. Normalization helps stabilize training; we use MNIST's standard per-pixel mean (0.1307) and standard deviation (0.3081).
# Transformations to apply to the data
transform = transforms.Compose([
    transforms.ToTensor(),                       # Convert image to PyTorch Tensor
    transforms.Normalize((0.1307,), (0.3081,))   # Normalize pixel values
])

# Download and load the training data
trainset = torchvision.datasets.MNIST(root='./data', train=True,
                                      download=True, transform=transform)
trainloader = DataLoader(trainset, batch_size=64,
                         shuffle=True, num_workers=2)

# Download and load the test data
testset = torchvision.datasets.MNIST(root='./data', train=False,
                                     download=True, transform=transform)
testloader = DataLoader(testset, batch_size=1000,
                        shuffle=False, num_workers=2)
The DataLoader wraps the dataset, providing an iterator for easy batching, shuffling, and parallel data loading.
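As an optional sanity check (this snippet is an aside, assuming the loaders defined above), we can pull one batch and inspect its shape and statistics; after normalization, the batch mean and standard deviation should be roughly 0 and 1:
# Optional sanity check: fetch one batch from the training loader
images, labels = next(iter(trainloader))
print(images.shape)   # torch.Size([64, 1, 28, 28])
print(labels.shape)   # torch.Size([64])
print(images.mean().item(), images.std().item())  # roughly 0 and 1 after normalization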
We'll define a simple feedforward neural network (a multi-layer perceptron) with two hidden layers using the ReLU activation function. The input layer takes flattened 28x28 images (784 features), and the output layer has 10 neurons (one for each digit class) with no activation applied here, as we will use CrossEntropyLoss, which internally applies LogSoftmax.
class SimpleMLP(nn.Module):
    def __init__(self):
        super(SimpleMLP, self).__init__()
        self.flatten = nn.Flatten()
        self.fc1 = nn.Linear(28*28, 128)  # Input layer -> Hidden layer 1
        self.relu1 = nn.ReLU()
        self.fc2 = nn.Linear(128, 64)     # Hidden layer 1 -> Hidden layer 2
        self.relu2 = nn.ReLU()
        self.fc3 = nn.Linear(64, 10)      # Hidden layer 2 -> Output layer

    def forward(self, x):
        x = self.flatten(x)  # Flatten the image to a 784-dimensional vector
        x = self.fc1(x)
        x = self.relu1(x)
        x = self.fc2(x)
        x = self.relu2(x)
        x = self.fc3(x)      # Raw scores (logits)
        return x
# Instantiate the model and move it to the appropriate device (GPU or CPU)
model = SimpleMLP().to(device)
print(model)
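As a quick check that the architecture matches our intent, we can count the trainable parameters. This small helper is an optional aside, not a required step:
# Count trainable parameters:
# (784*128 + 128) + (128*64 + 64) + (64*10 + 10) = 109,386
num_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {num_params:,}")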
For multi-class classification, CrossEntropyLoss is a standard choice. It combines LogSoftmax and NLLLoss (Negative Log Likelihood Loss) in one class. We'll use the Adam optimizer, a popular and effective choice for many problems.
# Define the loss function
criterion = nn.CrossEntropyLoss()
# Define the optimizer
optimizer = optim.Adam(model.parameters(), lr=0.001) # Learning rate = 0.001
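To make the relationship between CrossEntropyLoss, LogSoftmax, and NLLLoss concrete, here is a small illustrative check. The logits and labels below are made up for illustration; this is an optional aside, not part of the training pipeline:
import torch.nn.functional as F

logits = torch.randn(4, 10)           # fake batch of raw scores
targets = torch.randint(0, 10, (4,))  # fake integer class labels

ce = F.cross_entropy(logits, targets)
nll = F.nll_loss(F.log_softmax(logits, dim=1), targets)
print(torch.allclose(ce, nll))  # True: the two formulations agree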
The training process involves iterating over the dataset multiple times (epochs). In each epoch, we iterate through the data in batches. For each batch:
1. Zero the parameter gradients: optimizer.zero_grad()
2. Perform the forward pass: outputs = model(inputs)
3. Compute the loss: loss = criterion(outputs, labels)
4. Backpropagate the gradients: loss.backward()
5. Update the weights: optimizer.step()
We'll train for a few epochs and print the loss periodically.
num_epochs = 5        # Number of times to iterate over the training dataset
training_losses = []  # To store loss values for plotting

print("Starting Training...")
for epoch in range(num_epochs):
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # Get the inputs; data is a list of [inputs, labels]
        inputs, labels = data[0].to(device), data[1].to(device)

        # Zero the parameter gradients
        optimizer.zero_grad()

        # Forward pass
        outputs = model(inputs)
        loss = criterion(outputs, labels)

        # Backward pass and optimize
        loss.backward()
        optimizer.step()

        # Print statistics
        running_loss += loss.item()
        if (i + 1) % 200 == 0:  # Print every 200 mini-batches
            avg_loss = running_loss / 200
            print(f'Epoch [{epoch + 1}/{num_epochs}], Batch [{i + 1}/{len(trainloader)}], Loss: {avg_loss:.4f}')
            training_losses.append({"epoch": epoch + (i + 1) / len(trainloader), "loss": avg_loss})
            running_loss = 0.0

print('Finished Training')
Visualizing the loss during training helps understand if the model is learning effectively. A decreasing loss generally indicates learning.
[Figure: example training loss curve showing the loss decreasing over epochs. Actual values depend on the specific run.]
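A minimal plotting sketch for the training_losses list collected above, assuming matplotlib is installed (the plotting code here is our own suggestion; the original figure may have been produced differently):
import matplotlib.pyplot as plt

# Extract the (epoch, loss) points logged every 200 batches
epochs = [point["epoch"] for point in training_losses]
losses = [point["loss"] for point in training_losses]

plt.plot(epochs, losses)
plt.xlabel("Epoch")
plt.ylabel("Average training loss")
plt.title("Training loss (averaged over every 200 batches)")
plt.show()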
After training, we evaluate the model's performance on the unseen test dataset. We iterate through the test data, make predictions, and compare them to the true labels to calculate accuracy. It's important to disable gradient calculations (torch.no_grad()) during evaluation to save memory and computation, as we are not updating weights.
correct = 0
total = 0

model.eval()  # Set model to evaluation mode
# Since we're not training, we don't need to calculate gradients
with torch.no_grad():
    for data in testloader:
        images, labels = data[0].to(device), data[1].to(device)
        # Calculate outputs by running images through the network
        outputs = model(images)
        # The class with the highest score (logit) is the prediction
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
model.train()  # Set model back to training mode (important if using dropout/batchnorm)

accuracy = 100 * correct / total
print(f'Accuracy of the network on the 10000 test images: {accuracy:.2f} %')
A typical accuracy for this simple network on MNIST after 5 epochs might be around 96-97%. This demonstrates that our network has learned to recognize handwritten digits reasonably well.
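For a finer-grained view than overall accuracy, a per-class breakdown is a small extension of the evaluation loop above. This sketch is optional and reuses model, testloader, and device from earlier:
# Optional: per-class accuracy on the test set
class_correct = [0] * 10
class_total = [0] * 10
model.eval()
with torch.no_grad():
    for images, labels in testloader:
        images, labels = images.to(device), labels.to(device)
        _, predicted = torch.max(model(images), 1)
        for label, pred in zip(labels, predicted):
            class_total[label.item()] += 1
            class_correct[label.item()] += int(label.item() == pred.item())
model.train()

for digit in range(10):
    print(f'Digit {digit}: {100 * class_correct[digit] / class_total[digit]:.1f} %')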
This hands-on example walked through the essential steps of building and training a basic neural network using PyTorch. You loaded data, defined a model architecture, chose loss and optimization strategies, executed the training loop, and evaluated the result. This forms the foundation for tackling more complex deep learning problems. In subsequent chapters, we will explore techniques to improve performance and handle more complex data types.