Having explored the theoretical steps involved in building and training neural networks, let's put this knowledge into practice. We will construct, train, and evaluate a simple feedforward neural network to classify handwritten digits from the well-known MNIST dataset. This exercise synthesizes the concepts covered in this chapter, including data preparation, model definition using a framework, selecting a loss function and optimizer, the training loop, and performance evaluation.
We will use PyTorch, a popular deep learning library, for this task.
First, ensure you have PyTorch and Torchvision installed; if not, you can typically install them with pip (pip install torch torchvision) or conda. Then, import the necessary modules:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
# Check if GPU is available, otherwise use CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")
The MNIST dataset consists of 70,000 grayscale images of handwritten digits (0-9), each 28x28 pixels: 60,000 for training and 10,000 for testing. torchvision provides convenient access to this dataset. We'll apply transformations to convert the images to PyTorch tensors and normalize their pixel values. Normalization helps stabilize training; we use MNIST's standard per-pixel mean (0.1307) and standard deviation (0.3081).
# Transformations to apply to the data
transform = transforms.Compose([
    transforms.ToTensor(),                       # Convert image to PyTorch Tensor
    transforms.Normalize((0.1307,), (0.3081,))   # Normalize pixel values
])

# Download and load the training data
trainset = torchvision.datasets.MNIST(root='./data', train=True,
                                      download=True, transform=transform)
trainloader = DataLoader(trainset, batch_size=64,
                         shuffle=True, num_workers=2)

# Download and load the test data
testset = torchvision.datasets.MNIST(root='./data', train=False,
                                     download=True, transform=transform)
testloader = DataLoader(testset, batch_size=1000,
                        shuffle=False, num_workers=2)
The DataLoader wraps the dataset, providing an iterator for easy batching, shuffling, and parallel data loading.
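As an optional sanity check (this snippet is an aside, assuming the loaders defined above), we can pull one batch and inspect its shape and statistics; after normalization, the batch mean and standard deviation should be roughly 0 and 1:
# Optional sanity check: fetch one batch from the training loader
images, labels = next(iter(trainloader))
print(images.shape)   # torch.Size([64, 1, 28, 28])
print(labels.shape)   # torch.Size([64])
print(images.mean().item(), images.std().item())  # roughly 0 and 1 after normalization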
We'll define a simple feedforward neural network (a multi-layer perceptron) with two hidden layers using the ReLU activation function. The input layer takes flattened 28x28 images (784 features), and the output layer has 10 neurons (one for each digit class) with no activation applied here, as we will use CrossEntropyLoss, which internally applies LogSoftmax.
class SimpleMLP(nn.Module):
    def __init__(self):
        super(SimpleMLP, self).__init__()
        self.flatten = nn.Flatten()
        self.fc1 = nn.Linear(28*28, 128)  # Input layer -> Hidden layer 1
        self.relu1 = nn.ReLU()
        self.fc2 = nn.Linear(128, 64)     # Hidden layer 1 -> Hidden layer 2
        self.relu2 = nn.ReLU()
        self.fc3 = nn.Linear(64, 10)      # Hidden layer 2 -> Output layer

    def forward(self, x):
        x = self.flatten(x)  # Flatten the image to a 784-dimensional vector
        x = self.fc1(x)
        x = self.relu1(x)
        x = self.fc2(x)
        x = self.relu2(x)
        x = self.fc3(x)      # Raw scores (logits)
        return x
# Instantiate the model and move it to the appropriate device (GPU or CPU)
model = SimpleMLP().to(device)
print(model)
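As a quick check that the architecture matches our intent, we can count the trainable parameters. This small helper is an optional aside, not a required step:
# Count trainable parameters:
# (784*128 + 128) + (128*64 + 64) + (64*10 + 10) = 109,386
num_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {num_params:,}")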
For multi-class classification, CrossEntropyLoss is a standard choice. It combines LogSoftmax and NLLLoss (Negative Log Likelihood Loss) in one class. We'll use the Adam optimizer, a popular and effective choice for many problems.
# Define the loss function
criterion = nn.CrossEntropyLoss()
# Define the optimizer
optimizer = optim.Adam(model.parameters(), lr=0.001) # Learning rate = 0.001
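To make the relationship between CrossEntropyLoss, LogSoftmax, and NLLLoss concrete, here is a small illustrative check. The logits and labels below are made up for illustration; this is an optional aside, not part of the training pipeline:
import torch.nn.functional as F

logits = torch.randn(4, 10)           # fake batch of raw scores
targets = torch.randint(0, 10, (4,))  # fake integer class labels

ce = F.cross_entropy(logits, targets)
nll = F.nll_loss(F.log_softmax(logits, dim=1), targets)
print(torch.allclose(ce, nll))  # True: the two formulations agree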
The training process involves iterating over the dataset multiple times (epochs). In each epoch, we iterate through the data in batches. For each batch:
1. Zero the parameter gradients: optimizer.zero_grad()
2. Perform the forward pass: outputs = model(inputs)
3. Compute the loss: loss = criterion(outputs, labels)
4. Backpropagate the gradients: loss.backward()
5. Update the weights: optimizer.step()
We'll train for a few epochs and print the loss periodically.
num_epochs = 5        # Number of times to iterate over the training dataset
training_losses = []  # To store loss values for plotting

print("Starting Training...")
for epoch in range(num_epochs):
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # Get the inputs; data is a list of [inputs, labels]
        inputs, labels = data[0].to(device), data[1].to(device)

        # Zero the parameter gradients
        optimizer.zero_grad()

        # Forward pass
        outputs = model(inputs)
        loss = criterion(outputs, labels)

        # Backward pass and optimize
        loss.backward()
        optimizer.step()

        # Print statistics
        running_loss += loss.item()
        if (i + 1) % 200 == 0:  # Print every 200 mini-batches
            avg_loss = running_loss / 200
            print(f'Epoch [{epoch + 1}/{num_epochs}], Batch [{i + 1}/{len(trainloader)}], Loss: {avg_loss:.4f}')
            training_losses.append({"epoch": epoch + (i + 1) / len(trainloader), "loss": avg_loss})
            running_loss = 0.0

print('Finished Training')
Visualizing the loss during training helps understand if the model is learning effectively. A decreasing loss generally indicates learning.
[Figure: example training loss curve showing the loss decreasing over epochs. Actual values depend on the specific run.]
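A minimal plotting sketch for the training_losses list collected above, assuming matplotlib is installed (the plotting code here is our own suggestion; the original figure may have been produced differently):
import matplotlib.pyplot as plt

# Extract the (epoch, loss) points logged every 200 batches
epochs = [point["epoch"] for point in training_losses]
losses = [point["loss"] for point in training_losses]

plt.plot(epochs, losses)
plt.xlabel("Epoch")
plt.ylabel("Average training loss")
plt.title("Training loss (averaged over every 200 batches)")
plt.show()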
After training, we evaluate the model's performance on the unseen test dataset. We iterate through the test data, make predictions, and compare them to the true labels to calculate accuracy. It's important to disable gradient calculations (torch.no_grad()) during evaluation to save memory and computation, as we are not updating weights.
correct = 0
total = 0

model.eval()  # Set model to evaluation mode
# Since we're not training, we don't need to calculate gradients
with torch.no_grad():
    for data in testloader:
        images, labels = data[0].to(device), data[1].to(device)
        # Calculate outputs by running images through the network
        outputs = model(images)
        # The class with the highest score (logit) is the prediction
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
model.train()  # Set model back to training mode (important if using dropout/batchnorm)

accuracy = 100 * correct / total
print(f'Accuracy of the network on the 10000 test images: {accuracy:.2f} %')
A typical accuracy for this simple network on MNIST after 5 epochs might be around 96-97%. This demonstrates that our network has learned to recognize handwritten digits reasonably well.
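For a finer-grained view than overall accuracy, a per-class breakdown is a small extension of the evaluation loop above. This sketch is optional and reuses model, testloader, and device from earlier:
# Optional: per-class accuracy on the test set
class_correct = [0] * 10
class_total = [0] * 10
model.eval()
with torch.no_grad():
    for images, labels in testloader:
        images, labels = images.to(device), labels.to(device)
        _, predicted = torch.max(model(images), 1)
        for label, pred in zip(labels, predicted):
            class_total[label.item()] += 1
            class_correct[label.item()] += int(label.item() == pred.item())
model.train()

for digit in range(10):
    print(f'Digit {digit}: {100 * class_correct[digit] / class_total[digit]:.1f} %')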
This hands-on example walked through the essential steps of building and training a basic neural network using PyTorch. You loaded data, defined a model architecture, chose loss and optimization strategies, executed the training loop, and evaluated the result. This forms the foundation for tackling more complex deep learning problems. In subsequent chapters, we will explore techniques to improve performance and handle more complex data types.