Having explored the theoretical steps involved in building and training neural networks, let's put this knowledge into practice. We will construct, train, and evaluate a simple feedforward neural network to classify handwritten digits from the well-known MNIST dataset. This exercise synthesizes the concepts covered in this chapter: data preparation, model definition using a framework, selection of a loss function and optimizer, the training loop, and performance evaluation. We will use PyTorch, a popular deep learning library, for this task.

## Setting Up the Environment

First, ensure you have PyTorch and Torchvision installed. If not, you can typically install them using pip or conda. Then, import the necessary modules:

```python
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

# Check if a GPU is available, otherwise use the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")
```

## Loading and Preparing the MNIST Dataset

The MNIST dataset consists of 70,000 grayscale images of handwritten digits (0-9), each 28x28 pixels; 60,000 are used for training and 10,000 for testing. torchvision provides convenient access to this dataset. We'll apply transformations to convert the images to PyTorch tensors and normalize their pixel values. Normalization helps stabilize training; we use the standard mean (0.1307) and standard deviation (0.3081) for MNIST.

```python
# Transformations to apply to the data
transform = transforms.Compose([
    transforms.ToTensor(),                      # Convert image to PyTorch Tensor
    transforms.Normalize((0.1307,), (0.3081,))  # Normalize pixel values
])

# Download and load the training data
trainset = torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=transform)
trainloader = DataLoader(trainset, batch_size=64, shuffle=True, num_workers=2)

# Download and load the test data
testset = torchvision.datasets.MNIST(root='./data', train=False, download=True, transform=transform)
testloader = DataLoader(testset, batch_size=1000, shuffle=False, num_workers=2)
```

The DataLoader wraps the dataset, providing an iterator for easy batching, shuffling, and parallel data loading.

## Defining the Neural Network Model

We'll define a simple feedforward neural network (multi-layer perceptron) with two hidden layers using the ReLU activation function. The input layer takes flattened 28x28 images (784 features), and the output layer has 10 neurons (one for each digit class). No activation is applied to the output here, because we will use CrossEntropyLoss, which internally applies LogSoftmax.

```python
class SimpleMLP(nn.Module):
    def __init__(self):
        super(SimpleMLP, self).__init__()
        self.flatten = nn.Flatten()
        self.fc1 = nn.Linear(28*28, 128)  # Input layer -> Hidden layer 1
        self.relu1 = nn.ReLU()
        self.fc2 = nn.Linear(128, 64)     # Hidden layer 1 -> Hidden layer 2
        self.relu2 = nn.ReLU()
        self.fc3 = nn.Linear(64, 10)      # Hidden layer 2 -> Output layer

    def forward(self, x):
        x = self.flatten(x)  # Flatten the image
        x = self.fc1(x)
        x = self.relu1(x)
        x = self.fc2(x)
        x = self.relu2(x)
        x = self.fc3(x)      # Raw scores (logits)
        return x

# Instantiate the model and move it to the appropriate device (GPU or CPU)
model = SimpleMLP().to(device)
print(model)
```

## Defining the Loss Function and Optimizer

For multi-class classification, CrossEntropyLoss is a standard choice. It combines LogSoftmax and NLLLoss (Negative Log Likelihood Loss) in one class.
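To make that statement concrete, here is a minimal illustrative check, separate from the training script, showing that CrossEntropyLoss applied to raw logits matches LogSoftmax followed by NLLLoss (the tensors here are arbitrary example values):

```python
# Illustrative only: CrossEntropyLoss(logits) == NLLLoss(LogSoftmax(logits))
logits = torch.randn(4, 10)           # a batch of 4 examples, 10 class scores each
targets = torch.tensor([3, 7, 0, 9])  # true class indices for the batch

ce = nn.CrossEntropyLoss()(logits, targets)
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), targets)
print(ce.item(), nll.item())  # identical up to floating-point error
```

This is why our model's forward pass returns raw logits with no final softmax.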
We'll use the Adam optimizer, a popular and effective choice for many problems.

```python
# Define the loss function
criterion = nn.CrossEntropyLoss()

# Define the optimizer
optimizer = optim.Adam(model.parameters(), lr=0.001)  # Learning rate = 0.001
```

## Training the Model

The training process involves iterating over the dataset multiple times (epochs). In each epoch, we iterate through the data in batches. For each batch:

1. Move the data and labels to the designated device (GPU/CPU).
2. Zero the gradients accumulated from the previous batch (`optimizer.zero_grad()`).
3. Perform a forward pass: feed the input batch to the model to get predictions (`outputs = model(inputs)`).
4. Calculate the loss between the predictions and the true labels (`loss = criterion(outputs, labels)`).
5. Perform a backward pass: compute the gradients of the loss with respect to the model parameters (`loss.backward()`).
6. Update the model parameters using the computed gradients (`optimizer.step()`).

We'll train for a few epochs and print the loss periodically.

```python
num_epochs = 5        # Number of times to iterate over the training dataset
training_losses = []  # To store loss values for plotting

print("Starting Training...")
for epoch in range(num_epochs):
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # Get the inputs; data is a list of [inputs, labels]
        inputs, labels = data[0].to(device), data[1].to(device)

        # Zero the parameter gradients
        optimizer.zero_grad()

        # Forward pass
        outputs = model(inputs)
        loss = criterion(outputs, labels)

        # Backward pass and optimize
        loss.backward()
        optimizer.step()

        # Print statistics
        running_loss += loss.item()
        if (i + 1) % 200 == 0:  # Print every 200 mini-batches
            avg_loss = running_loss / 200
            print(f'Epoch [{epoch + 1}/{num_epochs}], Batch [{i + 1}/{len(trainloader)}], Loss: {avg_loss:.4f}')
            training_losses.append({"epoch": epoch + (i + 1) / len(trainloader), "loss": avg_loss})
            running_loss = 0.0

print('Finished Training')
```

## Monitoring Training Progress

Visualizing the loss during training helps us understand whether the model is learning effectively. A decreasing loss generally indicates learning.

*Figure: Training loss per 200 batches (average loss vs. epoch). Example training loss curve showing the decrease over epochs; actual values depend on the specific run.*
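A curve like the one above can be produced from the `training_losses` list collected in the training loop. This is a minimal sketch, assuming matplotlib is available in your environment:

```python
import matplotlib.pyplot as plt  # assumes matplotlib is installed

# Plot the average losses recorded every 200 mini-batches
epochs = [point["epoch"] for point in training_losses]
losses = [point["loss"] for point in training_losses]

plt.plot(epochs, losses, marker="o")
plt.xlabel("Epoch")
plt.ylabel("Average Loss")
plt.title("Training Loss per 200 Batches")
plt.show()
```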
## Evaluating Model Performance

After training, we evaluate the model's performance on the unseen test dataset. We iterate through the test data, make predictions, and compare them to the true labels to calculate accuracy. It's important to disable gradient calculations (`torch.no_grad()`) during evaluation to save memory and computation, as we are not updating weights.

```python
correct = 0
total = 0

# Since we're not training, we don't need to calculate gradients
with torch.no_grad():
    model.eval()  # Set model to evaluation mode
    for data in testloader:
        images, labels = data[0].to(device), data[1].to(device)
        # Calculate outputs by running images through the network
        outputs = model(images)
        # The class with the highest score is what we choose as the prediction
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

model.train()  # Set model back to training mode (important if using dropout/batchnorm)

accuracy = 100 * correct / total
print(f'Accuracy of the network on the 10000 test images: {accuracy:.2f} %')
```

A typical accuracy for this simple network on MNIST after 5 epochs might be around 96-97%. This demonstrates that our network has learned to recognize handwritten digits reasonably well.

This hands-on example walked through the essential steps of building and training a basic neural network using PyTorch. You loaded data, defined a model architecture, chose loss and optimization strategies, executed the training loop, and evaluated the result. This forms the foundation for tackling more complex deep learning problems. In subsequent chapters, we will explore techniques to improve performance and handle more complex data types.