Training a neural network is an iterative optimization process. You provide the model with data, measure how inaccurate its predictions are, and then adjust its internal parameters (weights and biases) slightly to reduce that inaccuracy. This cycle repeats many times. The code structure that manages this repetitive process is commonly referred to as the training loop.
At a high level, training usually involves two nested loops:

- An outer loop over epochs: each epoch is one complete pass through the entire training dataset.
- An inner loop over batches: within each epoch, the data is processed in smaller chunks called batches.

The `DataLoader` you learned about previously is responsible for providing these batches. Training in batches is memory-efficient and can also lead to more stable convergence and better generalization compared to processing samples one by one or using the entire dataset at once. A complete loop assembling all of these pieces appears at the end of this section.

For every batch processed within an epoch, the training loop executes a sequence of well-defined steps. Let's break down what happens in a typical iteration:
1. Get a Batch of Data: Retrieve the next batch of input features and their corresponding targets from the `DataLoader`. It's also important at this stage to ensure the data is transferred to the correct computational device (CPU or GPU) where your model parameters reside.
2. Zero the Gradients: PyTorch accumulates gradients by default, so the gradients computed for the previous batch must be cleared before the next backward pass. You do this by calling the `zero_grad()` method on your optimizer object.
```python
optimizer.zero_grad()
```
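Why does this matter? Each call to `backward()` adds the new gradients onto whatever is already stored in a parameter's `.grad` attribute. The following standalone sketch (the tensor and its values are purely illustrative) shows that accumulation, and how zeroing resets it:

```python
import torch

w = torch.tensor(2.0, requires_grad=True)

# First backward pass: d(w^2)/dw = 2 * w = 4
(w ** 2).backward()
print(w.grad)  # tensor(4.)

# Second backward pass without zeroing: the new gradient is added to the old one
(w ** 2).backward()
print(w.grad)  # tensor(8.)

# Zeroing the gradient (optimizer.zero_grad() does this for every parameter)
w.grad.zero_()
print(w.grad)  # tensor(0.)
```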
3. Forward Pass: Feed the batch of input features into your model. The model processes the data through its layers, applying learned weights and activation functions, ultimately producing a batch of predictions or outputs.
```python
predictions = model(input_batch)
```
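To make the shapes concrete, here is a minimal sketch using a toy `nn.Linear` model; the layer sizes and batch size are arbitrary choices for illustration, not anything required by PyTorch:

```python
import torch
import torch.nn as nn

model = nn.Linear(in_features=10, out_features=3)  # toy model: 10 features in, 3 scores out
input_batch = torch.randn(32, 10)                  # a batch of 32 samples

predictions = model(input_batch)
print(predictions.shape)  # torch.Size([32, 3]) -- one row of outputs per sample in the batch
```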
4. Calculate Loss: Compare the model's `predictions` against the true `target_batch` using your chosen loss function (criterion), such as `nn.CrossEntropyLoss` for classification or `nn.MSELoss` for regression. The loss function returns a single scalar value representing the average error or discrepancy for the current batch. This value indicates how well (or poorly) the model performed on this specific batch.
```python
loss = criterion(predictions, target_batch)
```
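For example, with `nn.CrossEntropyLoss` the predictions are raw logits of shape `(batch_size, num_classes)` and the targets are class indices. A small standalone sketch, with shapes chosen to match the toy example above:

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

predictions = torch.randn(32, 3)           # raw logits for 32 samples and 3 classes
target_batch = torch.randint(0, 3, (32,))  # the correct class index for each sample

loss = criterion(predictions, target_batch)
print(loss)         # a single scalar tensor
print(loss.item())  # its plain Python value, convenient for logging
```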
5. Backpropagation: This is where PyTorch's automatic differentiation engine, Autograd, calculates the gradients. Calling `loss.backward()` computes the gradient of the loss scalar with respect to every model parameter that has `requires_grad=True` (which is the default for parameters within `nn.Module`). These gradients represent the sensitivity of the loss to changes in each parameter; essentially, they tell the optimizer how to adjust each weight to decrease the loss.
```python
loss.backward()
```
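Once `backward()` has run, each parameter's `.grad` attribute holds a gradient tensor with the same shape as the parameter itself. A self-contained sketch using a toy model and random data:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 3)
criterion = nn.MSELoss()

loss = criterion(model(torch.randn(32, 10)), torch.randn(32, 3))
loss.backward()

# Every parameter with requires_grad=True now has a populated .grad tensor
for name, param in model.named_parameters():
    print(name, param.grad.shape)  # weight: (3, 10), bias: (3,)
```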
6. Update Weights (Optimizer Step): With the gradients computed, the optimizer can now adjust the model's parameters. Calling `optimizer.step()` updates each parameter based on its computed gradient and the optimizer's specific algorithm (such as SGD with momentum, Adam, etc.). The goal is to take a small step in the direction that minimizes the loss.
```python
optimizer.step()
```
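For plain SGD without momentum, the update performed by `step()` is simply `param = param - lr * param.grad` for each parameter. A tiny sketch that checks this by hand, with arbitrary values and learning rate:

```python
import torch

w = torch.tensor([1.0, -2.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

loss = (w ** 2).sum()  # gradient of the loss is 2 * w = [2.0, -4.0]
loss.backward()
optimizer.step()

print(w)  # tensor([ 0.8000, -1.6000], requires_grad=True) -- each element moved by -lr * grad
```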
These six steps form the core of one iteration within the training loop. This cycle is repeated for every batch provided by the `DataLoader`. Once all batches have been processed, one epoch is complete, and the outer loop begins the next epoch, repeating the entire batch iteration process.
Flow diagram illustrating the sequence of operations within a single batch iteration of the PyTorch training loop.
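Putting it all together, a typical training loop looks roughly like the sketch below. It assumes that `model`, `criterion`, `optimizer`, `train_loader`, `device`, and `num_epochs` have already been defined; those names are illustrative conventions, not fixed by PyTorch.

```python
for epoch in range(num_epochs):                      # outer loop: one full pass over the dataset
    for input_batch, target_batch in train_loader:   # inner loop: one batch at a time
        # 1. Move the batch to the device holding the model parameters
        input_batch = input_batch.to(device)
        target_batch = target_batch.to(device)

        # 2. Clear gradients accumulated from the previous iteration
        optimizer.zero_grad()

        # 3. Forward pass: compute predictions for this batch
        predictions = model(input_batch)

        # 4. Compute the scalar loss for this batch
        loss = criterion(predictions, target_batch)

        # 5. Backpropagation: compute gradients of the loss with respect to every parameter
        loss.backward()

        # 6. Update the parameters using the gradients and the optimizer's rule
        optimizer.step()

    print(f"Epoch {epoch + 1}/{num_epochs} complete, last batch loss: {loss.item():.4f}")
```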