This section brings the individual components of training together into a complete, working example. It walks through setting up a model, preparing data, and implementing both the training and evaluation loops, and it also touches on saving the trained model state.

For this exercise, we'll tackle a simple linear regression problem using synthetic data. Our goal is to train a model to learn the relationship $y \approx 2x + 1$.

## 1. Setup: Imports and Hyperparameters

First, let's import the necessary PyTorch modules and define some basic hyperparameters for our training process.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import TensorDataset, DataLoader

# Hyperparameters
learning_rate = 0.01
num_epochs = 100
batch_size = 16

# Device configuration (use GPU if available)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")
```

## 2. Data Preparation

We'll generate synthetic data for a linear relationship and wrap it using `TensorDataset` and `DataLoader`.

```python
# Generate synthetic data: y = 2x + 1 + noise
true_weight = torch.tensor([[2.0]])
true_bias = torch.tensor([1.0])

# Generate training data
X_train_tensor = torch.randn(100, 1) * 5  # 100 examples, 1 feature
y_train_tensor = true_weight * X_train_tensor + true_bias + torch.randn(100, 1) * 0.5  # Add some noise

# Generate validation data (separate set)
X_val_tensor = torch.randn(20, 1) * 5  # 20 examples, 1 feature
y_val_tensor = true_weight * X_val_tensor + true_bias + torch.randn(20, 1) * 0.5

# Create datasets
train_dataset = TensorDataset(X_train_tensor, y_train_tensor)
val_dataset = TensorDataset(X_val_tensor, y_val_tensor)

# Create dataloaders
train_loader = DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(dataset=val_dataset, batch_size=batch_size, shuffle=False)  # No need to shuffle validation data
```

Here, `TensorDataset` conveniently wraps our input features (X) and target labels (y) tensors. `DataLoader` then takes this dataset and provides iterable batches, handling shuffling and batching automatically.

## 3. Model, Loss, and Optimizer

Now, define the model architecture, the loss function, and the optimizer. Since we are modeling a linear relationship $y = wx + b$, a single linear layer is sufficient.

```python
# Define the model (a simple linear layer)
# Input feature size = 1, Output feature size = 1
model = nn.Linear(1, 1).to(device)  # Move model to the selected device

# Define the loss function (Mean Squared Error for regression)
loss_fn = nn.MSELoss()

# Define the optimizer (Stochastic Gradient Descent)
optimizer = optim.SGD(model.parameters(), lr=learning_rate)

print("Model definition:")
print(model)

print("\nInitial parameters:")
for name, param in model.named_parameters():
    if param.requires_grad:
        print(f"{name}: {param.data.squeeze()}")
```

We instantiate `nn.Linear`, which represents the operation $y = Wx + b$; PyTorch automatically initializes the weight ($W$) and bias ($b$) parameters. We use Mean Squared Error (`nn.MSELoss`) because it is the standard criterion for regression tasks, measuring the average squared difference between predictions and true values. Stochastic Gradient Descent (`optim.SGD`) updates the model's parameters based on the computed gradients; notice that we pass `model.parameters()` to the optimizer so it knows which tensors to update. Finally, we move the model to the configured device (CPU or GPU).
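For a single linear layer, `nn.Linear(1, 1)` is all we need, but more complex architectures are usually defined by subclassing `nn.Module`. As a point of reference, here is a minimal sketch of the same model written that way; the class name `LinearRegressionModel` is purely illustrative and not part of the example above.

```python
import torch.nn as nn

class LinearRegressionModel(nn.Module):
    """Equivalent to nn.Linear(1, 1), written as a custom nn.Module (illustrative sketch)."""
    def __init__(self):
        super().__init__()
        # One input feature, one output feature, same as the model above
        self.linear = nn.Linear(1, 1)

    def forward(self, x):
        # forward() defines how a batch of inputs is mapped to predictions
        return self.linear(x)

# Drop-in replacement for `model = nn.Linear(1, 1).to(device)`:
# model = LinearRegressionModel().to(device)
```

An instance of this class behaves identically to the `nn.Linear(1, 1)` model used above and could be swapped in without changing the rest of the code.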
## 4. The Training Loop

This is the core of the process, where the model learns from the data iteratively.

```python
print("\nStarting Training...")

for epoch in range(num_epochs):
    model.train()  # Set the model to training mode
    running_loss = 0.0
    num_batches = 0

    # Iterate over batches from the DataLoader
    for i, (features, labels) in enumerate(train_loader):
        # Move batch data to the same device as the model
        features = features.to(device)
        labels = labels.to(device)

        # 1. Forward pass: compute the model's predictions
        outputs = model(features)

        # 2. Calculate the loss
        loss = loss_fn(outputs, labels)

        # 3. Backward pass: compute gradients
        # First, zero the gradients from the previous step
        optimizer.zero_grad()
        # Then, perform backpropagation
        loss.backward()

        # 4. Optimizer step: update model weights
        optimizer.step()

        # Accumulate loss for reporting
        running_loss += loss.item()
        num_batches += 1

    # Print average loss for the epoch
    avg_epoch_loss = running_loss / num_batches
    if (epoch + 1) % 10 == 0:  # Print every 10 epochs
        print(f"Epoch [{epoch+1}/{num_epochs}], Training Loss: {avg_epoch_loss:.4f}")

print("Training Finished!")
```

Let's break down the steps inside the epoch loop:

- `model.train()`: Sets the model to training mode. This matters for layers like Dropout or BatchNorm, which behave differently during training and evaluation.
- We iterate through `train_loader` to get batches of features and labels.
- Each batch is moved to the device where the model resides, which prevents device-mismatch runtime errors.
- Forward pass: `outputs = model(features)` calculates the model's predictions for the input batch.
- Loss calculation: `loss = loss_fn(outputs, labels)` computes the difference between predictions and actual labels using the MSE criterion.
- Backward pass:
  - `optimizer.zero_grad()`: Clears old gradients. If you forget this, gradients accumulate across iterations, leading to incorrect updates.
  - `loss.backward()`: Computes the gradient of the loss with respect to all model parameters that have `requires_grad=True`.
- Optimizer step: `optimizer.step()` updates the model's parameters (`model.parameters()`) using the gradients computed in the backward pass and the optimization algorithm (SGD in this case).
- We track `running_loss` to report the average loss for the epoch.
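In larger projects, the body of this loop is often factored into a small helper function so the same logic can be reused across experiments. Below is a minimal sketch of that refactor, reusing the `model`, `train_loader`, `loss_fn`, `optimizer`, and `device` defined above; the helper name `train_one_epoch` is illustrative, not part of any PyTorch API.

```python
def train_one_epoch(model, loader, loss_fn, optimizer, device):
    """Run one full pass over `loader` and return the average training loss."""
    model.train()
    running_loss = 0.0
    for features, labels in loader:
        features, labels = features.to(device), labels.to(device)
        outputs = model(features)        # forward pass
        loss = loss_fn(outputs, labels)  # compute the loss
        optimizer.zero_grad()            # clear gradients from the previous step
        loss.backward()                  # backpropagate
        optimizer.step()                 # update parameters
        running_loss += loss.item()
    return running_loss / len(loader)   # len(loader) is the number of batches

# Usage, equivalent to the loop above:
# for epoch in range(num_epochs):
#     avg_epoch_loss = train_one_epoch(model, train_loader, loss_fn, optimizer, device)
#     if (epoch + 1) % 10 == 0:
#         print(f"Epoch [{epoch+1}/{num_epochs}], Training Loss: {avg_epoch_loss:.4f}")
```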
## 5. The Evaluation Loop

After training (or periodically during training, e.g., after each epoch), we need to evaluate the model's performance on unseen data (the validation set) without updating its weights.

```python
print("\nStarting Evaluation...")

model.eval()  # Set the model to evaluation mode
total_val_loss = 0.0
num_val_batches = 0

# Disable gradient calculations for evaluation
with torch.no_grad():
    for features, labels in val_loader:
        # Move batch data to the device
        features = features.to(device)
        labels = labels.to(device)

        # Forward pass
        outputs = model(features)

        # Calculate loss
        loss = loss_fn(outputs, labels)

        total_val_loss += loss.item()
        num_val_batches += 1

avg_val_loss = total_val_loss / num_val_batches
print(f"Validation Loss: {avg_val_loss:.4f}")

# Inspect the learned parameters
print("\nLearned parameters:")
for name, param in model.named_parameters():
    if param.requires_grad:
        print(f"{name}: {param.data.squeeze()}")
print(f"(True weight: {true_weight.item():.4f}, True bias: {true_bias.item():.4f})")
```

Main differences in the evaluation loop:

- `model.eval()`: Sets the model to evaluation mode.
- `with torch.no_grad():`: This context manager disables gradient calculation within the block. This is important because we don't need gradients for evaluation, and it reduces memory consumption and speeds up computation.
- We do not call `loss.backward()` or `optimizer.step()` because we are only measuring performance, not training.
- We accumulate the loss across all validation batches to get an average validation loss.

After evaluation, we print the learned parameters. Compare them to the `true_weight` (2.0) and `true_bias` (1.0) we used to generate the data. They should be reasonably close after 100 epochs.

## 6. Saving and Loading Model State

Persisting your trained model is essential. The standard practice is to save the model's `state_dict`, which contains all its learned parameters (weights and biases).

```python
# Saving the model's learned parameters
model_save_path = 'linear_regression_model.pth'
torch.save(model.state_dict(), model_save_path)
print(f"\nModel state_dict saved to {model_save_path}")

# Example of loading the model state
# First, instantiate the model architecture again
loaded_model = nn.Linear(1, 1).to(device)
# Then, load the saved state dictionary
loaded_model.load_state_dict(torch.load(model_save_path))
print("Model state_dict loaded successfully.")

# Remember to set the loaded model to evaluation mode if using it for inference
loaded_model.eval()

# You can now use loaded_model for predictions
# Example prediction with the loaded model:
with torch.no_grad():
    sample_input = torch.tensor([[10.0]]).to(device)  # Example input
    prediction = loaded_model(sample_input)
    print(f"Prediction for input 10.0: {prediction.item():.4f}")
    # Expected output should be close to 2*10 + 1 = 21
```

Saving the `state_dict` is generally preferred over saving the entire model object because it's more flexible and less prone to breaking if the underlying code changes. To load the state, you need to create an instance of the same model architecture first and then load the dictionary into it.
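If you intend to resume training later, rather than only run inference, a common pattern is to save the optimizer state and the current epoch alongside the model's `state_dict` in a single checkpoint dictionary. Here is a minimal sketch of that pattern, reusing the objects defined above; the checkpoint keys and file name are illustrative choices, not anything required by PyTorch.

```python
# Save a resumable checkpoint: model weights, optimizer state, and epoch count
checkpoint = {
    'epoch': num_epochs,
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
}
torch.save(checkpoint, 'linear_regression_checkpoint.pth')

# Later: recreate the objects, then restore their state before resuming training
resumed_model = nn.Linear(1, 1).to(device)
resumed_optimizer = optim.SGD(resumed_model.parameters(), lr=learning_rate)

checkpoint = torch.load('linear_regression_checkpoint.pth')
resumed_model.load_state_dict(checkpoint['model_state_dict'])
resumed_optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
start_epoch = checkpoint['epoch']  # continue the epoch loop from here
```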
## Complete Runnable Example

Here is the complete script combining all the parts:

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import TensorDataset, DataLoader

# 1. Setup: Hyperparameters and Device
learning_rate = 0.01
num_epochs = 100
batch_size = 16
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

# 2. Data Preparation
true_weight = torch.tensor([[2.0]])
true_bias = torch.tensor([1.0])

X_train_tensor = torch.randn(100, 1, device=device) * 5  # Generate data directly on device
y_train_tensor = true_weight.to(device) * X_train_tensor + true_bias.to(device) + torch.randn(100, 1, device=device) * 0.5
X_val_tensor = torch.randn(20, 1, device=device) * 5
y_val_tensor = true_weight.to(device) * X_val_tensor + true_bias.to(device) + torch.randn(20, 1, device=device) * 0.5

train_dataset = TensorDataset(X_train_tensor, y_train_tensor)
val_dataset = TensorDataset(X_val_tensor, y_val_tensor)
train_loader = DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(dataset=val_dataset, batch_size=batch_size, shuffle=False)

# 3. Model, Loss, and Optimizer
model = nn.Linear(1, 1).to(device)
loss_fn = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=learning_rate)

print("Model definition:")
print(model)
print("\nInitial parameters:")
for name, param in model.named_parameters():
    if param.requires_grad:
        print(f"{name}: {param.data.squeeze()}")

# 4. Training Loop
print("\nStarting Training...")
for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    num_batches = 0
    for i, (features, labels) in enumerate(train_loader):
        # Data is already on the correct device
        outputs = model(features)
        loss = loss_fn(outputs, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        num_batches += 1
    avg_epoch_loss = running_loss / num_batches
    if (epoch + 1) % 10 == 0:
        print(f"Epoch [{epoch+1}/{num_epochs}], Training Loss: {avg_epoch_loss:.4f}")
print("Training Finished!")

# 5. Evaluation Loop
print("\nStarting Evaluation...")
model.eval()
total_val_loss = 0.0
num_val_batches = 0
with torch.no_grad():
    for features, labels in val_loader:
        # Data is already on the correct device
        outputs = model(features)
        loss = loss_fn(outputs, labels)
        total_val_loss += loss.item()
        num_val_batches += 1
avg_val_loss = total_val_loss / num_val_batches
print(f"Validation Loss: {avg_val_loss:.4f}")

print("\nLearned parameters:")
for name, param in model.named_parameters():
    if param.requires_grad:
        print(f"{name}: {param.data.squeeze().item():.4f}")  # Use .item() for single values
print(f"(True weight: {true_weight.item():.4f}, True bias: {true_bias.item():.4f})")

# 6. Saving and Loading Model State
model_save_path = 'linear_regression_model.pth'
torch.save(model.state_dict(), model_save_path)
print(f"\nModel state_dict saved to {model_save_path}")

loaded_model = nn.Linear(1, 1).to(device)
loaded_model.load_state_dict(torch.load(model_save_path))
loaded_model.eval()
print("Model state_dict loaded successfully.")

with torch.no_grad():
    sample_input = torch.tensor([[10.0]]).to(device)
    prediction = loaded_model(sample_input)
    print(f"Prediction for input 10.0: {prediction.item():.4f}")
```

(Note: in the combined script, data generation was slightly modified to create tensors directly on the target device for efficiency, removing the need for `.to(device)` calls on batch data inside the loops.)

This hands-on example demonstrates the fundamental structure for training virtually any model in PyTorch. You now have a template combining data loading, model definition, training iteration, evaluation, and persistence. You can adapt this structure for more complex models and datasets by changing the model architecture in step 3 and the data preparation in step 2. The core logic of the training and evaluation loops remains remarkably consistent.
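As one concrete illustration of that adaptation, swapping the single linear layer for a small multi-layer perceptron only touches step 3; the data preparation, training, and evaluation code runs unchanged. The sketch below assumes the same setup as the complete script, and the hidden-layer width of 32 is an arbitrary choice for illustration.

```python
# A small multi-layer perceptron as a drop-in replacement for nn.Linear(1, 1).
# Everything else (data loading, training loop, evaluation loop) stays the same.
model = nn.Sequential(
    nn.Linear(1, 32),   # 1 input feature -> 32 hidden units (arbitrary size)
    nn.ReLU(),
    nn.Linear(32, 1),   # 32 hidden units -> 1 output
).to(device)

# The optimizer must be re-created so it tracks the new model's parameters
optimizer = optim.SGD(model.parameters(), lr=learning_rate)
```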