While final performance metrics give you a bottom-line score, they don't tell the whole story of how your model learned or where it might be struggling. Visualizing the training process and model performance over time provides invaluable insights, helping you diagnose issues, compare different model configurations, and ultimately build more effective deep learning models. Think of these visualizations as your dashboard, offering a live feed into the health and progress of your model's training.
The most fundamental and informative visualizations in deep learning are plots of the loss function and primary evaluation metrics over training epochs or iterations. These plots typically show two curves: one for the training set and one for the validation set.
Tracking your model's loss on both training and validation data is essential. The training loss indicates how well the model is fitting the data it sees, while the validation loss shows how well it generalizes to unseen data.
An ideal scenario involves both training and validation loss decreasing steadily and converging. A significant gap between the two, where training loss is much lower than validation loss, often signals overfitting. Conversely, if both remain high, it might indicate underfitting or issues with the learning process itself.
Training loss consistently decreases while validation loss decreases initially, then starts to rise, indicating the onset of overfitting around epoch 9-10.
Alongside loss, you should visualize your primary evaluation metric (e.g., accuracy for classification, MSE for regression). Similar to loss curves, you'll plot training and validation metrics. These plots provide a more direct understanding of the model's performance on the task it's designed for.
For instance, in a classification task, you'd watch the accuracy. Training accuracy might approach 100%, but if validation accuracy stagnates or drops, your model isn't generalizing well.
Training accuracy climbs steadily. Validation accuracy increases, peaks around epoch 9-10, and then slightly declines, suggesting overfitting and that early stopping could be beneficial.
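When the validation curve peaks and then declines like this, a simple programmatic check can flag when to stop. Below is a minimal sketch of such a check; the function name should_stop_early and the patience value are illustrative, not part of any library.

# A minimal early-stopping check: stop if the best validation accuracy
# is more than `patience` epochs old. Names and values here are illustrative.
function should_stop_early(val_acc_history; patience=3)
    length(val_acc_history) <= patience && return false
    best_epoch = argmax(val_acc_history)
    return length(val_acc_history) - best_epoch >= patience
end

# Example: accuracy peaked at epoch 4 and has not improved for 3 epochs
should_stop_early([0.70, 0.75, 0.78, 0.80, 0.79, 0.78, 0.77])  # returns true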
Julia's ecosystem offers excellent tools for visualization. The Plots.jl package is a popular choice, providing a unified interface to several plotting backends (such as GR, PlotlyJS, and PyPlot). This means you can write your plotting code once and choose different backends to render the visuals.
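For instance, switching backends is a single function call and the plotting code itself does not change; this small sketch assumes the GR and PlotlyJS backends are installed.

using Plots

gr()           # select the GR backend (the default)
plot(1:10, rand(10), label="example series")

# plotlyjs()   # switch to the interactive PlotlyJS backend, if installed
# plot(1:10, rand(10), label="example series")  # same code, different renderer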
Typically, during your training loop, you will collect loss and metric values at the end of each epoch (or more frequently for very large datasets). These values are stored in arrays, which can then be passed directly to Plots.jl functions.
using Plots
# theme(:default)  # optional: set a plotting theme

# Assume these arrays were populated during your training loop:
# epochs             = 1:15
# train_loss_history = [...]  # training loss per epoch
# val_loss_history   = [...]  # validation loss per epoch
# train_acc_history  = [...]  # training accuracy per epoch
# val_acc_history    = [...]  # validation accuracy per epoch

# Plotting loss
plot(epochs, train_loss_history, label="Training Loss", xlabel="Epoch", ylabel="Loss", lw=2)
plot!(epochs, val_loss_history, label="Validation Loss", lw=2)
title!("Model Loss During Training")
savefig("loss_plot.png")  # save the plot to disk

# Plotting accuracy
plot(epochs, train_acc_history, label="Training Accuracy", xlabel="Epoch", ylabel="Accuracy", legend=:bottomright, lw=2)
plot!(epochs, val_acc_history, label="Validation Accuracy", lw=2)
title!("Model Accuracy During Training")
savefig("accuracy_plot.png")
This snippet demonstrates the basic workflow: gather the data, then use plot to create a figure and plot! to add further series to an existing one. Plots.jl offers extensive customization for labels, titles, legends, line styles, and more.
Callbacks, as discussed previously, are an excellent mechanism for collecting these metrics systematically during training without cluttering your main training loop. You can design a callback to store these values and even update plots live or save them periodically.
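As a rough sketch, such a callback can be built as a closure that captures the model and validation data and appends to history arrays each time it is called; the names used here (make_metrics_logger, log_metrics!) are illustrative, not a fixed API.

using Flux, Statistics

# Build a metrics-logging callback as a closure over the model and validation data.
# Call the returned log_metrics! once per epoch; `history` accumulates the values.
function make_metrics_logger(model, loss_fn, X_val, y_val)
    history = (train_loss = Float64[], val_loss = Float64[], val_acc = Float64[])
    function log_metrics!(X_train, y_train)
        push!(history.train_loss, loss_fn(model, X_train, y_train))
        push!(history.val_loss, loss_fn(model, X_val, y_val))
        preds = Flux.onecold(model(X_val))
        push!(history.val_acc, mean(preds .== Flux.onecold(y_val)))
        return nothing
    end
    return log_metrics!, history
end

# In the training loop: call log_metrics!(X_train, y_train) at the end of each
# epoch, then pass history.train_loss, history.val_loss, etc. to Plots.jl.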
Visualizations are only useful if you can interpret them correctly. Here are common patterns and their implications:
Underfitting: Both training and validation loss remain high and flatten out early. The model is not capturing the underlying patterns; consider a larger model, better features, or longer training.
Overfitting: Training loss keeps decreasing while validation loss stalls or starts to rise, and the gap between the two curves widens. Regularization, more data, or early stopping can help.
Good Fit: Training and validation loss both decrease steadily and converge to similar, low values, with validation metrics tracking training metrics closely.
Learning Rate Issues: A learning rate that is too high causes the loss to oscillate wildly or diverge (it may even become NaN), while a rate that is too low makes the loss decrease very slowly.
The following diagram provides a simplified decision guide based on observed loss curves:
Decision guide for interpreting loss curves and taking appropriate actions.
Integrating visualization into your workflow involves a few steps:
Initialize empty history arrays before training begins (e.g., train_losses = Float64[], val_accuracies = Float64[]).
Push the computed loss and metric values into these arrays at the end of each epoch, either directly in the loop or from a callback.
Plot the results after training, or update the figures live; Plots.jl can be used to update figures in environments like Pluto.jl notebooks or IDEs with plotting panes.
Here's a sketch of how you might collect data within a simplified training function structure:
using Plots, Flux, Statistics  # Flux for the model and loss, Statistics for mean

# Dummy data and model for illustration
X_train, y_train = rand(Float32, 10, 100), Flux.onehotbatch(rand(0:1, 100), 0:1)
X_val, y_val = rand(Float32, 10, 50), Flux.onehotbatch(rand(0:1, 50), 0:1)

# The model outputs raw logits; logitcrossentropy applies softmax internally
model = Chain(Dense(10, 5, relu), Dense(5, 2))
loss_fn(m, x, y) = Flux.logitcrossentropy(m(x), y)
opt_state = Flux.setup(Adam(0.01), model)

function train_and_visualize!(model, loss_fn, opt_state, X_train, y_train, X_val, y_val; epochs=20)
    train_loss_history = Float64[]
    val_loss_history = Float64[]
    val_acc_history = Float64[]

    println("Starting training...")
    for epoch in 1:epochs
        # Simplified training step: one full-batch update per epoch
        Flux.train!(loss_fn, model, [(X_train, y_train)], opt_state)

        # Calculate and store training loss
        current_train_loss = loss_fn(model, X_train, y_train)
        push!(train_loss_history, current_train_loss)

        # Calculate and store validation loss
        current_val_loss = loss_fn(model, X_val, y_val)
        push!(val_loss_history, current_val_loss)

        # Calculate validation accuracy (example for binary classification)
        val_preds = Flux.onecold(model(X_val))
        val_true = Flux.onecold(y_val)
        current_val_acc = mean(val_preds .== val_true)
        push!(val_acc_history, current_val_acc)

        if epoch % 5 == 0 || epoch == epochs
            println("Epoch $epoch: Train Loss = $(round(current_train_loss, digits=4)), " *
                    "Val Loss = $(round(current_val_loss, digits=4)), " *
                    "Val Acc = $(round(current_val_acc, digits=4))")
        end
    end
    println("Training complete.")

    # Plotting
    p1 = plot(1:epochs, train_loss_history, label="Training Loss", color=:blue, lw=2)
    plot!(p1, 1:epochs, val_loss_history, label="Validation Loss", color=:orange, lw=2)
    xlabel!(p1, "Epoch")
    ylabel!(p1, "Loss")
    title!(p1, "Loss Curves")

    p2 = plot(1:epochs, val_acc_history, label="Validation Accuracy", color=:green, legend=:bottomright, lw=2)
    xlabel!(p2, "Epoch")
    ylabel!(p2, "Accuracy")
    title!(p2, "Validation Accuracy")

    # Display plots (behavior depends on your Julia environment)
    display(plot(p1, p2, layout=(1, 2), size=(900, 400)))

    # Or save them
    # savefig(p1, "loss_curves.png")
    # savefig(p2, "accuracy_curve.png")

    return train_loss_history, val_loss_history, val_acc_history
end

# Example usage (will run the dummy training and plot)
# train_loss_hist, val_loss_hist, val_acc_hist = train_and_visualize!(
#     model, loss_fn, opt_state, X_train, y_train, X_val, y_val, epochs=25
# );
In a real scenario, you'd use proper data loaders (such as those from MLUtils.jl) for batching; this example focuses on the metric collection and plotting logic. Note the use of display(plot(p1, p2, ...)) to show the two plots side by side.
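As a rough sketch, a mini-batch version of the training step above might look like the following, using MLUtils.DataLoader; the batch size is arbitrary, and the model, loss_fn, and opt_state are assumed to be those defined earlier.

using MLUtils  # provides DataLoader for mini-batching

# Wrap the training arrays in a DataLoader; observations are sliced along the last dimension
train_loader = DataLoader((X_train, y_train), batchsize=32, shuffle=true)

# One epoch: Flux.train! iterates over every (x_batch, y_batch) pair in the loader
Flux.train!(loss_fn, model, train_loader, opt_state)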
By consistently visualizing your model's training, you move from a "black box" approach to an informed, iterative process of model development and refinement. These visual tools are indispensable for understanding how your choices in architecture, optimization, and regularization impact learning.