Now that you have a solid grasp of how to construct training loops, evaluate models, and apply techniques like regularization, it's time to put all these pieces together. This hands-on practical session will guide you through training a neural network, evaluating its performance, and then fine-tuning it to achieve better results. We'll simulate a common workflow, emphasizing the iterative nature of model development.
For this exercise, we'll tackle a binary classification problem using a synthetic dataset. This allows us to focus on the training and tuning mechanics without getting bogged down in complex data loading or preprocessing.
First, let's set up our environment and generate some data. We'll need Flux, MLUtils for data handling, Plots (or your preferred plotting library) for visualization, and Random for reproducibility.
using Flux
using MLUtils: DataLoader, unsqueeze
using Random
using Printf
using Statistics: mean
# For reproducibility
Random.seed!(123)
# Generate a synthetic dataset for binary classification
function generate_data(n_samples=200)
    # Class 0: centered around (-1, -1)
    X1 = randn(Float32, 2, n_samples ÷ 2) .- 1.0f0
    Y1 = zeros(Int, n_samples ÷ 2)
    # Class 1: centered around (1, 1)
    X2 = randn(Float32, 2, n_samples ÷ 2) .+ 1.0f0
    Y2 = ones(Int, n_samples ÷ 2)
    X = hcat(X1, X2)
    Y = vcat(Y1, Y2)
    # Shuffle the data
    indices = shuffle(1:n_samples)
    X = X[:, indices]
    Y = Y[indices]
    # Reshape Y to 1xN, as Flux's binary cross-entropy expects
    return X, unsqueeze(Float32.(Y), 1)
end
X_train, Y_train = generate_data(400)
X_test, Y_test = generate_data(100)
# Create DataLoaders
batch_size = 32
train_loader = DataLoader((X_train, Y_train), batchsize=batch_size, shuffle=true)
test_loader = DataLoader((X_test, Y_test), batchsize=batch_size)
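Before training, it helps to confirm the layout the loader yields: Flux expects features as columns, so each full batch should be a 2x32 feature matrix paired with a 1x32 label row. A quick check:
# Peek at one batch to verify the (features, labels) layout
x_sample, y_sample = first(train_loader)
println("Batch feature size: ", size(x_sample))  # expected (2, 32)
println("Batch label size:   ", size(y_sample))  # expected (1, 32)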
Now, let's define a simple Multilayer Perceptron (MLP) for our classification task.
input_dim = 2
hidden_dim = 10
output_dim = 1 # Single output for binary classification with sigmoid
model_v1 = Chain(
    Dense(input_dim, hidden_dim, relu),
    Dense(hidden_dim, output_dim)  # sigmoid is applied inside logitbinarycrossentropy
)
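As an optional sanity check, the untrained model should map a handful of examples to one raw logit per column:
# A 2xN feature matrix maps to a 1xN matrix of logits
logits = model_v1(X_train[:, 1:5])
println(size(logits))  # (1, 5)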
# Loss function and optimizer
loss_fn(x, y) = Flux.logitbinarycrossentropy(model_v1(x), y)
optimizer = Adam(0.01) # Initial learning rate
# Parameters to train
params = Flux.params(model_v1)
Here, logitbinarycrossentropy is suitable as it applies a sigmoid internally and is numerically more stable than binarycrossentropy with a separate sigmoid layer.
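To see the stability difference concretely, here is a small illustration using only the two Flux loss functions; the exact output of the unfused version depends on your Flux version's internal clamping:
# With an extreme logit and target 1, the true loss is -log(sigmoid(-100)) ≈ 100.
# But sigmoid(-100f0) underflows to 0f0 in Float32, so the two-step version loses
# the answer (Inf, or a clamped inaccurate value, depending on the Flux version).
# The fused logit form stays accurate.
z = fill(-100f0, 1, 1)  # logits, 1x1 matrix
y = fill(1f0, 1, 1)     # targets, 1x1 matrix
println(Flux.binarycrossentropy(sigmoid.(z), y))  # underflow: inaccurate
println(Flux.logitbinarycrossentropy(z, y))       # ≈ 100.0, as expected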
Let's implement a basic training loop and train our model_v1.
function accuracy(model, data_loader)
    correct = 0
    total = 0
    for (x, y) in data_loader
        # Apply sigmoid to the model's logits to get probabilities
        y_hat_prob = sigmoid.(model(x))
        # Threshold probabilities into binary predictions (0 or 1)
        y_hat = ifelse.(y_hat_prob .> 0.5f0, 1.0f0, 0.0f0)
        correct += sum(y_hat .== y)
        total += length(y)
    end
    return correct / total
end
epochs = 20
history_v1 = Dict("loss" => Float64[], "accuracy" => Float64[])
println("Training model_v1...")
for epoch in 1:epochs
    epoch_loss = 0.0
    for (x_batch, y_batch) in train_loader
        # Calculate loss and gradients
        batch_loss, grads = Flux.withgradient(params) do
            loss_fn(x_batch, y_batch)
        end
        # Update parameters
        Flux.update!(optimizer, params, grads)
        epoch_loss += batch_loss * size(x_batch, 2)  # weight by batch size
    end
    avg_epoch_loss = epoch_loss / size(X_train, 2)
    # Calculate accuracy on the training set for monitoring
    train_acc = accuracy(model_v1, train_loader)
    push!(history_v1["loss"], avg_epoch_loss)
    push!(history_v1["accuracy"], train_acc)
    @printf "Epoch %2d: Loss = %.4f, Train Accuracy = %.2f%%\n" epoch avg_epoch_loss (train_acc * 100)
end
# Evaluate on test set
test_acc_v1 = accuracy(model_v1, test_loader)
println("Final Test Accuracy (model_v1): $(test_acc_v1 * 100)%")
After running this, you'll likely see a decent accuracy, but perhaps there's room for improvement. Let's assume our model_v1 achieved around 90-95% test accuracy. The training loss should decrease, and accuracy should increase over epochs.
One common issue is overfitting, where the model performs well on training data but poorly on unseen test data. Regularization techniques help combat this. Let's add Dropout to our model.
model_v2 = Chain(
    Dense(input_dim, hidden_dim, relu),
    Dropout(0.3),  # add dropout after the first hidden layer
    Dense(hidden_dim, output_dim)
)
loss_fn_v2(x, y) = Flux.logitbinarycrossentropy(model_v2(x), y)
optimizer_v2 = Adam(0.01) # Reset optimizer or use a new one
params_v2 = Flux.params(model_v2)
history_v2 = Dict("loss" => Float64[], "accuracy" => Float64[])
println("\nTraining model_v2 with Dropout...")
for epoch in 1:epochs  # same number of epochs for comparison
    epoch_loss = 0.0
    # Important: set the model to training mode so Dropout is active
    Flux.trainmode!(model_v2)
    for (x_batch, y_batch) in train_loader
        batch_loss, grads = Flux.withgradient(params_v2) do
            loss_fn_v2(x_batch, y_batch)
        end
        Flux.update!(optimizer_v2, params_v2, grads)
        epoch_loss += batch_loss * size(x_batch, 2)
    end
    avg_epoch_loss = epoch_loss / size(X_train, 2)
    # Important: set the model to test mode for evaluation
    Flux.testmode!(model_v2)
    train_acc = accuracy(model_v2, train_loader)
    push!(history_v2["loss"], avg_epoch_loss)
    push!(history_v2["accuracy"], train_acc)
    @printf "Epoch %2d: Loss = %.4f, Train Accuracy = %.2f%%\n" epoch avg_epoch_loss (train_acc * 100)
end
Flux.testmode!(model_v2) # Ensure test mode for final evaluation
test_acc_v2 = accuracy(model_v2, test_loader)
println("Final Test Accuracy (model_v2 with Dropout): $(test_acc_v2 * 100)%")
When using layers like Dropout or BatchNorm, it's important to switch the model between training mode (Flux.trainmode!) and test mode (Flux.testmode!). Dropout is only active during training. For our simple dataset, dropout might not show a dramatic improvement or could even slightly degrade performance if the model wasn't overfitting much to begin with. However, on more complex datasets, it's a valuable tool.
The learning rate is one of the most significant hyperparameters. A rate that's too high can cause the optimizer to overshoot the minimum, while one that's too low can lead to very slow convergence or getting stuck in suboptimal local minima.
Let's try a different learning rate with our model_v1 architecture (the non-dropout version, for a clearer comparison of the learning rate effect alone).
# Re-initialize model_v1 or create a new instance if you want to keep the old one
model_v3 = Chain(
    Dense(input_dim, hidden_dim, relu),
    Dense(hidden_dim, output_dim)
)
loss_fn_v3(x, y) = Flux.logitbinarycrossentropy(model_v3(x), y)
# Try a smaller learning rate
optimizer_v3 = Adam(0.001)
params_v3 = Flux.params(model_v3)
history_v3 = Dict("loss" => Float64[], "accuracy" => Float64[])
println("\nTraining model_v3 with learning rate 0.001...")
for epoch in 1:epochs  # use the same number of epochs
    epoch_loss = 0.0
    for (x_batch, y_batch) in train_loader
        batch_loss, grads = Flux.withgradient(params_v3) do
            loss_fn_v3(x_batch, y_batch)
        end
        Flux.update!(optimizer_v3, params_v3, grads)
        epoch_loss += batch_loss * size(x_batch, 2)
    end
    avg_epoch_loss = epoch_loss / size(X_train, 2)
    train_acc = accuracy(model_v3, train_loader)
    push!(history_v3["loss"], avg_epoch_loss)
    push!(history_v3["accuracy"], train_acc)
    @printf "Epoch %2d: Loss = %.4f, Train Accuracy = %.2f%%\n" epoch avg_epoch_loss (train_acc * 100)
end
test_acc_v3 = accuracy(model_v3, test_loader)
println("Final Test Accuracy (model_v3 with LR 0.001): $(test_acc_v3 * 100)%")
Compare test_acc_v3 with test_acc_v1. Did the smaller learning rate help, hinder, or make little difference? Sometimes a smaller learning rate requires more epochs to converge. This process of trying different values is the essence of hyperparameter tuning. More systematic approaches include grid search, random search, or Bayesian optimization, which are beyond this initial hands-on but build upon this trial-and-error foundation.
Callbacks can simplify your training loop and add powerful functionality, like logging metrics, saving models, or implementing early stopping. Flux doesn't have a built-in callback system as extensive as some Python frameworks, but you can easily implement similar logic yourself; for instance, Flux.train! accepts a cb keyword argument that is invoked during training.
Let's demonstrate a simple custom logging action within our manual loop. For more complex scenarios, you might use that cb hook or libraries that extend Flux with callback functionality.
Here's how you might integrate a simple logging action:
# ... (model, loss, optimizer, params defined as before) ...
# Example: training model_v1 again (continuing from its current weights),
# with an explicit callback-like action each epoch
println("\nTraining model_v1 with a simple callback-like action...")
for epoch in 1:epochs
    # Callback action at the start of an epoch
    # println("Starting epoch $epoch...")
    Flux.train!(loss_fn, params, train_loader, optimizer)  # Flux.train! for brevity
    # Callback action at the end of an epoch
    current_loss = mean(loss_fn(x, y) for (x, y) in train_loader)  # approximate
    current_acc = accuracy(model_v1, train_loader)
    @printf "Epoch %2d: Loss = %.4f, Train Accuracy = %.2f%%\n" epoch current_loss (current_acc * 100)
    # Example: early stopping (very basic)
    # if current_loss < 0.05 break end
end
Flux.train! simplifies the batch-iteration part of the training loop. More sophisticated callbacks, like those for saving the best model or adjusting the learning rate dynamically (learning rate scheduling), can be integrated into this loop structure.
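As a concrete example, here is a sketch combining both ideas: halving the learning rate when accuracy plateaus and keeping a copy of the best model seen so far. It assumes the implicit-params API used above, where the Adam learning rate lives in the optimiser's eta field; the patience of 3 epochs and the halving factor are arbitrary illustrative choices, and on this toy setup we monitor the test set where a real project would use a separate validation set:
# Sketch: manual LR scheduling plus best-model tracking inside the loop
function train_with_schedule!(model, loss, ps, opt; epochs=20, patience=3)
    best_acc, best_model, stall = 0.0, deepcopy(model), 0
    for epoch in 1:epochs
        Flux.train!(loss, ps, train_loader, opt)
        acc = accuracy(model, test_loader)  # ideally a held-out validation set
        if acc > best_acc
            best_acc, best_model, stall = acc, deepcopy(model), 0
        else
            stall += 1
        end
        if stall >= patience  # plateau: halve the learning rate
            opt.eta /= 2
            stall = 0
            @printf "Epoch %2d: reducing learning rate to %.5f\n" epoch opt.eta
        end
    end
    return best_model, best_acc
end

# best_model_v1, best_acc_v1 = train_with_schedule!(model_v1, loss_fn, params, optimizer)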
Visualizing metrics like loss and accuracy over epochs is invaluable for understanding model behavior. Let's plot the training loss for one of our models; a placeholder for the resulting chart appears below. If you have Plots.jl and a backend (like GR or PlotlyJS) installed, you can adapt the code that follows.
[Figure: training loss curve for a run of model_v1. A decreasing trend indicates learning.]
To generate this plot with your actual history_v1["loss"] data using Plots.jl:
# Assuming Plots.jl is installed and you have a backend
# using Plots
# plotly()  # or your preferred backend, e.g. gr()
# plot(1:epochs, history_v1["loss"], label="Model v1 Training Loss",
#      xlabel="Epoch", ylabel="Loss", title="Training Loss Over Epochs",
#      linewidth=2, marker=:circle)
# You could similarly overlay accuracy:
# plot!(1:epochs, history_v1["accuracy"], label="Model v1 Training Accuracy")
Observing these plots helps diagnose issues. A flat loss curve might indicate a learning rate that's too small or a problem with gradients. A loss that increases could mean the learning rate is too high. If training accuracy is high but test accuracy is low, it signals overfitting.
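That last signal can also be checked numerically; the 5-point gap threshold below is an arbitrary illustration, not a standard value:
# A quick numeric check for the overfitting signal described above
gap = accuracy(model_v1, train_loader) - accuracy(model_v1, test_loader)
if gap > 0.05
    @printf "Possible overfitting: train-test accuracy gap of %.1f points\n" (gap * 100)
end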
"Fine-tuning" broadly refers to the process of making adjustments to a model or training process to improve performance. This can range from:
RMSProp
instead of Adam
).The steps we took by adding dropout and changing the learning rate are forms of fine-tuning. Each change should ideally be evaluated systematically. Typically, you'd change one thing at a time to understand its impact.
In this session, you've:
- Built and trained a baseline classifier with a manual training loop.
- Evaluated its accuracy on held-out test data.
- Applied Dropout as a regularization technique and observed its effect, remembering trainmode! and testmode!.
- Experimented with the learning rate, one of the most influential hyperparameters.
This iterative cycle of training, evaluating, and refining is central to applied deep learning. Each dataset and problem will present unique challenges, but the foundational techniques for training and fine-tuning explored here provide a strong starting point for building effective models with Julia and Flux.jl. Remember that patience and systematic experimentation are your best allies in this process.