
Hands-on Practical: A Simple Regressor with Flux

Now that you're familiar with the fundamental components of Flux.jl, it's time to put them into action. This practical exercise will guide you through building a simple regression model. Regression tasks involve predicting a continuous numerical value. We'll generate some synthetic data, define a model, choose a loss function and an optimizer, and then train the model to fit the data. This example will solidify your understanding of how these pieces work together in a typical Flux.jl workflow.

Our goal is to train a model that can learn a simple linear relationship of the form $y = mx + b$. We'll generate data points based on a known linear function and add some noise to make the learning task a bit more interesting.

Generating Synthetic Data for Regression

First, let's create some data. We'll generate a set of input values X and corresponding target values Y. Flux.jl generally expects input data where each column is a sample and each row is a feature. For our simple one-dimensional input, X will be a $1 \times N$ matrix, and Y will also be a $1 \times N$ matrix.

using Flux, Random

# Set a seed for reproducibility
Random.seed!(123)

# Generate input features X: 101 evenly spaced points from 0 to 5
# Flux expects a 1xN matrix (features x samples), so reshape the vector into a row
X_data = reshape(collect(0.0:0.05:5.0), 1, :) # Creates a 1x101 matrix

# Define the true underlying relationship (e.g., y = 2x + 1)
# and add some noise
true_slope = 2.0
true_intercept = 1.0
Y_data = (true_slope .* X_data) .+ true_intercept .+ (randn(Float32, size(X_data)) .* 0.5f0)

# Flux prefers Float32 for performance, especially with GPUs,
# though Float64 works too. Let's ensure our data is Float32.
X_data = Float32.(X_data)
Y_data = Float32.(Y_data)

println("Size of X_data: ", size(X_data))
println("Size of Y_data: ", size(Y_data))

In this snippet, X_data represents our input feature, and Y_data is the target variable we want to predict. We've added some random noise to Y_data to simulate a more common scenario where relationships aren't perfectly deterministic.
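Because the noise shifts the best-fitting line slightly away from the true slope and intercept, it can help to know what the best possible linear fit on a noisy sample looks like before training anything. The sketch below is an illustration only: it regenerates data with the same recipe and solves the least-squares problem directly with Julia's backslash operator. The names `x`, `y`, `A`, `slope_ls`, and `intercept_ls` are assumptions made for this example and are not used elsewhere in the tutorial.

```julia
using LinearAlgebra, Random

Random.seed!(123)

# Same recipe as the tutorial: 101 inputs on [0, 5], targets y = 2x + 1 plus noise.
x = collect(range(0.0f0, 5.0f0; length=101))
y = 2.0f0 .* x .+ 1.0f0 .+ 0.5f0 .* randn(Float32, length(x))

# Design matrix with a column of ones so that the intercept is fitted too.
A = hcat(x, ones(Float32, length(x)))

# For a tall matrix, backslash returns the least-squares solution of A * c ≈ y.
slope_ls, intercept_ls = A \ y

println("Least-squares slope:     ", slope_ls)
println("Least-squares intercept: ", intercept_ls)
```

The two printed values are the target that gradient descent moves toward on this sample. They will generally be near, but not exactly equal to, 2 and 1.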

Defining the Regression Model

For a simple linear regression, a single Dense layer is sufficient. This layer performs a linear transformation: $\text{output} = W \cdot \text{input} + b$. Since our input X_data has one feature (its value) and we want to predict a single output value Y_data, the Dense layer will map from 1 input feature to 1 output feature.

# Define a simple linear model: one input feature, one output feature
model = Dense(1 => 1)

# We can inspect the initial parameters (random weight, zero bias by default)
println("Initial weights: ", model.weight)
println("Initial bias: ", model.bias)

The Dense(1 => 1) layer creates a connection where the first 1 is the dimension of the input and the second 1 is the dimension of the output. By default, Flux initializes the weight matrix with random values (Glorot uniform initialization) and the bias with zeros. Our training process will adjust both to fit our data.
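To make the linear transformation concrete, the following sketch compares the layer's output with the same computation written out by hand. The variable names `layer`, `x`, `y_layer`, and `y_manual` are chosen for this illustration, and the three input values are arbitrary.

```julia
using Flux

layer = Dense(1 => 1)          # identity activation unless one is given

# Three samples with one feature each: a 1x3 matrix (features x samples).
x = Float32[0.0 1.0 2.0]

# The layer's forward pass, and the same affine map written out by hand.
y_layer  = layer(x)
y_manual = layer.weight * x .+ layer.bias

println("Weight size: ", size(layer.weight))
println("Bias size:   ", size(layer.bias))
println("Outputs match: ", y_layer ≈ y_manual)
```

The weight is stored as an (outputs x inputs) matrix and the bias as a vector with one entry per output, which is why a single matrix product handles all samples at once.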

Choosing a Loss Function

To train our model, we need a way to measure how "wrong" its predictions are. For regression tasks, the Mean Squared Error (MSE) is a common choice. MSE calculates the average of the squared differences between the predicted values and the actual values.

$$\text{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left(y_{\text{predicted}}^{(i)} - y_{\text{actual}}^{(i)}\right)^2$$

Flux provides this in Flux.Losses.mse.

# Define the loss function: Mean Squared Error
loss(x, y) = Flux.Losses.mse(model(x), y)

# Let's test the loss with our initial random model
initial_loss = loss(X_data, Y_data)
println("Initial loss: ", initial_loss)

The loss function takes an input x and target y, passes x through the model to get predictions, and then calculates the MSE between these predictions and y.
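As a small worked example of the formula, the sketch below computes the MSE by hand for three made-up predictions and targets. The values and the names `y_pred`, `y_actual`, `squared_errors`, and `mse_manual` are hypothetical and chosen only to keep the arithmetic easy to follow.

```julia
using Statistics

# Hypothetical predictions and targets (1x3 matrices, as in the tutorial's layout).
y_pred   = Float32[1.0 2.0 3.0]
y_actual = Float32[1.0 2.5 2.0]

# Squared differences: 0.0, 0.25, and 1.0
squared_errors = (y_pred .- y_actual) .^ 2

# Average of the squared differences: (0.0 + 0.25 + 1.0) / 3
mse_manual = mean(squared_errors)

println("Squared errors: ", squared_errors)
println("MSE by hand:    ", mse_manual)
```

Flux.Losses.mse averages the squared differences by default, so it should give the same value for these inputs.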

Selecting an Optimizer

The optimizer is responsible for updating the model's parameters (weights and bias) based on the gradients of the loss function. We'll use the Descent optimizer, which implements standard gradient descent. We need to provide a learning rate, which controls how much the parameters are adjusted in each step.

# Define the optimizer: Gradient Descent with a learning rate of 0.01
opt = Descent(0.01)

# Get the parameters of the model that Flux will train
params = Flux.params(model)
println("Parameters to be trained: ", params)

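Flux.params(model) collects all trainable parameters from our model. The Descent optimizer will use the gradients calculated with respect to these parameters to update them. Note that this is Flux's implicit-parameter style: newer Flux releases deprecate Flux.params in favor of an explicit style, in which the optimizer state is created with Flux.setup and gradients are taken with respect to the model itself, so this code may emit deprecation warnings depending on your Flux version.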
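To see what a gradient descent optimizer does with the gradients, here is a minimal sketch that performs the updates by hand, without Flux. It assumes a model of the form w * x + b, uses the analytic gradients of the MSE, and starts from w = 0 and b = 0. The function names `mse_line` and `descend` and the starting point are assumptions made for this illustration. This is not how Flux implements Descent internally, only the same arithmetic written out.

```julia
using Random, Statistics

# Mean squared error of the line w * x + b on the data (x, y).
mse_line(w, b, x, y) = mean(abs2, (w .* x .+ b) .- y)

# Plain gradient descent on the slope w and intercept b.
function descend(x, y; η=0.01f0, steps=200)
    w, b = 0.0f0, 0.0f0                   # hypothetical starting point
    for _ in 1:steps
        residual = (w .* x .+ b) .- y     # prediction error per sample
        grad_w = 2 * mean(residual .* x)  # derivative of the MSE with respect to w
        grad_b = 2 * mean(residual)       # derivative of the MSE with respect to b
        w -= η * grad_w                   # parameter = parameter - learning_rate * gradient
        b -= η * grad_b
    end
    return w, b
end

Random.seed!(123)
x = collect(range(0.0f0, 5.0f0; length=101))
y = 2.0f0 .* x .+ 1.0f0 .+ 0.5f0 .* randn(Float32, length(x))

w_fit, b_fit = descend(x, y)
println("Loss before: ", mse_line(0.0f0, 0.0f0, x, y))
println("Loss after:  ", mse_line(w_fit, b_fit, x, y))
println("w = ", w_fit, ", b = ", b_fit)
```

The learning rate η scales every step: a larger value moves faster but can overshoot, while a smaller value needs more steps to reach the same loss.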

The Training Loop

Now we'll implement the training loop. In each iteration (or epoch) of the loop, we will:

1. Calculate the gradients of the loss function with respect to the model params. This is where automatic differentiation, powered by Zygote.jl, comes into play.
2. Update the model's params using the optimizer and the calculated gradients.

We'll run this for a fixed number of epochs and print the loss periodically to see if the model is learning.

# Training parameters
epochs = 200

# The training loop
println("Starting training...")
for epoch in 1:epochs
    # Calculate gradients
    grads = gradient(() -> loss(X_data, Y_data), params)
    
    # Update model parameters
    Flux.update!(opt, params, grads)
    
    # Print progress (e.g., every 20 epochs)
    if epoch % 20 == 0
        current_loss = loss(X_data, Y_data)
        println("Epoch: $epoch, Loss: $current_loss")
    end
end
println("Training finished.")

# Let's see the learned parameters
println("Learned weights: ", model.weight)
println("Learned bias: ", model.bias)

Inside gradient(() -> loss(X_data, Y_data), params), Zygote.jl computes the derivative of the loss function with respect to each parameter in params. Flux.update!(opt, params, grads) then applies the optimization step (e.g., $\text{parameter} = \text{parameter} - \text{learning\_rate} \times \text{gradient}$).
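For this one-feature model, the derivatives that Zygote computes automatically can also be worked out by hand. The derivation below is a sketch for the linear case only, writing the prediction as $y_{\text{predicted}}^{(i)} = W x^{(i)} + b$. The symbols $x^{(i)}$ (the $i$-th input) and $\eta$ (the learning rate) are introduced here for illustration.

$$
\begin{aligned}
\text{MSE} &= \frac{1}{N} \sum_{i=1}^{N} \left(W x^{(i)} + b - y_{\text{actual}}^{(i)}\right)^2 \\
\frac{\partial\, \text{MSE}}{\partial W} &= \frac{2}{N} \sum_{i=1}^{N} \left(W x^{(i)} + b - y_{\text{actual}}^{(i)}\right) x^{(i)} \\
\frac{\partial\, \text{MSE}}{\partial b} &= \frac{2}{N} \sum_{i=1}^{N} \left(W x^{(i)} + b - y_{\text{actual}}^{(i)}\right) \\
W &\leftarrow W - \eta \, \frac{\partial\, \text{MSE}}{\partial W}, \qquad b \leftarrow b - \eta \, \frac{\partial\, \text{MSE}}{\partial b}
\end{aligned}
$$

Both gradients are zero only when the prediction errors average to zero and are uncorrelated with the inputs, which is the least-squares fit that training moves toward.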

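You should observe the loss decreasing over epochs, indicating that the model is getting better at predicting Y_data from X_data. The learned weight and bias should move toward our true_slope (2.0) and true_intercept (1.0), respectively. Expect approximate rather than exact agreement: the noise shifts the best-fitting line slightly, and with this learning rate the intercept tends to converge more slowly than the slope, so after 200 epochs it may still be noticeably off. Training for more epochs usually brings it closer.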
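Recent Flux releases favor an explicit style, in which the optimizer state is built from the model and the gradient is taken with respect to the model itself rather than an implicit parameter collection. The following is a hedged sketch of the same training loop in that style. It assumes a Flux version that provides Flux.setup (roughly 0.13.9 or later), and the names `X`, `Y`, `opt_state`, `initial_loss`, and `final_loss` are chosen for this example.

```julia
using Flux, Random

Random.seed!(123)

# Same data recipe as the tutorial: a 1x101 input matrix and noisy targets.
X = reshape(collect(range(0.0f0, 5.0f0; length=101)), 1, :)
Y = 2.0f0 .* X .+ 1.0f0 .+ 0.5f0 .* randn(Float32, size(X))

model = Dense(1 => 1)

# Explicit style: the optimizer state is created from the model.
opt_state = Flux.setup(Descent(0.01), model)

initial_loss = Flux.Losses.mse(model(X), Y)

for epoch in 1:200
    # Differentiate the loss with respect to the model, passed in as an argument.
    grads = Flux.gradient(m -> Flux.Losses.mse(m(X), Y), model)
    # grads is a tuple with one entry per argument; the first belongs to the model.
    Flux.update!(opt_state, model, grads[1])
end

final_loss = Flux.Losses.mse(model(X), Y)
println("Initial loss: ", initial_loss)
println("Final loss:   ", final_loss)
println("Learned weight: ", model.weight, ", bias: ", model.bias)
```

The computation is the same as in the implicit version. The difference is mainly in how parameters are tracked, which avoids the global params collection.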

Making Predictions and Visualizing Results

After training, we can use our model to make predictions on the input data and compare them to the actual target values. A good way to visualize the performance of a simple regressor is to plot the original data points along with the line learned by the model.

# Make predictions with the trained model
Y_predicted = model(X_data)

# For plotting, we'll need a plotting package.
# If you don't have Plots.jl and a backend like GR,
# you can install them:
# import Pkg; Pkg.add(["Plots", "GR"])

# Convert the 1xN matrices to plain vectors for plotting
# X_plot = X_data[1,:] # Get the first (and only) row as a vector
# Y_plot = Y_data[1,:] # Get the first (and only) row as a vector
# Y_pred_plot = Y_predicted[1,:] # Get the first (and only) row as a vector

# If you were to plot using Plots.jl:
# using Plots
# scatter(X_plot, Y_plot, label="Data Points", mc=:blue)
# plot!(X_plot, Y_pred_plot, label="Learned Regression Line", lc=:red, lw=2)
# xlabel!("Feature (X)")
# ylabel!("Target (Y)")
# title!("Simple Linear Regression with Flux.jl")

In such a plot, the blue dots represent our noisy data points, and the red line shows the linear relationship learned by our Flux model.

Figure: The scatter plot shows the original data points, while the continuous line represents the predictions made by the trained linear regression model. Ideally, this line should pass through the "center" of the data points, capturing the underlying linear trend $y \approx 2x + 1$.
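The same call syntax works for inputs the model has not seen, as long as they follow the features x samples layout. The sketch below uses a stand-in layer whose weight and bias are set by hand to 2 and 1, so that the arithmetic can be checked. These values, and the names `trained_like`, `X_new`, and `Y_new`, are hypothetical. A model trained as described above would hold similar but not identical parameters.

```julia
using Flux

# Stand-in for a trained model: weight 2 and bias 1, set by hand (hypothetical values).
trained_like = Dense(reshape(Float32[2.0], 1, 1), Float32[1.0])

# Two new inputs as a 1x2 matrix (one feature, two samples).
X_new = Float32[6.0 7.5]

# Each column is mapped independently: 2 * 6 + 1 = 13 and 2 * 7.5 + 1 = 16.
Y_new = trained_like(X_new)
println("Predictions: ", Y_new)
```

Both inputs lie outside the 0 to 5 range used for training, so with a real trained model such predictions are extrapolations and rely on the linear trend continuing.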

Complete Code Example

Here is the full script for our simple regressor:

using Flux, Random

# Set a seed for reproducibility
Random.seed!(123)

# 1. Generate Synthetic Data
X_data = reshape(collect(0.0f0:0.05f0:5.0f0), 1, :) # 1xN matrix
true_slope = 2.0f0
true_intercept = 1.0f0
Y_data = (true_slope .* X_data) .+ true_intercept .+ (randn(Float32, size(X_data)) .* 0.5f0)

println("Size of X_data: ", size(X_data))
println("Size of Y_data: ", size(Y_data))

# 2. Define the Model
model = Dense(1 => 1) # One input feature, one output feature
println("Initial weights: W=", model.weight, ", b=", model.bias)

# 3. Define Loss Function
loss(x, y) = Flux.Losses.mse(model(x), y)
println("Initial loss: ", loss(X_data, Y_data))

# 4. Select Optimizer
opt = Descent(0.01) # Gradient Descent with learning rate 0.01
params = Flux.params(model)

# 5. The Training Loop
epochs = 200
println("\nStarting training for $epochs epochs...")
for epoch in 1:epochs
    grads = gradient(() -> loss(X_data, Y_data), params)
    Flux.update!(opt, params, grads)
    
    if epoch % 20 == 0 || epoch == 1
        current_loss = loss(X_data, Y_data)
        println("Epoch: $epoch, Loss: $current_loss")
    end
end
println("Training finished.\n")

# Display learned parameters
println("Learned weights: W=", model.weight, ", b=", model.bias)
println("True parameters: W_true=", [true_slope], ", b_true=", [true_intercept])

# 6. Make Predictions (Optional: Show a few predictions)
# For example, predict for the first 5 data points
X_sample = X_data[:, 1:5]
Y_sample_actual = Y_data[:, 1:5]
Y_sample_predicted = model(X_sample)

println("\nSample Predictions:")
for i in 1:size(X_sample, 2)
    println("Input: $(round(X_sample[1,i], digits=2)), Actual: $(round(Y_sample_actual[1,i], digits=2)), Predicted: $(round(Y_sample_predicted[1,i], digits=2))")
end

Running this code will show the initial state of the model, the loss decreasing during training, and the final learned parameters, which should approximate the true slope and intercept we used to generate the data.

This hands-on exercise demonstrated the end-to-end process of building, training, and making predictions with a very basic neural network for regression using Flux.jl. You've seen how to define a model from a single layer, specify a loss function, choose an optimizer, and implement a training loop. These are the core skills you'll build upon as we explore more complex architectures and tasks.
