We've explored the individual building blocks of Flux.jl: layers like Dense, the Chain constructor for sequencing operations, activation functions for non-linearity, loss functions to quantify error, and optimizers to guide learning. Now we'll integrate these components to construct a complete, albeit basic, neural network. This process forms the foundation for building more complex deep learning models.
The most straightforward way to define a feedforward neural network in Flux is with Chain. A Chain takes a sequence of layers, or any callable functions, and applies them one after another to the input data.
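Because a Chain accepts any callables, a quick way to see this sequencing is to chain two plain anonymous functions; this tiny sketch is separate from the model we build below:

using Flux

# A Chain of two ordinary functions: add 1, then double
f = Chain(x -> x .+ 1, x -> 2 .* x)

f([1.0, 2.0])
# Returns [4.0, 6.0]: each function is applied in sequence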
Consider a simple regression task: predicting a single continuous value from, say, three input features. We might design a network with one hidden layer. This network would consist of:
- An input of three features, whose size is set implicitly by the first Dense layer.
- A Dense hidden layer with a chosen number of neurons (e.g., five) and an activation function (e.g., relu).
- A Dense output layer with a single neuron, for the single output value. For regression, this layer typically has no explicit activation function, meaning it uses a linear activation by default.

Let's translate this network design into Flux code. We'll assume our input data has three features, we want five neurons in the hidden layer, and one output neuron.
using Flux
# Define the number of input features, hidden units, and output units
input_features = 3
hidden_units = 5
output_units = 1
# Construct the model using Chain
model = Chain(
    Dense(input_features, hidden_units, relu), # Hidden layer with ReLU activation
    Dense(hidden_units, output_units)          # Output layer (linear activation by default)
)
In this model, data will first pass through a Dense layer transforming the 3 input features into 5 hidden features. The relu activation function is then applied element-wise to the output of this layer. The resulting 5 values then pass through another Dense layer, which transforms them into a single output value.
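You can confirm this flow by indexing into the Chain and applying each layer yourself, since a Chain can be indexed like a collection of its layers. A small sketch:

x = rand(Float32, 3, 1)   # one sample with 3 features, as a column

h = model[1](x)           # first Dense layer (affine map followed by relu): a 5×1 matrix
y = model[2](h)           # second Dense layer: a 1×1 matrix holding the prediction

y ≈ model(x)              # true, since Chain simply composes the layers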
It's often helpful to visualize the network. While Flux doesn't have a built-in visualizer for Chain objects directly in the REPL, we can represent its structure as a diagram.
A simple neural network with one hidden layer. Data flows from input features, through a hidden layer with ReLU activation, to an output layer producing a single value.
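That said, displaying the model at the REPL gives a compact textual summary of its layers; the exact formatting varies with the Flux version installed:

println(model)
# Possible output (format depends on Flux version):
# Chain(Dense(3, 5, relu), Dense(5, 1))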
Once the model is defined, you can pass data through it to get predictions. The input data must match the expected input dimensions. By convention, Flux Dense layers expect input where rows represent features and each column is a sample when processing a batch. A single sample provided as a vector is treated as a single column.
Let's create some dummy input data:
# A single data point with 3 features (as a column vector)
single_input_data = rand(Float32, input_features, 1)
# Output: 3×1 Matrix{Float32}
# Pass the data through the model
prediction = model(single_input_data)
println("Prediction for single input: ", prediction)
# Output: Prediction for single input: Float32[...] (a 1×1 Matrix)
# A batch of 10 data points
batch_input_data = rand(Float32, input_features, 10) # 3 features, 10 samples
batch_predictions = model(batch_input_data)
println("Shape of batch predictions: ", size(batch_predictions))
# Output: Shape of batch predictions: (1, 10)
The model processes both a single data point (as a 3×1 matrix) and a batch of data (a 3×10 matrix). The output shape reflects this: a 1×1 matrix for a single input, and a 1×10 matrix for a batch of 10 inputs, where each column in the output corresponds to the prediction for the respective input sample.
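Because each column is treated as an independent sample, predicting a single column on its own agrees with the corresponding column of the batch prediction. A quick check:

# Take the first sample of the batch as a 3×1 matrix (the 1:1 keeps the column dimension)
first_sample = batch_input_data[:, 1:1]

model(first_sample) ≈ batch_predictions[:, 1:1]
# true: the batch result is just the per-column predictions side by side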
Flux models contain learnable parameters, which are the weights and biases of the layers. You can inspect these using Flux.params. These parameters are automatically collected from all layers within constructs like Chain.
# Get all parameters (weights and biases) of the model
parameters = Flux.params(model)
# You can iterate through them or inspect specific ones
# For example, to see the number of parameter arrays:
println("Number of parameter arrays: ", length(parameters))
# Expected Output: Number of parameter arrays: 4
# (These correspond to: weights_hidden, bias_hidden, weights_output, bias_output)
# To see the dimensions of the first parameter array (weights of the first Dense layer):
if !isempty(parameters)
    println("Shape of weights for the first Dense layer: ", size(first(parameters)))
    # Expected Output: Shape of weights for the first Dense layer: (5, 3)
end
These parameters are what get updated during the training process. The optimizer modifies them based on the gradients calculated with respect to the loss function.
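You can also reach the parameters through the layers themselves. On recent Flux versions a Dense layer stores them in its weight and bias fields (older releases used different field names), so a sketch like the following lets you inspect each layer directly:

hidden_layer = model[1]          # the first Dense layer of the Chain

size(hidden_layer.weight)        # (5, 3): one row per hidden neuron, one column per input feature
size(hidden_layer.bias)          # (5,): one bias value per hidden neuron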
Constructing the model is the initial step. To make it perform a useful task, it needs to be trained. The training process generally involves:
- Choosing a loss function (e.g., Flux.mse for regression, Flux.logitcrossentropy for classification) to measure the difference between the model's predictions and the true target values.
- Selecting an optimizer (e.g., ADAM(), Descent()) that will specify how to adjust the model's parameters to minimize this loss.

Zygote.jl plays an indispensable role here by automatically computing the gradients required for training. When you define a loss function that incorporates your model and the data, Zygote can differentiate this function with respect to Flux.params(model). We touched upon Zygote.jl earlier, and you will see it integrated into a full training loop in the "Hands-on Practical: A Simple Regressor with Flux" section that follows this chapter's theoretical discussions.
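To make these roles concrete, here is a hedged sketch of a single parameter update using the implicit Flux.params style shown above; the targets y are random values invented purely for illustration, and the full training loop is left to the hands-on section:

# Dummy data for one illustrative update step
x = rand(Float32, input_features, 10)   # a batch of 10 samples
y = rand(Float32, output_units, 10)     # made-up targets, for illustration only

loss(x, y) = Flux.mse(model(x), y)      # mean squared error between predictions and targets
opt = ADAM()                            # optimizer chosen for this sketch

ps = Flux.params(model)                      # all trainable parameters
grads = Flux.gradient(() -> loss(x, y), ps)  # Zygote differentiates the loss w.r.t. ps
Flux.Optimise.update!(opt, ps, grads)        # nudge the parameters to reduce the loss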