We've explored the individual building blocks of Flux.jl: layers like Dense, the Chain constructor for sequencing operations, activation functions for non-linearity, loss functions to quantify error, and optimizers to guide learning. Now we'll integrate these components to construct a complete, albeit basic, neural network. This process forms the foundation for building more complex deep learning models.
The most straightforward way to define a feedforward neural network in Flux is with Chain. A Chain takes a sequence of layers, or any callable functions, and applies them one after another to the input data.
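Because a Chain accepts any callables, a quick way to see this sequencing is to chain two plain anonymous functions; this tiny sketch is separate from the model we build below:

using Flux

# A Chain of two ordinary functions: add 1, then double
f = Chain(x -> x .+ 1, x -> 2 .* x)

f([1.0, 2.0])
# Returns [4.0, 6.0]: each function is applied in sequence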
Consider a simple regression task: predicting a single continuous value from, say, three input features. We might design a network with one hidden layer. This network would consist of:
- An input of three features, whose size is set implicitly by the first Dense layer.
- A Dense hidden layer with a chosen number of neurons (e.g., five) and an activation function (e.g., relu).
- A Dense output layer with a single neuron, for the single output value. For regression, this layer typically has no explicit activation function, meaning it uses a linear activation by default.

Let's translate this network design into Flux code. We'll assume our input data has three features, we want five neurons in the hidden layer, and one output neuron.
using Flux
# Define the number of input features, hidden units, and output units
input_features = 3
hidden_units = 5
output_units = 1
# Construct the model using Chain
model = Chain(
    Dense(input_features, hidden_units, relu), # Hidden layer with ReLU activation
    Dense(hidden_units, output_units)          # Output layer (linear activation by default)
)
In this model, data will first pass through a Dense layer transforming the 3 input features into 5 hidden features. The relu activation function is then applied element-wise to the output of this layer. The resulting 5 values then pass through another Dense layer, which transforms them into a single output value.
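You can confirm this flow by indexing into the Chain and applying each layer yourself, since a Chain can be indexed like a collection of its layers. A small sketch:

x = rand(Float32, 3, 1)   # one sample with 3 features, as a column

h = model[1](x)           # first Dense layer (affine map followed by relu): a 5×1 matrix
y = model[2](h)           # second Dense layer: a 1×1 matrix holding the prediction

y ≈ model(x)              # true, since Chain simply composes the layers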
It's often helpful to visualize the network. While Flux doesn't have a built-in visualizer for Chain objects directly in the REPL, we can represent its structure as a diagram.
A simple neural network with one hidden layer. Data flows from input features, through a hidden layer with ReLU activation, to an output layer producing a single value.
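That said, displaying the model at the REPL gives a compact textual summary of its layers; the exact formatting varies with the Flux version installed:

println(model)
# Possible output (format depends on Flux version):
# Chain(Dense(3, 5, relu), Dense(5, 1))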
Once the model is defined, you can pass data through it to get predictions. The input data must match the expected input dimensions. By convention, Flux Dense layers expect input where rows represent features and each column is a sample when processing a batch. A single sample provided as a vector is treated as a single column.
Let's create some dummy input data:
# A single data point with 3 features (as a column vector)
single_input_data = rand(Float32, input_features, 1)
# Output: 3×1 Matrix{Float32}
# Pass the data through the model
prediction = model(single_input_data)
println("Prediction for single input: ", prediction)
# Output: Prediction for single input: Float32[...] (a 1×1 Matrix)
# A batch of 10 data points
batch_input_data = rand(Float32, input_features, 10) # 3 features, 10 samples
batch_predictions = model(batch_input_data)
println("Shape of batch predictions: ", size(batch_predictions))
# Output: Shape of batch predictions: (1, 10)
The model processes both a single data point (as a 3×1 matrix) and a batch of data (a 3×10 matrix). The output shape reflects this: a 1×1 matrix for a single input, and a 1×10 matrix for a batch of 10 inputs, where each column in the output corresponds to the prediction for the respective input sample.
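Because each column is treated as an independent sample, predicting a single column on its own agrees with the corresponding column of the batch prediction. A quick check:

# Take the first sample of the batch as a 3×1 matrix (the 1:1 keeps the column dimension)
first_sample = batch_input_data[:, 1:1]

model(first_sample) ≈ batch_predictions[:, 1:1]
# true: the batch result is just the per-column predictions side by side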
Flux models contain learnable parameters, which are the weights and biases of the layers. You can inspect these using Flux.params. These parameters are automatically collected from all layers within constructs like Chain.
# Get all parameters (weights and biases) of the model
parameters = Flux.params(model)
# You can iterate through them or inspect specific ones
# For example, to see the number of parameter arrays:
println("Number of parameter arrays: ", length(parameters))
# Expected Output: Number of parameter arrays: 4
# (These correspond to: weights_hidden, bias_hidden, weights_output, bias_output)
# To see the dimensions of the first parameter array (weights of the first Dense layer):
if !isempty(parameters)
    println("Shape of weights for the first Dense layer: ", size(first(parameters)))
    # Expected Output: Shape of weights for the first Dense layer: (5, 3)
end
These parameters are what get updated during the training process. The optimizer modifies them based on the gradients calculated with respect to the loss function.
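You can also reach the parameters through the layers themselves. On recent Flux versions a Dense layer stores them in its weight and bias fields (older releases used different field names), so a sketch like the following lets you inspect each layer directly:

hidden_layer = model[1]          # the first Dense layer of the Chain

size(hidden_layer.weight)        # (5, 3): one row per hidden neuron, one column per input feature
size(hidden_layer.bias)          # (5,): one bias value per hidden neuron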
Constructing the model is the initial step. To make it perform a useful task, it needs to be trained. The training process generally involves:
- Choosing a loss function (e.g., Flux.mse for regression, Flux.logitcrossentropy for classification) to measure the difference between the model's predictions and the true target values.
- Selecting an optimizer (e.g., ADAM(), Descent()) that will specify how to adjust the model's parameters to minimize this loss.

Zygote.jl plays an indispensable role here by automatically computing the gradients required for training. When you define a loss function that incorporates your model and the data, Zygote can differentiate this function with respect to Flux.params(model). We touched upon Zygote.jl earlier, and you will see it integrated into a full training loop in the "Hands-on Practical: A Simple Regressor with Flux" section that follows this chapter's theoretical discussions.
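To make these roles concrete, here is a hedged sketch of a single parameter update using the implicit Flux.params style shown above; the targets y are random values invented purely for illustration, and the full training loop is left to the hands-on section:

# Dummy data for one illustrative update step
x = rand(Float32, input_features, 10)   # a batch of 10 samples
y = rand(Float32, output_units, 10)     # made-up targets, for illustration only

loss(x, y) = Flux.mse(model(x), y)      # mean squared error between predictions and targets
opt = ADAM()                            # optimizer chosen for this sketch

ps = Flux.params(model)                      # all trainable parameters
grads = Flux.gradient(() -> loss(x, y), ps)  # Zygote differentiates the loss w.r.t. ps
Flux.Optimise.update!(opt, ps, grads)        # nudge the parameters to reduce the loss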