Flux.jl provides a highly composable system for building neural networks. Instead of rigid, predefined architectures, you assemble your networks from fundamental components. The primary building blocks you'll encounter are layers, chains, and the overall model structure, which can be a simple chain or a more intricate custom design. Mastering these elements is your first step to defining any neural network in Flux.
At the most basic level, a Flux model is composed of layers. A layer is essentially a function that performs a transformation on its input data. Crucially, most layers also possess learnable parameters, typically weights and biases, which are adjusted during the training process. Flux automatically tracks these parameters for you.
The most common and perhaps simplest layer is the Dense layer, also known as a fully connected layer. It applies a linear transformation to the input, followed by an optional activation function.
A Dense layer is defined by the number of input features it accepts and the number of output features it produces. For instance, Dense(10, 5) creates a layer that takes an input vector (or a batch of vectors) where each vector has 10 features, and transforms it into an output vector of 5 features.
using Flux
# Create a Dense layer: 10 input features, 5 output features
# An activation function can also be specified, e.g., Dense(10, 5, relu)
# We'll discuss activation functions in detail in the next section.
input_features = 10
output_features = 5
dense_layer = Dense(input_features, output_features)
# Let's generate some dummy input data: a single column vector with 10 Float32 elements
dummy_input = randn(Float32, input_features, 1)
# Pass the input through the layer
output = dense_layer(dummy_input)
println("Input dimensions: ", size(dummy_input))
println("Output dimensions: ", size(output))
# Expected output:
# Input dimensions: (10, 1)
# Output dimensions: (5, 1)
Internally, this dense_layer holds a weight matrix W of size (output_features, input_features) and a bias vector b of length output_features. When you pass input x through it, the layer computes W⋅x + b. If an activation function σ was specified (like relu), the computation would be σ(W⋅x + b), with σ applied element-wise. These parameters W and b are what Flux will optimize during training.
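To make this concrete, you can inspect the layer's parameters and reproduce its output by hand. The short sketch below assumes the dense_layer, dummy_input, and output from the snippet above, and uses the weight and bias field names that recent Flux versions expose on Dense.
# Inspect the learnable parameters of dense_layer
println("Weight matrix size: ", size(dense_layer.weight))  # (5, 10)
println("Bias vector size: ", size(dense_layer.bias))      # (5,)
# Reproduce the layer's computation manually: W*x .+ b (no activation was given, so it stays linear)
manual_output = dense_layer.weight * dummy_input .+ dense_layer.bias
println(manual_output ≈ output)  # expected: true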
While Dense is fundamental, Flux offers many other layer types for different tasks, such as:
- Conv and MaxPool for convolutional neural networks (CNNs).
- RNN, LSTM, and GRU cells for recurrent neural networks (RNNs).
- Embedding layers for handling categorical data or word embeddings.
We will explore these specialized layers in subsequent chapters. For now, understand that a layer is a callable object (meaning you can use it like a function: layer(input)) that transforms data and typically manages its own learnable parameters.
Often, a neural network involves a sequence of layers where the output of one layer becomes the input to the next. Flux provides a convenient way to construct such sequential models using Chain. A Chain takes a series of layers and applies them in order to the input data.
using Flux
# Define a simple sequential model using Chain
# Input -> Dense(10->20) -> relu activation -> Dense(20->5) -> softmax activation
model_chain = Chain(
    Dense(10, 20),   # First layer: 10 inputs, 20 outputs
    x -> relu.(x),   # Activation applied element-wise (scalar activations are broadcast over arrays)
    Dense(20, 5),    # Second layer: 20 inputs, 5 outputs
    softmax          # Output activation (e.g., for classification); operates column-wise
)
# Generate dummy input: batch of 3 samples, each with 10 features
dummy_batch_input = randn(Float32, 10, 3)
# Pass the input through the entire chain
predictions = model_chain(dummy_batch_input)
println("Model output dimensions: ", size(predictions))
# Expected output:
# Model output dimensions: (5, 3)
# (5 output features for each of the 3 samples in the batch)
In this example, model_chain will first pass the dummy_batch_input through the Dense(10, 20) layer. The output of this layer (which will have 20 features) is then passed through the relu activation. The result of relu feeds into Dense(20, 5), and finally, the output of this second dense layer is passed through softmax.
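If you want to convince yourself of this flow, you can index into the Chain and apply each stage manually. This is a quick sanity check, assuming the model_chain and dummy_batch_input defined above.
# Apply each stage of the chain by hand and compare with the Chain's output
h1 = model_chain[1](dummy_batch_input)  # Dense(10, 20): 20 features per sample
h2 = relu.(h1)                          # element-wise activation
h3 = model_chain[3](h2)                 # Dense(20, 5): 5 features per sample
manual = softmax(h3)                    # column-wise output activation
println(manual ≈ model_chain(dummy_batch_input))  # expected: true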
The Chain itself is also a callable object, just like individual layers. It neatly packages a sequence of operations into a single, reusable component. Activation functions such as relu and softmax can also appear as steps within a Chain (here relu is applied through the broadcasting wrapper x -> relu.(x)); they are simple functions that transform their input without learnable parameters of their own. We will cover activation functions in more detail in the next section.
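One way to see that the activation steps carry no parameters is to collect the chain's trainable parameters; only the two Dense layers contribute. This is a small sketch assuming the model_chain from above and uses Flux.params, the classic implicit-parameter API (newer Flux versions also offer explicit alternatives).
# Collect all trainable parameters of the chain
ps = Flux.params(model_chain)
println(length(ps))  # expected: 4 (a weight matrix and a bias vector for each Dense layer)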
Here is a diagram illustrating the data flow through a typical Chain:
A Chain processes data sequentially. Input passes through an initial Dense layer and its activation, followed by another Dense layer and its activation, finally producing the output. Each Dense layer contains its own learnable weights and biases.
In Flux, the term "model" often refers to any callable structure that processes input and produces output, potentially containing learnable parameters.
For many common neural network architectures, particularly feed-forward networks like Multilayer Perceptrons (MLPs), a Chain is your model. It encapsulates the entire network structure.
However, Flux's true power comes from its flexibility. You are not limited to Chain for defining models. For more complex architectures that don't follow a simple sequential flow, such as networks with skip connections (like ResNets), multiple input or output branches, or other custom routing logic, you can define your model as a custom Julia struct.
A custom model struct typically holds various layers (which could be Dense, Conv, or even other Chains) as its fields. To make it a functional model, you then define how it processes input data by making the struct callable. This is done by implementing a method for your struct that takes an input x and defines the forward pass.
Here's a very simple illustration:
using Flux
# Define a custom struct for our model
struct MyCustomModel
    layer1::Dense
    layer2::Dense
    # You could add other layers, chains, or even non-Flux components
end
# Make the struct callable to define the forward pass
# This function defines how input 'x' flows through the model's components.
# Here, we apply layer1, then relu activation, then layer2.
(m::MyCustomModel)(x) = m.layer2(relu.(m.layer1(x)))
# Instantiate the custom model
custom_model = MyCustomModel(
    Dense(10, 20), # layer1: 10 inputs, 20 outputs
    Dense(20, 5)   # layer2: 20 inputs, 5 outputs
)
# Use the custom model like any other Flux layer or chain
dummy_input = randn(Float32, 10, 1) # 10 features, 1 sample
output = custom_model(dummy_input)
println("Custom model output dimensions: ", size(output))
# Expected output:
# Custom model output dimensions: (5, 1)
This custom struct approach allows for arbitrary complexity. The forward pass (m::MyCustomModel)(x) can implement any logic you need, calling its constituent layers in any order, combining their outputs, etc. Flux will still be able to find and train the parameters of the layers (like m.layer1 and m.layer2) contained within your custom model.
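One caveat: depending on your Flux version, you may need to register the struct so that Flux recurses into its fields when collecting parameters. A minimal sketch, assuming the custom_model defined above:
# In Flux 0.14+ the @layer macro registers the struct; older versions use Flux.@functor MyCustomModel
Flux.@layer MyCustomModel
# Now Flux can find the parameters held by layer1 and layer2
ps = Flux.params(custom_model)
println(length(ps))  # expected: 4 (weights and biases of the two Dense layers)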
These primitives (layers as the fundamental computational units, Chain for straightforward sequential composition, and custom structs for bespoke architectures) provide a versatile and powerful toolkit. They allow you to express a range of neural network designs in Julia with clarity and efficiency. As you progress, you'll see how these basic components are combined to build sophisticated deep learning models.