Flux.jl provides a highly composable system for building neural networks. Instead of rigid, predefined architectures, you assemble your networks from fundamental components. The primary building blocks you'll encounter are layers, chains, and the overall model structure, which can be a simple chain or a more intricate custom design. Mastering these elements is your first step to defining any neural network in Flux.
At the most basic level, a Flux model is composed of layers. A layer is essentially a function that performs a transformation on its input data. Crucially, most layers also possess learnable parameters, typically weights and biases, which are adjusted during the training process. Flux automatically tracks these parameters for you.
The most common and perhaps simplest layer is the Dense layer, also known as a fully connected layer. It applies a linear transformation to the input, followed by an optional activation function.
A Dense layer is defined by the number of input features it accepts and the number of output features it produces. For instance, Dense(10, 5) creates a layer that takes an input vector (or a batch of vectors) where each vector has 10 features, and transforms it into an output vector of 5 features.
using Flux
# Create a Dense layer: 10 input features, 5 output features
# An activation function can also be specified, e.g., Dense(10, 5, relu)
# We'll discuss activation functions in detail in the next section.
input_features = 10
output_features = 5
dense_layer = Dense(input_features, output_features)
# Let's generate some dummy input data: a single column vector with 10 Float32 elements
dummy_input = randn(Float32, input_features, 1)
# Pass the input through the layer
output = dense_layer(dummy_input)
println("Input dimensions: ", size(dummy_input))
println("Output dimensions: ", size(output))
# Expected output:
# Input dimensions: (10, 1)
# Output dimensions: (5, 1)
Internally, this dense_layer holds a weight matrix W of size (output_features, input_features) and a bias vector b of length output_features. When you pass input x through it, the layer computes W⋅x + b. If an activation function σ was specified (like relu), the computation would be σ(W⋅x + b), with σ applied element-wise. These parameters W and b are what Flux will optimize during training.
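To make this concrete, you can inspect the layer's parameters and reproduce its output by hand. The short sketch below assumes the dense_layer, dummy_input, and output from the snippet above, and uses the weight and bias field names that recent Flux versions expose on Dense.
# Inspect the learnable parameters of dense_layer
println("Weight matrix size: ", size(dense_layer.weight))  # (5, 10)
println("Bias vector size: ", size(dense_layer.bias))      # (5,)
# Reproduce the layer's computation manually: W*x .+ b (no activation was given, so it stays linear)
manual_output = dense_layer.weight * dummy_input .+ dense_layer.bias
println(manual_output ≈ output)  # expected: true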
While Dense is fundamental, Flux offers many other layer types for different tasks, such as:
- Conv and MaxPool for convolutional neural networks (CNNs).
- RNN, LSTM, and GRU cells for recurrent neural networks (RNNs).
- Embedding layers for handling categorical data or word embeddings.
We will explore these specialized layers in subsequent chapters. For now, understand that a layer is a callable object (meaning you can use it like a function: layer(input)) that transforms data and typically manages its own learnable parameters.
Often, a neural network involves a sequence of layers where the output of one layer becomes the input to the next. Flux provides a convenient way to construct such sequential models using Chain. A Chain takes a series of layers and applies them in order to the input data.
using Flux
# Define a simple sequential model using Chain
# Input -> Dense(10->20) -> relu activation -> Dense(20->5) -> softmax activation
model_chain = Chain(
    Dense(10, 20),   # First layer: 10 inputs, 20 outputs
    x -> relu.(x),   # Activation applied element-wise (scalar activations are broadcast over arrays)
    Dense(20, 5),    # Second layer: 20 inputs, 5 outputs
    softmax          # Output activation (e.g., for classification); operates column-wise
)
# Generate dummy input: batch of 3 samples, each with 10 features
dummy_batch_input = randn(Float32, 10, 3)
# Pass the input through the entire chain
predictions = model_chain(dummy_batch_input)
println("Model output dimensions: ", size(predictions))
# Expected output:
# Model output dimensions: (5, 3)
# (5 output features for each of the 3 samples in the batch)
In this example, model_chain will first pass the dummy_batch_input through the Dense(10, 20) layer. The output of this layer (which will have 20 features) is then passed through the relu activation. The result of relu feeds into Dense(20, 5), and finally, the output of this second dense layer is passed through softmax.
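If you want to convince yourself of this flow, you can index into the Chain and apply each stage manually. This is a quick sanity check, assuming the model_chain and dummy_batch_input defined above.
# Apply each stage of the chain by hand and compare with the Chain's output
h1 = model_chain[1](dummy_batch_input)  # Dense(10, 20): 20 features per sample
h2 = relu.(h1)                          # element-wise activation
h3 = model_chain[3](h2)                 # Dense(20, 5): 5 features per sample
manual = softmax(h3)                    # column-wise output activation
println(manual ≈ model_chain(dummy_batch_input))  # expected: true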
The Chain itself is also a callable object, just like individual layers. It neatly packages a sequence of operations into a single, reusable component. Activation functions such as relu and softmax can also appear as steps within a Chain (here relu is applied through the broadcasting wrapper x -> relu.(x)); they are simple functions that transform their input without learnable parameters of their own. We will cover activation functions in more detail in the next section.
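One way to see that the activation steps carry no parameters is to collect the chain's trainable parameters; only the two Dense layers contribute. This is a small sketch assuming the model_chain from above and uses Flux.params, the classic implicit-parameter API (newer Flux versions also offer explicit alternatives).
# Collect all trainable parameters of the chain
ps = Flux.params(model_chain)
println(length(ps))  # expected: 4 (a weight matrix and a bias vector for each Dense layer)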
Here is a diagram illustrating the data flow through a typical Chain:
A Chain processes data sequentially. Input passes through an initial Dense layer and its activation, followed by another Dense layer and its activation, finally producing the output. Each Dense layer contains its own learnable weights and biases.
In Flux, the term "model" often refers to any callable structure that processes input and produces output, potentially containing learnable parameters.
For many common neural network architectures, particularly feed-forward networks like Multilayer Perceptrons (MLPs), a Chain is your model. It encapsulates the entire network structure.
However, Flux's true power comes from its flexibility. You are not limited to Chain for defining models. For more complex architectures that don't follow a simple sequential flow, such as networks with skip connections (like ResNets), multiple input or output branches, or other custom routing logic, you can define your model as a custom Julia struct.
A custom model struct typically holds various layers (which could be Dense, Conv, or even other Chains) as its fields. To make it a functional model, you then define how it processes input data by making the struct callable. This is done by implementing a method for your struct that takes an input x and defines the forward pass.
Here's a very simple illustration:
using Flux
# Define a custom struct for our model
struct MyCustomModel
    layer1::Dense
    layer2::Dense
    # You could add other layers, chains, or even non-Flux components
end
# Make the struct callable to define the forward pass
# This function defines how input 'x' flows through the model's components.
# Here, we apply layer1, then relu activation, then layer2.
(m::MyCustomModel)(x) = m.layer2(relu.(m.layer1(x)))
# Instantiate the custom model
custom_model = MyCustomModel(
    Dense(10, 20), # layer1: 10 inputs, 20 outputs
    Dense(20, 5)   # layer2: 20 inputs, 5 outputs
)
# Use the custom model like any other Flux layer or chain
dummy_input = randn(Float32, 10, 1) # 10 features, 1 sample
output = custom_model(dummy_input)
println("Custom model output dimensions: ", size(output))
# Expected output:
# Custom model output dimensions: (5, 1)
This custom struct approach allows for arbitrary complexity. The forward pass (m::MyCustomModel)(x) can implement any logic you need, calling its constituent layers in any order, combining their outputs, etc. Flux will still be able to find and train the parameters of the layers (like m.layer1 and m.layer2) contained within your custom model.
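One caveat: depending on your Flux version, you may need to register the struct so that Flux recurses into its fields when collecting parameters. A minimal sketch, assuming the custom_model defined above:
# In Flux 0.14+ the @layer macro registers the struct; older versions use Flux.@functor MyCustomModel
Flux.@layer MyCustomModel
# Now Flux can find the parameters held by layer1 and layer2
ps = Flux.params(custom_model)
println(length(ps))  # expected: 4 (weights and biases of the two Dense layers)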
These primitives (layers as the fundamental computational units, Chain for straightforward sequential composition, and custom structs for bespoke architectures) provide a versatile and powerful toolkit. They allow you to express a range of neural network designs in Julia with clarity and efficiency. As you progress, you'll see how these basic components are combined to build sophisticated deep learning models.