Neural network layers are the fundamental computational units within Flux.jl, acting as transformations that process input data and pass it to subsequent parts of the network. Each layer performs a specific operation, and by composing these layers, we build complex deep learning models. In this section, we'll focus on defining and using one of the most common and foundational types: the simple, fully connected layer, known in Flux.jl as Dense.
Dense Layer: A Fully Connected Transformation
A Dense layer, often called a fully connected layer or a linear layer in other frameworks, is a core component in many neural network architectures. It connects every input neuron to every output neuron. Each connection has an associated weight, and each output neuron typically has an associated bias term.
Mathematically, a Dense layer performs an affine transformation on its input x:
y = σ(Wx + b)
Here, W represents the weight matrix, b is the bias vector, and x is the input. The function σ (sigma) is an activation function, which introduces non-linearity into the model. We'll cover activation functions in detail in the next section. For now, understand that a Dense layer can either include an activation function or output the raw result of Wx + b.
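To make the formula concrete, here is a minimal sketch in plain Julia; the sizes and variable names are illustrative only, not part of Flux's API:
# Hypothetical sizes: 10 input features mapped to 5 output features.
W = rand(Float32, 5, 10)   # weight matrix (out_features, in_features)
b = rand(Float32, 5)       # bias vector, one entry per output feature
x = rand(Float32, 10)      # a single input sample with 10 features
σ = identity               # placeholder activation, applied element-wise
y = σ.(W * x .+ b)         # the affine transformation a Dense layer computes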
In Flux.jl, you create a Dense layer by specifying the number of input features it accepts and the number of output features it produces. You can also provide an activation function directly.
using Flux
# A Dense layer that takes 10 input features and produces 5 output features.
# No activation function is specified, so it defaults to identity (i.e., σ(x) = x).
layer_linear = Dense(10, 5)
# A Dense layer with 10 input features, 5 output features, and ReLU activation.
layer_with_relu = Dense(10, 5, relu)
The first argument to Dense is the input dimension (the number of features in the input data), and the second is the output dimension (the number of features this layer will output). The optional third argument is the activation function. If omitted, Flux.jl uses identity, meaning the layer outputs the raw affine result y = Wx + b with no non-linearity.
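Recent Flux releases also let you write the layer's sizes as a Pair (in => out), which many find more readable; if you are on an older Flux version, the positional form shown above works the same way:
# Equivalent constructors using the in => out Pair syntax (recent Flux versions).
layer_linear_pair = Dense(10 => 5)           # identity activation
layer_with_relu_pair = Dense(10 => 5, relu)  # ReLU activation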
Flux.jl layers are callable objects. This means you can apply a layer to input data as if the layer itself were a function.
Let's see how to pass data through our layer_linear:
# Create some random input data.
# Assume a batch of 3 samples, each with 10 features.
# Flux expects data with shape (features, batch_size): each sample is one column.
input_data = rand(Float32, 10, 3) # 10 features, 3 samples
# Pass the input data through the layer
output_data = layer_linear(input_data)
println("Input data dimensions: ", size(input_data))
println("Output data dimensions: ", size(output_data))
This would produce:
Input data dimensions: (10, 3)
Output data dimensions: (5, 3)
As expected, the Dense(10, 5) layer transformed our input data from 10 features per sample to 5 features per sample, while preserving the batch size of 3.
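As a quick sanity check, you can reproduce this output by hand. The sketch below assumes a recent Flux version in which the Dense struct exposes weight and bias fields (older versions used different names):
# Recompute the forward pass manually: identity activation means y = W*x .+ b.
manual_output = layer_linear.weight * input_data .+ layer_linear.bias
println(manual_output ≈ output_data)  # true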
When you create a Dense layer, Flux.jl automatically initializes its weights (W) and biases (b). By default, weights are initialized using the Glorot uniform distribution (also known as Xavier uniform initialization), a common practice that helps with training stability. Biases are initialized to zeros.
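If the defaults don't fit your use case, the constructor accepts keyword arguments to change them. The keywords shown below (init and bias) reflect recent Flux versions; check your installed version's documentation if they differ:
# Use Glorot normal initialization for the weights instead of the default.
layer_custom_init = Dense(10 => 5, relu; init = Flux.glorot_normal)
# Create a layer without a bias term at all.
layer_no_bias = Dense(10 => 5; bias = false)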
These weights and biases are the learnable parameters of the layer. During the training process, an optimizer will adjust these parameters to minimize the model's error.
You can inspect the parameters of a layer (or an entire model) using Flux.params:
# Get the parameters of layer_linear
parameters = Flux.params(layer_linear)
println("Number of parameter arrays: ", length(parameters)) # Should be 2 (weights and biases)
# Inspect the dimensions of the weights and biases
# The first parameter is usually the weight matrix W, second is bias vector b
W = parameters[1]
b = parameters[2]
println("Weight matrix (W) dimensions: ", size(W)) # (out_features, in_features) -> (5, 10)
println("Bias vector (b) dimensions: ", size(b)) # (out_features, 1) -> (5, 1)
The weight matrix W has dimensions (out_features, in_features), and the bias vector b is a one-dimensional vector of length out_features. These parameters are stored internally in the Dense layer struct and are automatically tracked by Flux's automatic differentiation system, Zygote.jl, which is essential for gradient-based optimization.
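In recent Flux versions you can also reach these parameters directly as fields on the layer, which is often more convenient for quick inspection than going through Flux.params:
# Access the weight matrix and bias vector directly from the layer struct.
println(size(layer_linear.weight))  # (5, 10)
println(size(layer_linear.bias))    # (5,)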
Visualizing a Dense Layer
The operation of a Dense layer can be visualized as a set of input nodes fully connected to a set of output nodes. Each connection has a weight, and each output node has a bias and applies an activation function.
Each input neuron x_i is connected to every output neuron y_j. The output y_j is computed by taking a weighted sum of all inputs, adding a bias b_j, and then applying an activation function σ.
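The sketch below spells out that per-neuron computation for the layer_with_relu layer defined earlier, using the weight and bias fields mentioned above:
# Compute a single output neuron y_j by hand for one input sample x.
x = rand(Float32, 10)                 # one sample with 10 input features
j = 1                                 # index of the output neuron to compute
W = layer_with_relu.weight            # (5, 10) weight matrix
b = layer_with_relu.bias              # length-5 bias vector
y_j = relu(sum(W[j, :] .* x) + b[j])  # weighted sum of inputs plus bias, then activation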
We refer to Dense layers as "simple" primarily because of their straightforward, fully connected structure. They treat all input features uniformly and don't assume any specific structure in the input data, such as spatial locality (for images) or sequential order (for time series or text).
Flux.jl provides a rich set of layers tailored for more specific data types and tasks:
- Convolutional layers (Conv) for processing grid-like data such as images.
- Recurrent layers (RNN, LSTM, GRU) for handling sequential data.
- Pooling layers (MaxPool, MeanPool), often used in conjunction with convolutional layers.
These more specialized layers are also fundamental building blocks, but Dense layers serve as an excellent starting point and are frequently used as components within larger, more complex architectures, often as the final classification or regression stages.
Understanding how to define, use, and inspect Dense layers provides a solid foundation. In the following sections, we'll explore activation functions in more depth and learn how to combine multiple layers into a coherent model using Chain.