Neural network layers are the fundamental computational units within Flux.jl, acting as transformations that process input data and pass it to subsequent parts of the network. Each layer performs a specific operation, and by composing these layers, we build complex deep learning models. In this section, we'll focus on defining and using one of the most common and foundational types: the simple, fully connected layer, known in Flux.jl as Dense.
Dense Layer: A Fully Connected Transformation
A Dense layer, often called a fully connected layer or a linear layer in other frameworks, is a core component in many neural network architectures. It connects every input neuron to every output neuron. Each connection has an associated weight, and each output neuron typically has an associated bias term.
Mathematically, a Dense layer performs an affine transformation on its input x:
y = σ(Wx + b)
Here, W represents the weight matrix, b is the bias vector, and x is the input. The function σ (sigma) is an activation function, which introduces non-linearity into the model. We'll cover activation functions in detail in the next section. For now, understand that a Dense layer can either include an activation function or output the raw result of Wx + b.
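To make the formula concrete, here is a minimal sketch in plain Julia; the sizes and variable names are illustrative only, not part of Flux's API:
# Hypothetical sizes: 10 input features mapped to 5 output features.
W = rand(Float32, 5, 10)   # weight matrix (out_features, in_features)
b = rand(Float32, 5)       # bias vector, one entry per output feature
x = rand(Float32, 10)      # a single input sample with 10 features
σ = identity               # placeholder activation, applied element-wise
y = σ.(W * x .+ b)         # the affine transformation a Dense layer computes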
In Flux.jl, you create a Dense layer by specifying the number of input features it accepts and the number of output features it produces. You can also provide an activation function directly.
using Flux
# A Dense layer that takes 10 input features and produces 5 output features.
# No activation function is specified, so it defaults to identity (i.e., σ(x) = x).
layer_linear = Dense(10, 5)
# A Dense layer with 10 input features, 5 output features, and ReLU activation.
layer_with_relu = Dense(10, 5, relu)
The first argument to Dense is the input dimension (the number of features in the input data), and the second is the output dimension (the number of features this layer will output). The optional third argument is the activation function. If omitted, Flux.jl uses identity, meaning the layer outputs the raw affine result y = Wx + b with no non-linearity.
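Recent Flux releases also let you write the layer's sizes as a Pair (in => out), which many find more readable; if you are on an older Flux version, the positional form shown above works the same way:
# Equivalent constructors using the in => out Pair syntax (recent Flux versions).
layer_linear_pair = Dense(10 => 5)           # identity activation
layer_with_relu_pair = Dense(10 => 5, relu)  # ReLU activation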
Flux.jl layers are callable objects. This means you can apply a layer to input data as if the layer itself were a function.
Let's see how to pass data through our layer_linear:
# Create some random input data.
# Assume a batch of 3 samples, each with 10 features.
# Flux expects data with shape (features, batch_size): each sample is one column.
input_data = rand(Float32, 10, 3) # 10 features, 3 samples
# Pass the input data through the layer
output_data = layer_linear(input_data)
println("Input data dimensions: ", size(input_data))
println("Output data dimensions: ", size(output_data))
This would produce:
Input data dimensions: (10, 3)
Output data dimensions: (5, 3)
As expected, the Dense(10, 5) layer transformed our input data from 10 features per sample to 5 features per sample, while preserving the batch size of 3.
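As a quick sanity check, you can reproduce this output by hand. The sketch below assumes a recent Flux version in which the Dense struct exposes weight and bias fields (older versions used different names):
# Recompute the forward pass manually: identity activation means y = W*x .+ b.
manual_output = layer_linear.weight * input_data .+ layer_linear.bias
println(manual_output ≈ output_data)  # true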
When you create a Dense layer, Flux.jl automatically initializes its weights (W) and biases (b). By default, weights are initialized using the Glorot uniform distribution (also known as Xavier uniform initialization), a common practice that helps with training stability. Biases are initialized to zeros.
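If the defaults don't fit your use case, the constructor accepts keyword arguments to change them. The keywords shown below (init and bias) reflect recent Flux versions; check your installed version's documentation if they differ:
# Use Glorot normal initialization for the weights instead of the default.
layer_custom_init = Dense(10 => 5, relu; init = Flux.glorot_normal)
# Create a layer without a bias term at all.
layer_no_bias = Dense(10 => 5; bias = false)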
These weights and biases are the learnable parameters of the layer. During the training process, an optimizer will adjust these parameters to minimize the model's error.
You can inspect the parameters of a layer (or an entire model) using Flux.params:
# Get the parameters of layer_linear
parameters = Flux.params(layer_linear)
println("Number of parameter arrays: ", length(parameters)) # Should be 2 (weights and biases)
# Inspect the dimensions of the weights and biases
# The first parameter is usually the weight matrix W, second is bias vector b
W = parameters[1]
b = parameters[2]
println("Weight matrix (W) dimensions: ", size(W)) # (out_features, in_features) -> (5, 10)
println("Bias vector (b) dimensions: ", size(b)) # (out_features, 1) -> (5, 1)
The weight matrix W has dimensions (out_features, in_features), and the bias vector b is a one-dimensional vector of length out_features. These parameters are stored internally in the Dense layer struct and are automatically tracked by Flux's automatic differentiation system, Zygote.jl, which is essential for gradient-based optimization.
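In recent Flux versions you can also reach these parameters directly as fields on the layer, which is often more convenient for quick inspection than going through Flux.params:
# Access the weight matrix and bias vector directly from the layer struct.
println(size(layer_linear.weight))  # (5, 10)
println(size(layer_linear.bias))    # (5,)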
Visualizing a Dense Layer
The operation of a Dense layer can be visualized as a set of input nodes fully connected to a set of output nodes. Each connection has a weight, and each output node has a bias and applies an activation function.
Each input neuron x_i is connected to every output neuron y_j. The output y_j is computed by taking a weighted sum of all inputs, adding a bias b_j, and then applying an activation function σ.
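The sketch below spells out that per-neuron computation for the layer_with_relu layer defined earlier, using the weight and bias fields mentioned above:
# Compute a single output neuron y_j by hand for one input sample x.
x = rand(Float32, 10)                 # one sample with 10 input features
j = 1                                 # index of the output neuron to compute
W = layer_with_relu.weight            # (5, 10) weight matrix
b = layer_with_relu.bias              # length-5 bias vector
y_j = relu(sum(W[j, :] .* x) + b[j])  # weighted sum of inputs plus bias, then activation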
We refer to Dense layers as "simple" primarily because of their straightforward, fully connected structure. They treat all input features uniformly and don't assume any specific structure in the input data, such as spatial locality (for images) or sequential order (for time series or text).
Flux.jl provides a rich set of layers tailored for more specific data types and tasks:
- Convolutional layers (Conv) for processing grid-like data such as images.
- Recurrent layers (RNN, LSTM, GRU) for handling sequential data.
- Pooling layers (MaxPool, MeanPool), often used in conjunction with convolutional layers.
These more specialized layers are also fundamental building blocks, but Dense layers serve as an excellent starting point and are frequently used as components within larger, more complex architectures, often as the final classification or regression stages.
Understanding how to define, use, and inspect Dense layers provides a solid foundation. In the following sections, we'll explore activation functions in more depth and learn how to combine multiple layers into a coherent model using Chain.