Multilayer Perceptrons, or MLPs, are a foundational type of neural network. They consist of one or more layers of neurons, where each neuron in a preceding layer is connected to every neuron in the subsequent layer. This dense connectivity is why they are often called "fully connected networks." While simple in concept, MLPs are powerful enough to learn complex, non-linear relationships in data, making them an excellent starting point for understanding neural network architectures. In Chapter 2, you were introduced to Flux.jl's Dense layers and the Chain constructor; these are the primary tools we'll use to build MLPs.
An MLP typically comprises three main types of layers: an input layer, one or more hidden layers, and an output layer.
A typical MLP architecture showing the flow of information from the input layer, through hidden layers, to the output layer. Each connection between layers is "dense," meaning all neurons from the previous layer connect to all neurons in the next.
Let's break down these components:
- Input layer: receives the raw feature values. Its size is set by the number of features in the data, and it has no trainable parameters of its own.
- Hidden layers: one or more fully connected layers between the input and output. Each applies a learned linear transformation followed by a non-linear activation function, which is what allows the network to model non-linear relationships.
- Output layer: produces the network's final prediction. Its size and activation depend on the task, for example a single neuron for regression or one neuron per class for classification.
Building an MLP in Flux.jl is straightforward using Dense layers and combining them into a Chain. A Dense layer, Dense(in::Integer, out::Integer, σ), creates a standard fully connected layer that transforms an input of size in to an output of size out, followed by an activation function σ.
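To see a Dense layer on its own before assembling a full network, here is a minimal sketch; the layer sizes are arbitrary and chosen only for illustration.
using Flux
# A single Dense layer in isolation: 3 inputs -> 2 outputs, with relu activation
layer = Dense(3, 2, relu)
x = rand(Float32, 3)      # one input vector with 3 features
println(layer(x))         # a 2-element output vector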
Let's construct a simple MLP. Suppose we have a dataset with 10 input features, and we want to build a network with two hidden layers: the first with 64 neurons and the second with 32 neurons. For a regression task, we'll have a single output neuron.
using Flux
# Define network dimensions
num_features = 10
num_hidden1 = 64
num_hidden2 = 32
num_outputs = 1 # For a regression task
# Construct the MLP
mlp_model = Chain(
Dense(num_features, num_hidden1, relu), # Input layer (10 features) to first hidden layer (64 neurons)
Dense(num_hidden1, num_hidden2, relu), # First hidden layer (64 neurons) to second hidden layer (32 neurons)
Dense(num_hidden2, num_outputs) # Second hidden layer (32 neurons) to output layer (1 neuron)
# No activation specified for the output layer here;
# for regression, this is common (identity activation).
)
# You can print the model to see its structure
println(mlp_model)
In this code:
- Chain(...) groups the layers sequentially. The output of one layer becomes the input to the next.
- Dense(num_features, num_hidden1, relu) defines our first layer. It takes num_features inputs, produces num_hidden1 outputs, and applies the relu activation function.
- The subsequent Dense layers follow the same pattern, connecting the previous layer's output to the next layer's input.
- The final Dense(num_hidden2, num_outputs) layer doesn't explicitly specify an activation function. By default, Flux's Dense layer uses an identity activation (x -> x) if none is provided, which is suitable for regression tasks. For classification, you might add sigmoid or softmax here, or more commonly, apply it as part of the loss function calculation or as a final step after the model.

To see how data flows through this model (a "forward pass"), we can create some dummy input data. Input data for Flux models is typically expected to have features as rows and observations (samples) as columns.
# Create a batch of 5 dummy data samples, each with 10 features
batch_size = 5
dummy_data = rand(Float32, num_features, batch_size) # Shape: (10, 5)
# Pass the data through the model
predictions = mlp_model(dummy_data)
println("Input data size: ", size(dummy_data))
println("Output predictions size: ", size(predictions)) # Expected: (1, 5)
The output predictions will be a matrix of size (1, 5), where each column is the regression output for the corresponding input sample.
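If you prefer a plain vector with one prediction per sample, you can flatten that single-row matrix; here is a small sketch continuing from the code above.
# Flatten the (1, 5) prediction matrix into a length-5 vector
pred_vector = vec(predictions)
println(pred_vector)      # 5-element Vector{Float32}, one entry per sample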
When you pass data through an MLP like mlp_model(dummy_data), each Dense layer performs two main operations:
- A linear (affine) transformation: the layer multiplies its input X by its weight matrix W and adds its bias vector b, producing Z = W*X + b.
- An element-wise activation: the activation function (such as relu) is applied to every entry of Z. So, the final output of the layer is A = σ(Z), where σ is the activation function.

The Chain ensures that the output A from one layer becomes the input X for the next layer in the sequence, until the final output layer is reached.
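To make these two operations concrete, here is a minimal sketch that recomputes the first layer's output by hand, assuming a recent Flux version in which a Dense layer stores its parameters in the weight and bias fields (continuing from the earlier code).
# Recompute the first Dense layer's forward pass manually
layer1 = mlp_model[1]                          # first layer of the Chain
Z = layer1.weight * dummy_data .+ layer1.bias  # affine transformation, size (64, 5)
A = relu.(Z)                                   # element-wise activation
println(A ≈ layer1(dummy_data))                # should print true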
When designing an MLP, several choices influence its performance:
Network Depth and Width: More hidden layers (depth) and more neurons per layer (width) give the network more capacity to model complex relationships, but they also increase the number of parameters, the training time, and the risk of overfitting. It is usually best to start with a modest architecture and grow it only if performance requires it.
Activation Functions:
- For hidden layers, relu is a very common choice. It's computationally efficient and helps mitigate the "vanishing gradient" problem that can occur in deeper networks with other activations like sigmoid or tanh. Alternatives like leakyrelu or elu can sometimes offer benefits.
- For regression outputs, the identity activation is typical; if the output must be positive, an activation like softplus might be used.
- For binary classification, sigmoid is used to squash the output to a range of (0,1), representing a probability.
- For multi-class classification over N classes, softmax is used to convert the outputs into a probability distribution over the N classes, where each output is in (0,1) and all outputs sum to 1. Flux provides Flux.sigmoid and Flux.softmax (see the classification sketch after this list).
Data Scaling: MLPs train more reliably when input features are on similar scales, for example standardized to zero mean and unit variance. Features with very different ranges can slow down or destabilize gradient-based optimization.
Strengths: MLPs can model a wide range of non-linear relationships and are a good default for tabular data, where features have no inherent spatial or sequential ordering. They are also simple to define and fast to experiment with.
Limitations: Because every neuron connects to every neuron in the next layer, the number of parameters grows quickly with input size. MLPs also do not exploit spatial or sequential structure in data such as images or text, and they require fixed-size inputs.
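To illustrate the classification case mentioned in the activation-function notes above, here is a minimal sketch of a hypothetical 3-class classifier whose raw outputs are turned into probabilities with Flux.softmax; the layer sizes are arbitrary and chosen only for demonstration.
using Flux
# Hypothetical 3-class classifier: 10 features in, one hidden layer, 3 raw output scores
classifier = Chain(
    Dense(10, 32, relu),
    Dense(32, 3)                        # no activation here; softmax is applied afterwards
)
x = rand(Float32, 10, 4)                # a batch of 4 samples
probs = Flux.softmax(classifier(x))     # per-class probabilities for each sample
println(sum(probs; dims=1))             # each column sums to approximately 1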
Multilayer Perceptrons are versatile feedforward neural networks that form a foundation of deep learning. You've now seen how to construct them in Julia using Flux.jl's Dense layers and Chain structure, and understand the design considerations involved.
While MLPs are powerful for certain types of problems, particularly those involving tabular data, this chapter will next introduce Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). These architectures are specifically designed to handle data with spatial and sequential structures, respectively, by incorporating specialized layers that exploit these properties. Understanding MLPs provides a solid foundation for grasping these more advanced architectures.