Imagine a neural network as a sequence of processing stations, organized into layers. The forward propagation process is essentially the journey that your input data takes through these stations, starting from the input layer, moving through any hidden layers, and finally arriving at the output layer to produce a result. It's a one-way street: information flows strictly forward, from input to output, without looping back (at least in the feedforward networks we're discussing now).
The Layer-by-Layer Computation
The fundamental principle is that the output of one layer serves as the input for the very next layer. Let's trace this path:
- Input Layer: This isn't really a computational layer. It simply receives your initial data, often represented as a vector $x$. This vector $x$ can also be considered the "activation" of the input layer, sometimes denoted as $a^{[0]}$.
- First Hidden Layer: Each neuron in this layer (let's call it layer 1) receives the activations from the input layer ($a^{[0]}$). Inside each neuron $j$ of layer 1, two things happen:
- A weighted sum of the inputs plus a bias is calculated: $z_j^{[1]} = \sum_i w_{ji}^{[1]} a_i^{[0]} + b_j^{[1]}$.
- An activation function $g$ is applied to this sum: $a_j^{[1]} = g(z_j^{[1]})$.
The collection of all activations $a_j^{[1]}$ from this layer forms the activation vector $a^{[1]}$, which becomes the input for the next layer (a minimal code sketch of this computation follows the list).
- Subsequent Hidden Layers: The process repeats for any additional hidden layers (layer 2, layer 3, etc.). For a generic layer $l$, each neuron computes its weighted sum $z_j^{[l]}$ based on the activations $a^{[l-1]}$ from the previous layer, and then applies the activation function $g$ to get its own activation $a_j^{[l]}$. The vector $a^{[l]}$ is then passed forward:
$$z_j^{[l]} = \sum_i w_{ji}^{[l]} a_i^{[l-1]} + b_j^{[l]}$$
$$a_j^{[l]} = g(z_j^{[l]})$$
- Output Layer: The final layer performs the same calculations (weighted sum and activation), but its output, $a^{[L]}$ (where $L$ is the last layer), represents the network's final prediction, often denoted as $\hat{y}$. The choice of activation function in this layer is typically specific to the problem type (e.g., Sigmoid for binary classification, Softmax for multi-class classification, or none for regression).
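To make the per-neuron formulas concrete, here is a minimal NumPy sketch of the first hidden layer's computation. The layer sizes (3 inputs, 4 neurons) and the choice of ReLU for $g$ are illustrative assumptions, not anything fixed by the discussion above:

```python
import numpy as np

def relu(z):
    # An example choice for the activation function g
    return np.maximum(0, z)

rng = np.random.default_rng(0)

# Assumed sizes: 3 inputs and 4 neurons in layer 1 (arbitrary, for illustration)
a0 = rng.standard_normal(3)        # input activations a[0] = x
W1 = rng.standard_normal((4, 3))   # W1[j, i] holds the weight w_ji^[1]
b1 = rng.standard_normal(4)        # biases b_j^[1]

# z_j^[1] = sum_i w_ji^[1] * a_i^[0] + b_j^[1], computed for all j at once
z1 = W1 @ a0 + b1
# a_j^[1] = g(z_j^[1])
a1 = relu(z1)                      # activation vector a[1], the input to layer 2
```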
This sequential processing, where the output of layer $l-1$ becomes the input to layer $l$, continues until the final output is generated.
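Putting the recurrence together, a sketch of the full forward pass might look like the following. The parameter layout (Python lists of weight matrices and bias vectors) and the ReLU/sigmoid activation choices are assumptions made for illustration:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    # An example output activation for binary classification
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    """Forward propagation through all layers.

    weights[l-1] and biases[l-1] hold W^[l] and b^[l]. Hidden layers
    use ReLU and the output layer uses sigmoid -- both illustrative choices.
    """
    a = x                                # a[0] = x
    num_layers = len(weights)            # this is L
    for l in range(1, num_layers + 1):
        z = weights[l - 1] @ a + biases[l - 1]    # z[l] = W^[l] a[l-1] + b^[l]
        g = sigmoid if l == num_layers else relu
        a = g(z)                                  # a[l] = g(z[l])
    return a                             # a[L], the prediction y_hat
```

Storing the parameters as lists keeps the layer loop generic: adding another hidden layer is just a matter of appending one more weight matrix and bias vector.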
Visualizing the Flow
Consider a simple network with an input layer, one hidden layer, and an output layer. The information moves strictly from left to right.
A simple feedforward network illustrating the flow of information. Inputs $x_1, x_2$ (activations $a^{[0]}$) are processed by the hidden layer to produce activations $a^{[1]}$, which are then processed by the output layer to produce the final prediction $\hat{y}$ (activation $a^{[2]}$).
Each arrow in the diagram represents a weight connecting the output of a neuron in one layer to the input calculation of a neuron in the next layer. The entire process involves calculating the values $z^{[l]}$ and $a^{[l]}$ for each layer $l$, starting from $l=1$ up to the final layer $L$. The result, $a^{[L]}$, is the prediction $\hat{y}$ generated by the network for the given input $x$.
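As a usage example matching the diagram's two-input, one-hidden-layer layout, and reusing the forward function sketched above (all numbers here are made up for illustration):

```python
import numpy as np

x = np.array([0.5, -1.0])                       # inputs x1, x2 (a[0])
weights = [np.array([[0.1, 0.4],
                     [-0.3, 0.2]]),             # W^[1]: hidden layer, 2 neurons
           np.array([[0.7, -0.5]])]             # W^[2]: output layer, 1 neuron
biases = [np.array([0.0, 0.1]),                 # b^[1]
          np.array([0.2])]                      # b^[2]

y_hat = forward(x, weights, biases)             # a[2] = y_hat
print(y_hat)                                    # a single value between 0 and 1
```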
The next sections will detail the specific calculations involved in the linear transformation ($z$) and the application of activation functions ($a$) within each layer, moving towards using matrix operations for efficiency.