As information begins its journey through the network during forward propagation, the first significant computation within each neuron is a linear transformation. This step involves calculating a weighted sum of all the inputs connected to that neuron, plus a bias term specific to the neuron. Think of it as combining the incoming signals based on their learned importance (weights) and then shifting the result by a certain amount (bias).
Recall from Chapter 1 that a single artificial neuron processes multiple inputs. For a neuron receiving $n$ inputs $x_1, x_2, \dots, x_n$, each input $x_i$ is associated with a corresponding weight $w_i$. The neuron also has a bias term, $b$. The linear transformation combines these elements to produce an intermediate value, often denoted by $z$.
The calculation is performed as follows:
$$z = (w_1 x_1 + w_2 x_2 + \dots + w_n x_n) + b$$

This can be expressed more compactly using summation notation:
$$z = \sum_{i=1}^{n} (w_i x_i) + b$$

Let's break down the components:

- $x_i$: the $i$-th input value, coming from the previous layer or the raw data.
- $w_i$: the weight applied to input $x_i$, representing its learned importance.
- $b$: the bias term, which shifts the result independently of the inputs.
- $z$: the resulting pre-activation value for the neuron.
Performing these sums individually can be computationally intensive, especially in networks with many neurons and inputs. Linear algebra provides a much more efficient way to represent and compute this. If we organize the weights into a vector $\mathbf{w}$ and the inputs into a vector $\mathbf{x}$:
$$\mathbf{w} = \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{bmatrix}, \quad \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$$

then the weighted sum (excluding the bias for a moment) is simply the dot product of the weight vector and the input vector. When $\mathbf{w}$ and $\mathbf{x}$ are treated as column vectors (the common convention in deep learning literature), the dot product is often written using transpose notation:
$$\mathbf{w}^T \mathbf{x} = \begin{bmatrix} w_1 & w_2 & \dots & w_n \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = w_1 x_1 + w_2 x_2 + \dots + w_n x_n$$

Including the bias, the full linear transformation for a single neuron becomes:
$$z = \mathbf{w}^T \mathbf{x} + b$$

This vector notation is fundamental because it scales efficiently when we consider multiple neurons in a layer, which we'll see involves matrix multiplications.
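As a quick illustration, here is a minimal NumPy sketch showing that the explicit summation and the dot product form produce the same pre-activation value; the particular input, weight, and bias values are purely illustrative:

```python
import numpy as np

# Example values for a neuron with n = 4 inputs (illustrative only)
x = np.array([1.5, -0.3, 2.0, 0.7])   # input vector
w = np.array([0.2, 0.9, -0.5, 1.1])   # weight vector
b = 0.4                               # bias term

# Element-by-element weighted sum, following the summation formula
z_loop = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b

# Same computation expressed as a dot product: z = w^T x + b
z_dot = np.dot(w, x) + b

print(z_loop, z_dot)  # both print the same pre-activation value
```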
Let's consider a neuron with 3 inputs and specific weights and bias:

- Inputs: $x_1 = 2.0$, $x_2 = 3.0$, $x_3 = -1.0$
- Weights: $w_1 = 0.5$, $w_2 = -1.2$, $w_3 = 0.8$
- Bias: $b = 0.1$
The weighted sum $z$ is calculated as:
$$\begin{aligned}
z &= (w_1 x_1 + w_2 x_2 + w_3 x_3) + b \\
  &= (0.5 \times 2.0) + (-1.2 \times 3.0) + (0.8 \times -1.0) + 0.1 \\
  &= (1.0) + (-3.6) + (-0.8) + 0.1 \\
  &= -3.3
\end{aligned}$$
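If you want to check this arithmetic yourself, a short NumPy snippet with the same example values reproduces it:

```python
import numpy as np

# The example neuron: three inputs, three weights, one bias
x = np.array([2.0, 3.0, -1.0])
w = np.array([0.5, -1.2, 0.8])
b = 0.1

z = np.dot(w, x) + b
print(z)  # approximately -3.3 (up to floating-point rounding)
```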
A simple illustration of the weighted sum calculation for one neuron. Inputs are multiplied by their respective weights, summed together, and then the bias is added to produce the pre-activation value $z$.
This linear transformation happens concurrently for every neuron within a given layer. Critically, each neuron in a layer receives the exact same input vector $\mathbf{x}$ (coming from the previous layer or the initial dataset). However, each neuron $j$ in the layer has its own unique weight vector $\mathbf{w}_j$ and its own unique bias $b_j$.
Therefore, if a layer has $m$ neurons, it will compute $m$ different weighted sums, $z_1, z_2, \dots, z_m$:

$$z_j = \mathbf{w}_j^T \mathbf{x} + b_j, \quad j = 1, 2, \dots, m$$
Each $z_j$ represents the aggregated input signal for neuron $j$ before non-linearity is introduced. This collection of $z$ values forms the input for the next step: applying the activation function across the layer. We'll explore how to perform these layer-wide calculations efficiently using matrix operations in a later section. For now, understanding this fundamental weighted sum calculation is an essential first step in the forward pass.
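To make the per-neuron view concrete before moving to matrix operations, here is a minimal sketch, assuming a small layer of $m = 3$ neurons whose weights and biases are purely illustrative, that computes each $z_j$ with its own dot product:

```python
import numpy as np

# Shared input vector for the layer (illustrative values)
x = np.array([2.0, 3.0, -1.0])

# Each neuron j has its own weight vector w_j and bias b_j
weights = [
    np.array([0.5, -1.2, 0.8]),
    np.array([0.1, 0.4, -0.7]),
    np.array([-0.3, 0.9, 0.2]),
]
biases = [0.1, -0.2, 0.05]

# Compute z_j = w_j^T x + b_j for every neuron in the layer
z_values = [np.dot(w_j, x) + b_j for w_j, b_j in zip(weights, biases)]
print(z_values)  # one pre-activation value per neuron
```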