Having established the basic structure of an artificial neuron, let's focus on the elements that give it the ability to learn: its parameters. These are the adjustable parts of the network that are modified during the training process. The two primary types of parameters in a basic neuron are weights and biases.
Think of weights as representing the strength or importance of the connection between an input and the neuron. Each input connected to the neuron has an associated weight (w). When an input signal (x) arrives, it's multiplied by its corresponding weight. A larger absolute weight means the input has a stronger effect (either excitatory if positive or inhibitory if negative) on the neuron's output. Conversely, a weight close to zero means the input has little influence.
Consider a neuron with three inputs, $x_1$, $x_2$, and $x_3$. Each input will have its own weight: $w_1$, $w_2$, and $w_3$. The first step in the neuron's calculation involves computing the weighted sum of its inputs: $w_1 x_1 + w_2 x_2 + w_3 x_3$.
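As a small sketch (the input and weight values below are made up purely for illustration), this weighted sum can be computed directly in plain Python:

```python
# Weighted sum for a neuron with three inputs (illustrative values).
x = [0.5, -1.2, 3.0]    # input signals x1, x2, x3
w = [0.8, 0.1, -0.4]    # corresponding weights w1, w2, w3

weighted_sum = sum(w_i * x_i for w_i, x_i in zip(w, x))
print(weighted_sum)     # 0.8*0.5 + 0.1*(-1.2) + (-0.4)*3.0 ≈ -0.92
```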
These weights are not fixed; they are initialized with some values (often small random numbers) and then adjusted iteratively during the network's training phase based on the errors the network makes. This adjustment process is how the network learns to pay more attention to relevant inputs and less to irrelevant ones for a specific task.
Inputs (x1,x2,...) are multiplied by their weights (w1,w2,...) and summed (Σ). The bias (b) is then added to this sum.
After calculating the weighted sum of inputs, another parameter comes into play: the bias ($b$). The bias term is added to the weighted sum: $w_1 x_1 + w_2 x_2 + w_3 x_3 + b$.
What does the bias do? You can think of it as adjusting the neuron's intrinsic tendency to activate. It provides a constant offset to the sum, independent of the inputs. A positive bias makes it easier for the neuron to output a high value (activate more readily), while a negative bias makes it harder. Without a bias term, the neuron could only produce a large pre-activation value when the inputs and weights alone were enough to reach it. The bias gives the network more flexibility by effectively shifting the activation function's operating point. Imagine it as setting a baseline level of activity.
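To make this concrete, the short sketch below (continuing with the illustrative weighted sum of -0.92 from earlier) shows how different bias values shift the pre-activation value up or down:

```python
# Same weighted sum, three different biases (illustrative values).
weighted_sum = -0.92

for b in (-1.0, 0.0, 1.5):
    z = weighted_sum + b
    print(f"bias = {b:+.1f}  ->  pre-activation z = {z:+.2f}")

# A positive bias raises z (the neuron activates more readily);
# a negative bias lowers it (the neuron activates less readily).
```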
Like weights, the bias is a learnable parameter. Each neuron typically has its own bias value (though sometimes biases are omitted or handled differently in specific architectures). It's adjusted during training alongside the weights to help the network fit the data better.
Putting it together, the calculation performed before applying the activation function is a linear combination of the inputs, weights, and the bias. This value, often denoted by z, represents the neuron's raw activation potential:
$$
z = (w_1 x_1 + w_2 x_2 + \dots + w_n x_n) + b
$$

This can be written more compactly using summation notation for a neuron with $n$ inputs:

$$
z = \sum_{i=1}^{n} w_i x_i + b
$$

For those familiar with linear algebra, this is even more efficiently represented using vector notation. If $\mathbf{w}$ is the vector of weights $[w_1, w_2, \dots, w_n]$ and $\mathbf{x}$ is the vector of inputs $[x_1, x_2, \dots, x_n]$, the calculation becomes a dot product plus the bias:

$$
z = \mathbf{w} \cdot \mathbf{x} + b
$$

This value $z$ is sometimes called the logit, pre-activation, or net input. It represents the neuron's aggregated input signal before it's passed through the non-linear activation function (which we'll discuss next).
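In practice, this is usually computed as a vectorized dot product. The sketch below uses NumPy with the same illustrative values as before:

```python
import numpy as np

# Illustrative weights, inputs, and bias for one neuron with n = 3 inputs.
w = np.array([0.8, 0.1, -0.4])
x = np.array([0.5, -1.2, 3.0])
b = 0.6

z = np.dot(w, x) + b    # equivalent to w @ x + b
print(z)                # -0.92 + 0.6 ≈ -0.32
```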
Weights and biases are the core components that store the learned knowledge within a neural network. When we talk about "training" a network, we essentially mean finding the optimal set of values for all the weights and biases across all neurons. This optimization process aims to minimize the difference between the network's predictions and the actual target values in the training data.
The network typically starts with random initial values for these parameters. Then, through algorithms like backpropagation and gradient descent (covered in Chapter 4), it iteratively adjusts the weights and biases based on the errors it makes. Each weight and bias is nudged in the direction that reduces the error, with the size and sign of the adjustment determined by how much that parameter contributed to it, guiding the network towards better performance.
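As a preview of the update rule covered in Chapter 4, a single gradient-descent step on one weight looks roughly like the sketch below (the gradient value here is made up purely for illustration):

```python
learning_rate = 0.01
w1 = 0.8        # current value of one weight
grad_w1 = 2.3   # hypothetical gradient of the loss with respect to w1

# Nudge the weight a small step in the direction that reduces the loss.
w1 = w1 - learning_rate * grad_w1
print(w1)       # 0.8 - 0.01 * 2.3 ≈ 0.777
```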
The sheer number of these parameters in modern deep learning models (often millions or billions) is what allows them to capture incredibly complex patterns and relationships within data. Understanding the individual roles of weights and biases is fundamental to appreciating how these networks function and learn. The next critical step is seeing how the calculated value z is transformed by an activation function to introduce non-linearity and produce the neuron's final output signal.