Neural networks gain much of their representational capability from the introduction of non-linearities between layers. If we simply stacked linear transformations (like nn.Linear layers) without any intervening functions, the entire network would collapse into a single, equivalent linear transformation. No matter how many layers deep, the network could only learn linear relationships between inputs and outputs.
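To see this collapse concretely, here is a minimal sketch (the layer sizes and variable names are illustrative) showing that two stacked nn.Linear layers with no activation in between can be reproduced exactly by a single nn.Linear layer whose weight and bias are composed from the two originals:

import torch
import torch.nn as nn

torch.manual_seed(0)

# Two stacked linear layers with no activation between them
layer1 = nn.Linear(10, 20)
layer2 = nn.Linear(20, 5)

x = torch.randn(3, 10)
stacked_output = layer2(layer1(x))

# A single linear layer with composed parameters: W = W2 @ W1, b = W2 @ b1 + b2
combined = nn.Linear(10, 5)
with torch.no_grad():
    combined.weight.copy_(layer2.weight @ layer1.weight)
    combined.bias.copy_(layer2.weight @ layer1.bias + layer2.bias)

single_output = combined(x)

# The two computations agree up to floating-point precision
print(torch.allclose(stacked_output, single_output, atol=1e-5))  # True

Because the composition of linear maps is itself linear, depth alone adds no expressive power without a non-linearity in between.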
Activation functions are the components that introduce these essential non-linearities. Applied element-wise to the output of a layer (often called the pre-activation or logit), they transform the values before passing them to the next layer. PyTorch provides a wide variety of activation functions within the torch.nn module, typically used by instantiating them as layers within your model definition. Let's look at three of the most common ones: ReLU, Sigmoid, and Tanh.
The Rectified Linear Unit, or ReLU, is arguably the most popular activation function in modern deep learning, especially in convolutional neural networks. Its definition is remarkably simple: it outputs the input directly if it's positive, and outputs zero otherwise.
Mathematically, it's defined as:
$$\text{ReLU}(x) = \max(0, x)$$

In PyTorch, you can use nn.ReLU:
import torch
import torch.nn as nn
# Example usage
relu_activation = nn.ReLU()
input_tensor = torch.randn(4) # Example input tensor
output_tensor = relu_activation(input_tensor)
print(f"Input: {input_tensor}")
print(f"Output after ReLU: {output_tensor}")
# Example within a simple model
class SimpleNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(10, 20)
        self.activation = nn.ReLU()
        self.layer2 = nn.Linear(20, 5)

    def forward(self, x):
        x = self.layer1(x)
        x = self.activation(x)  # Apply ReLU
        x = self.layer2(x)
        return x

model = SimpleNet()
The ReLU function is zero for negative inputs and linear for positive inputs.
Advantages:
- Computationally very cheap: just a threshold at zero.
- Does not saturate for positive inputs, so gradients flow well through deep networks and training often converges faster than with Sigmoid or Tanh.
- Produces sparse activations, since negative pre-activations are mapped exactly to zero.
Disadvantages:
- Not zero-centered.
- Units can "die": if a neuron's pre-activation stays negative, its gradient is zero and its weights may stop updating (the "dying ReLU" problem).
The Sigmoid function, sometimes called the logistic function, squashes its input into a range between 0 and 1. It was historically popular, especially in the output layer of binary classification models where the output represents a probability.
Its mathematical form is:
$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

In PyTorch, use nn.Sigmoid:
import torch
import torch.nn as nn
# Example usage
sigmoid_activation = nn.Sigmoid()
input_tensor = torch.randn(4) # Example input tensor
output_tensor = sigmoid_activation(input_tensor)
print(f"Input: {input_tensor}")
print(f"Output after Sigmoid: {output_tensor}")
The Sigmoid function smoothly maps any real number to the range (0, 1).
Advantages:
- Output is bounded in (0, 1), so it can be interpreted as a probability.
- Smooth and differentiable everywhere.
Disadvantages:
- Saturates for large positive or negative inputs, where the gradient approaches zero (the vanishing gradient problem).
- Not zero-centered, which can slow down gradient-based optimization.
- Involves an exponential, so it is more expensive to compute than ReLU.
Due to the vanishing gradient problem, Sigmoid is less commonly used in hidden layers of deep networks today compared to ReLU, but it remains relevant for output layers in specific tasks like binary classification or multi-label classification.
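As a sketch of that output-layer use (the layer sizes and class name below are illustrative, not taken from the original text), a binary classifier can end in a single logit passed through nn.Sigmoid to produce a probability; in practice, nn.BCEWithLogitsLoss is often preferred over nn.Sigmoid followed by nn.BCELoss because it applies the sigmoid internally in a more numerically stable way.

import torch
import torch.nn as nn

# Illustrative binary classifier: one probability per example
class BinaryClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(10, 16)
        self.hidden_activation = nn.ReLU()
        self.layer2 = nn.Linear(16, 1)   # single logit
        self.output_activation = nn.Sigmoid()

    def forward(self, x):
        x = self.hidden_activation(self.layer1(x))
        logit = self.layer2(x)
        return self.output_activation(logit)  # probability in (0, 1)

model = BinaryClassifier()
probabilities = model(torch.randn(4, 10))
print(probabilities)  # four values, each between 0 and 1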
The hyperbolic tangent, or Tanh function, is mathematically related to Sigmoid but squashes its input into the range (-1, 1).
It's defined as:
$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} = 2\sigma(2x) - 1$$

In PyTorch, use nn.Tanh:
import torch
import torch.nn as nn
# Example usage
tanh_activation = nn.Tanh()
input_tensor = torch.randn(4) # Example input tensor
output_tensor = tanh_activation(input_tensor)
print(f"Input: {input_tensor}")
print(f"Output after Tanh: {output_tensor}")
The Tanh function smoothly maps any real number to the range (-1, 1).
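As a quick numerical check of the identity $\tanh(x) = 2\sigma(2x) - 1$ given above, here is a small verification sketch:

import torch

x = torch.linspace(-3.0, 3.0, steps=7)
lhs = torch.tanh(x)
rhs = 2 * torch.sigmoid(2 * x) - 1
print(torch.allclose(lhs, rhs, atol=1e-6))  # True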
Advantages:
- Zero-centered output in (-1, 1), which generally makes optimization easier than with Sigmoid.
- Stronger gradients than Sigmoid around zero.
Disadvantages:
- Still saturates for large-magnitude inputs, so it also suffers from vanishing gradients in deep networks.
- More expensive to compute than ReLU.
Tanh was often preferred over Sigmoid for hidden layers before the rise of ReLU, mainly because of its zero-centered output range. It's still commonly found in recurrent neural networks (RNNs) and LSTMs.
There's no single "best" activation function for all scenarios. However, some general guidelines are:
- Start with ReLU for hidden layers; it is a cheap, effective default, especially in feed-forward and convolutional networks.
- Use Sigmoid in the output layer for binary or multi-label classification, where each output should represent a probability.
- Consider Tanh when zero-centered activations matter, for example in recurrent architectures.
Experimentation is often necessary to find the optimal activation function for a specific architecture and dataset. In PyTorch, swapping activation functions is straightforward, usually involving changing just one line where the activation module is instantiated or called within your nn.Module's forward method.
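One convenient pattern (the class and argument names here are illustrative, not a required PyTorch convention) is to pass the activation module as a constructor argument, so trying a different non-linearity means changing a single argument:

import torch
import torch.nn as nn

class ConfigurableNet(nn.Module):
    def __init__(self, activation: nn.Module):
        super().__init__()
        self.layer1 = nn.Linear(10, 20)
        self.activation = activation   # any element-wise activation module
        self.layer2 = nn.Linear(20, 5)

    def forward(self, x):
        x = self.activation(self.layer1(x))
        return self.layer2(x)

# Swapping activations only changes the argument
relu_model = ConfigurableNet(nn.ReLU())
tanh_model = ConfigurableNet(nn.Tanh())
print(relu_model(torch.randn(2, 10)).shape)  # torch.Size([2, 5])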