While the torch.nn.Sequential container provides a convenient way to stack layers linearly, many real-world network architectures require more intricate designs. You might need skip connections (as in ResNets), multiple input/output paths, or layers used in a non-sequential order. This is where defining your own custom network architecture by subclassing torch.nn.Module becomes essential. It offers maximum flexibility in specifying how data flows through your model.
The fundamental process involves two main steps:
1. Define the constructor (__init__): Create a Python class that inherits from torch.nn.Module. Inside its __init__ method, you must first call the parent class's constructor (super().__init__()). Then, instantiate all the layers your network will need (e.g., nn.Linear, nn.Conv2d, nn.ReLU, etc.) and assign them as attributes of the class instance (using self). These layers become submodules of your custom module.
2. Define the forward method: Implement the forward method for your class. This method takes the input tensor(s) as arguments and defines how the input data propagates through the layers you defined in __init__. The output of this method is the final output of your network for the given input. PyTorch's Autograd system automatically builds the computation graph based on the operations performed within this forward method.

Let's start with a basic example: a simple linear regression model implemented as a custom module.
import torch
import torch.nn as nn
class SimpleLinearModel(nn.Module):
    def __init__(self, input_features, output_features):
        # Call the parent class constructor
        super().__init__()
        # Define the single linear layer
        self.linear_layer = nn.Linear(input_features, output_features)
        print(f"Initialized SimpleLinearModel with input_features={input_features}, output_features={output_features}")
        print(f"Layer defined: {self.linear_layer}")

    def forward(self, x):
        # Define the forward pass: pass input through the linear layer
        print(f"Forward pass input shape: {x.shape}")
        output = self.linear_layer(x)
        print(f"Forward pass output shape: {output.shape}")
        return output
# --- Usage Example ---
# Define input and output dimensions
in_dim = 10
out_dim = 1
# Instantiate the custom model
model = SimpleLinearModel(input_features=in_dim, output_features=out_dim)
# Create some dummy input data (batch_size=5, features=10)
dummy_input = torch.randn(5, in_dim)
print(f"\nDummy input tensor shape: {dummy_input.shape}")
# Pass the data through the model
output = model(dummy_input)
print(f"Model output tensor shape: {output.shape}")
# Inspect parameters (automatically registered)
print("\nModel Parameters:")
for name, param in model.named_parameters():
    if param.requires_grad:
        print(f"  Name: {name}, Shape: {param.shape}")
In this example:

- SimpleLinearModel inherits from nn.Module.
- __init__ calls super().__init__() and defines self.linear_layer = nn.Linear(...). This layer is now a registered submodule.
- forward(self, x) takes the input x and passes it through self.linear_layer, returning the result.
- PyTorch automatically tracks the parameters (weights and biases) of the nn.Linear layer because it was assigned as an attribute within an nn.Module subclass. We can verify this by inspecting model.parameters() or model.named_parameters().
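Because the parameters are registered automatically, they can be handed straight to an optimizer. The snippet below is a minimal sketch that continues the usage example above; the choice of SGD, the learning rate, and the MSE loss are illustrative, not part of the original example.

import torch.optim as optim

# model.parameters() yields linear_layer.weight and linear_layer.bias,
# so the optimizer sees every registered parameter directly.
optimizer = optim.SGD(model.parameters(), lr=0.01)

# One illustrative training step with made-up regression targets
target = torch.randn(5, out_dim)
loss = nn.functional.mse_loss(output, target)
loss.backward()        # gradients flow into the registered parameters
optimizer.step()       # update linear_layer.weight and linear_layer.bias
optimizer.zero_grad()  # clear gradients before the next step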
Now, let's build a slightly more complex model, a two-layer MLP with a ReLU activation function between the layers.
import torch
import torch.nn as nn
import torch.nn.functional as F # Often used for functional APIs like activation functions
class SimpleMLP(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        # Define layers
        self.layer1 = nn.Linear(input_size, hidden_size)
        self.activation = nn.ReLU() # Define activation as a layer
        self.layer2 = nn.Linear(hidden_size, output_size)
        print(f"Initialized SimpleMLP: Input={input_size}, Hidden={hidden_size}, Output={output_size}")
        print(f"Layer 1: {self.layer1}")
        print(f"Activation: {self.activation}")
        print(f"Layer 2: {self.layer2}")

    def forward(self, x):
        # Define the forward pass sequence
        print(f"Forward pass input shape: {x.shape}")
        x = self.layer1(x)
        print(f"After layer 1 shape: {x.shape}")
        x = self.activation(x) # Apply ReLU activation
        # Alternative: x = F.relu(x) # Using the functional API
        print(f"After activation shape: {x.shape}")
        x = self.layer2(x)
        print(f"After layer 2 (output) shape: {x.shape}")
        return x
# --- Usage Example ---
# Define dimensions
in_size = 784 # Example: Flattened 28x28 image
hidden_units = 128
out_size = 10 # Example: 10 classes for classification
# Instantiate the MLP
mlp_model = SimpleMLP(input_size=in_size, hidden_size=hidden_units, output_size=out_size)
# Create dummy input (batch_size=32)
dummy_mlp_input = torch.randn(32, in_size)
print(f"\nDummy MLP input shape: {dummy_mlp_input.shape}")
# Forward pass
mlp_output = mlp_model(dummy_mlp_input)
print(f"MLP output shape: {mlp_output.shape}")
# Inspect parameters
print("\nMLP Model Parameters:")
for name, param in mlp_model.named_parameters():
    if param.requires_grad:
        print(f"  Name: {name}, Shape: {param.shape}")
Here, the forward
method explicitly dictates the sequence: input -> layer1
-> activation
-> layer2
-> output. Notice that activation functions like nn.ReLU
are also typically defined as layers in __init__
and called in forward
. Alternatively, you could use the functional equivalents directly in the forward
method (e.g., F.relu(x)
after importing torch.nn.functional as F
), especially for activations that don't have learnable parameters.
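For comparison, here is a minimal sketch of the same forward pass written with the functional activation; the class name FunctionalMLP is invented for this illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class FunctionalMLP(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        # Only the layers with learnable parameters need to be attributes
        self.layer1 = nn.Linear(input_size, hidden_size)
        self.layer2 = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        # F.relu has no learnable parameters, so it can be called directly here
        x = F.relu(self.layer1(x))
        return self.layer2(x)

# Quick check: same dimensions as SimpleMLP, same output shape
functional_model = FunctionalMLP(input_size=784, hidden_size=128, output_size=10)
print(functional_model(torch.randn(32, 784)).shape)  # torch.Size([32, 10])

Both variants compute the same function; the nn.ReLU module version simply makes the activation visible as a named submodule.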
We can visualize the data flow defined in the forward
method of our SimpleMLP
.
Figure: Data flow through the SimpleMLP model as defined in its forward method.
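If the diagram is not available, a quick text-based view of the module's structure is to print it, which lists its registered submodules in definition order (a small sketch using the mlp_model instance created above):

# Printing an nn.Module shows its registered submodules
print(mlp_model)
# Expected output (roughly):
# SimpleMLP(
#   (layer1): Linear(in_features=784, out_features=128, bias=True)
#   (activation): ReLU()
#   (layer2): Linear(in_features=128, out_features=10, bias=True)
# )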
In summary:

- Subclassing nn.Module gives you complete freedom over the computation performed in the forward pass; nn.Sequential is limited to strictly linear sequences of layers.
- Layers are created in __init__ and their interactions defined in forward.
- Parameters are registered automatically for any nn.Module assigned as an attribute (like self.layer1 = nn.Linear(...)) within the __init__ method. This means model.parameters() will correctly yield all learnable parameters (weights, biases) of all submodules, making it straightforward to pass them to an optimizer.
- Custom modules can contain other modules (including nn.Sequential or other custom modules), allowing you to build hierarchical and reusable components (see the sketch at the end of this section).

By subclassing nn.Module, you gain full control over your network's structure, enabling the implementation of sophisticated deep learning models tailored to specific tasks. This approach is standard practice for building anything beyond the simplest feed-forward networks.
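As a brief, hypothetical illustration of that composability (the class name, block size, and the skip connection are invented for this sketch, not taken from the examples above), a custom module can wrap an nn.Sequential block and combine its output with the input in forward:

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, features):
        super().__init__()
        # A nested nn.Sequential is itself a submodule, so its parameters
        # are registered on this module automatically.
        self.body = nn.Sequential(
            nn.Linear(features, features),
            nn.ReLU(),
            nn.Linear(features, features),
        )

    def forward(self, x):
        # Skip connection: add the block's output back to its input,
        # a data flow that nn.Sequential alone cannot express.
        return x + self.body(x)

block = ResidualBlock(features=64)
print(block(torch.randn(8, 64)).shape)  # torch.Size([8, 64])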