While the torch.nn.Sequential container provides a convenient way to stack layers linearly, many real-world network architectures require more intricate designs. You might need skip connections (as in ResNets), multiple input/output paths, or layers used in a non-sequential order. This is where defining your own custom network architecture by subclassing torch.nn.Module becomes essential. It offers maximum flexibility in specifying how data flows through your model.
The fundamental process involves two main steps:
1. Define the constructor (__init__): Create a Python class that inherits from torch.nn.Module. Inside its __init__ method, you must first call the parent class's constructor (super().__init__()). Then, instantiate all the layers your network will need (e.g., nn.Linear, nn.Conv2d, nn.ReLU, etc.) and assign them as attributes of the class instance (using self). These layers become submodules of your custom module.
2. Define the forward method: Implement the forward method for your class. This method takes the input tensor(s) as arguments and defines how the input data propagates through the layers you defined in __init__. The output of this method is the final output of your network for the given input. PyTorch's Autograd system automatically builds the computation graph based on the operations performed within this forward method.

Let's start with a basic example: a simple linear regression model implemented as a custom module.
import torch
import torch.nn as nn
class SimpleLinearModel(nn.Module):
    def __init__(self, input_features, output_features):
        # Call the parent class constructor
        super().__init__()
        # Define the single linear layer
        self.linear_layer = nn.Linear(input_features, output_features)
        print(f"Initialized SimpleLinearModel with input_features={input_features}, output_features={output_features}")
        print(f"Layer defined: {self.linear_layer}")

    def forward(self, x):
        # Define the forward pass: pass input through the linear layer
        print(f"Forward pass input shape: {x.shape}")
        output = self.linear_layer(x)
        print(f"Forward pass output shape: {output.shape}")
        return output
# --- Usage Example ---
# Define input and output dimensions
in_dim = 10
out_dim = 1
# Instantiate the custom model
model = SimpleLinearModel(input_features=in_dim, output_features=out_dim)
# Create some dummy input data (batch_size=5, features=10)
dummy_input = torch.randn(5, in_dim)
print(f"\nDummy input tensor shape: {dummy_input.shape}")
# Pass the data through the model
output = model(dummy_input)
print(f"Model output tensor shape: {output.shape}")
# Inspect parameters (automatically registered)
print("\nModel Parameters:")
for name, param in model.named_parameters():
    if param.requires_grad:
        print(f"  Name: {name}, Shape: {param.shape}")
In this example:

- SimpleLinearModel inherits from nn.Module.
- __init__ calls super().__init__() and defines self.linear_layer = nn.Linear(...). This layer is now a registered submodule.
- forward(self, x) takes the input x and passes it through self.linear_layer, returning the result.
- PyTorch automatically tracks the parameters (weights and biases) of the nn.Linear layer because it was assigned as an attribute within an nn.Module subclass. We can verify this by inspecting model.parameters() or model.named_parameters().
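Because the parameters are registered automatically, they can be handed straight to an optimizer. The snippet below is a minimal sketch that continues the usage example above; the choice of SGD, the learning rate, and the MSE loss are illustrative, not part of the original example.

import torch.optim as optim

# model.parameters() yields linear_layer.weight and linear_layer.bias,
# so the optimizer sees every registered parameter directly.
optimizer = optim.SGD(model.parameters(), lr=0.01)

# One illustrative training step with made-up regression targets
target = torch.randn(5, out_dim)
loss = nn.functional.mse_loss(output, target)
loss.backward()        # gradients flow into the registered parameters
optimizer.step()       # update linear_layer.weight and linear_layer.bias
optimizer.zero_grad()  # clear gradients before the next step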
Now, let's build a slightly more complex model, a two-layer MLP with a ReLU activation function between the layers.
import torch
import torch.nn as nn
import torch.nn.functional as F # Often used for functional APIs like activation functions
class SimpleMLP(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        # Define layers
        self.layer1 = nn.Linear(input_size, hidden_size)
        self.activation = nn.ReLU() # Define activation as a layer
        self.layer2 = nn.Linear(hidden_size, output_size)
        print(f"Initialized SimpleMLP: Input={input_size}, Hidden={hidden_size}, Output={output_size}")
        print(f"Layer 1: {self.layer1}")
        print(f"Activation: {self.activation}")
        print(f"Layer 2: {self.layer2}")

    def forward(self, x):
        # Define the forward pass sequence
        print(f"Forward pass input shape: {x.shape}")
        x = self.layer1(x)
        print(f"After layer 1 shape: {x.shape}")
        x = self.activation(x) # Apply ReLU activation
        # Alternative: x = F.relu(x) # Using the functional API
        print(f"After activation shape: {x.shape}")
        x = self.layer2(x)
        print(f"After layer 2 (output) shape: {x.shape}")
        return x
# --- Usage Example ---
# Define dimensions
in_size = 784 # Example: Flattened 28x28 image
hidden_units = 128
out_size = 10 # Example: 10 classes for classification
# Instantiate the MLP
mlp_model = SimpleMLP(input_size=in_size, hidden_size=hidden_units, output_size=out_size)
# Create dummy input (batch_size=32)
dummy_mlp_input = torch.randn(32, in_size)
print(f"\nDummy MLP input shape: {dummy_mlp_input.shape}")
# Forward pass
mlp_output = mlp_model(dummy_mlp_input)
print(f"MLP output shape: {mlp_output.shape}")
# Inspect parameters
print("\nMLP Model Parameters:")
for name, param in mlp_model.named_parameters():
    if param.requires_grad:
        print(f"  Name: {name}, Shape: {param.shape}")
Here, the forward
method explicitly dictates the sequence: input -> layer1
-> activation
-> layer2
-> output. Notice that activation functions like nn.ReLU
are also typically defined as layers in __init__
and called in forward
. Alternatively, you could use the functional equivalents directly in the forward
method (e.g., F.relu(x)
after importing torch.nn.functional as F
), especially for activations that don't have learnable parameters.
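For comparison, here is a minimal sketch of the same forward pass written with the functional activation; the class name FunctionalMLP is invented for this illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class FunctionalMLP(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        # Only the layers with learnable parameters need to be attributes
        self.layer1 = nn.Linear(input_size, hidden_size)
        self.layer2 = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        # F.relu has no learnable parameters, so it can be called directly here
        x = F.relu(self.layer1(x))
        return self.layer2(x)

# Quick check: same dimensions as SimpleMLP, same output shape
functional_model = FunctionalMLP(input_size=784, hidden_size=128, output_size=10)
print(functional_model(torch.randn(32, 784)).shape)  # torch.Size([32, 10])

Both variants compute the same function; the nn.ReLU module version simply makes the activation visible as a named submodule.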
We can visualize the data flow defined in the forward
method of our SimpleMLP
.
Figure: Data flow through the SimpleMLP model as defined in its forward method.
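If the diagram is not available, a quick text-based view of the module's structure is to print it, which lists its registered submodules in definition order (a small sketch using the mlp_model instance created above):

# Printing an nn.Module shows its registered submodules
print(mlp_model)
# Expected output (roughly):
# SimpleMLP(
#   (layer1): Linear(in_features=784, out_features=128, bias=True)
#   (activation): ReLU()
#   (layer2): Linear(in_features=128, out_features=10, bias=True)
# )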
In summary:

- Subclassing nn.Module gives you complete freedom over the computation performed in the forward pass; nn.Sequential is limited to strictly linear sequences of layers.
- Layers are created in __init__ and their interactions defined in forward.
- Parameters are registered automatically for any nn.Module assigned as an attribute (like self.layer1 = nn.Linear(...)) within the __init__ method. This means model.parameters() will correctly yield all learnable parameters (weights, biases) of all submodules, making it straightforward to pass them to an optimizer.
- Custom modules can contain other modules (including nn.Sequential or other custom modules), allowing you to build hierarchical and reusable components (see the sketch at the end of this section).

By subclassing nn.Module, you gain full control over your network's structure, enabling the implementation of sophisticated deep learning models tailored to specific tasks. This approach is standard practice for building anything beyond the simplest feed-forward networks.
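As a brief, hypothetical illustration of that composability (the class name, block size, and the skip connection are invented for this sketch, not taken from the examples above), a custom module can wrap an nn.Sequential block and combine its output with the input in forward:

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, features):
        super().__init__()
        # A nested nn.Sequential is itself a submodule, so its parameters
        # are registered on this module automatically.
        self.body = nn.Sequential(
            nn.Linear(features, features),
            nn.ReLU(),
            nn.Linear(features, features),
        )

    def forward(self, x):
        # Skip connection: add the block's output back to its input,
        # a data flow that nn.Sequential alone cannot express.
        return x + self.body(x)

block = ResidualBlock(features=64)
print(block(torch.randn(8, 64)).shape)  # torch.Size([8, 64])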