You'll implement basic versions of Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) using PyTorch's `nn.Module` and relevant layers. This hands-on experience will solidify your understanding of how these models are constructed and how data flows through them.

We'll focus on defining the model structure and understanding the input/output dimensions, building directly on the `nn.Module` concepts from Chapter 4 and the layer descriptions from earlier in this chapter. Remember, these are simplified examples; integrating them into a full training loop would involve adding data loading (Chapter 5), loss functions, optimizers, and the training logic (Chapter 6).

## Implementing a Basic CNN

CNNs excel at processing grid-like data, such as images. Let's build a simple CNN that could be used for image classification. We'll define a network with convolutional layers, activation functions, pooling layers, and a final fully connected layer.

### Defining the CNN Architecture

We create a class inheriting from `nn.Module`. Inside `__init__`, we define the layers we need: `nn.Conv2d` for convolution, `nn.ReLU` for activation, `nn.MaxPool2d` for pooling, and `nn.Linear` for the final classification layer.
The `forward` method defines how input data flows through these layers.

```python
import torch
import torch.nn as nn


class SimpleCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        # Input shape: (Batch, 1, 28, 28) - assuming grayscale images like MNIST
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3, stride=1, padding=1)
        # Shape after conv1: (Batch, 16, 28, 28) -> (28 - 3 + 2*1)/1 + 1 = 28
        self.relu1 = nn.ReLU()
        self.pool1 = nn.MaxPool2d(kernel_size=2, stride=2)
        # Shape after pool1: (Batch, 16, 14, 14) -> 28 / 2 = 14

        self.conv2 = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3, stride=1, padding=1)
        # Shape after conv2: (Batch, 32, 14, 14) -> (14 - 3 + 2*1)/1 + 1 = 14
        self.relu2 = nn.ReLU()
        self.pool2 = nn.MaxPool2d(kernel_size=2, stride=2)
        # Shape after pool2: (Batch, 32, 7, 7) -> 14 / 2 = 7

        # Flatten the output for the linear layer
        # Flattened size = 32 * 7 * 7 = 1568
        self.fc = nn.Linear(32 * 7 * 7, num_classes)

    def forward(self, x):
        # Apply first convolutional block
        out = self.conv1(x)
        out = self.relu1(out)
        out = self.pool1(out)

        # Apply second convolutional block
        out = self.conv2(out)
        out = self.relu2(out)
        out = self.pool2(out)

        # Flatten the output from the convolutional layers:
        # keep the batch dimension; -1 infers the feature size (32 * 7 * 7)
        out = out.view(out.size(0), -1)

        # Apply the fully connected layer
        out = self.fc(out)
        return out
```

In this example:

- We assume a grayscale input image (1 channel), like those in the MNIST dataset, with dimensions 28x28.
- `nn.Conv2d(in_channels=1, out_channels=16, ...)`: Takes 1 input channel and applies 16 filters.
  `kernel_size=3, stride=1, padding=1` are common choices that preserve the spatial dimensions after convolution.
- `nn.MaxPool2d(kernel_size=2, stride=2)`: Reduces the height and width by half.
- The output of the second pooling layer has shape `(Batch, 32, 7, 7)`.
- `out.view(out.size(0), -1)`: Flattens the tensor from shape `(Batch, 32, 7, 7)` to `(Batch, 32 * 7 * 7)` = `(Batch, 1568)` so it can be fed into the linear layer.
- `nn.Linear(32 * 7 * 7, num_classes)`: The final layer maps the flattened features to the desired number of output classes.

### Testing the CNN with Dummy Data

Let's create some dummy input data matching the expected shape (Batch Size, Channels, Height, Width) and pass it through our network to see the output shape.

```python
# Instantiate the model
cnn_model = SimpleCNN(num_classes=10)

# Create a dummy input batch (e.g., 4 images, 1 channel, 28x28 pixels)
# requires_grad=False as we are just doing a forward pass demonstration
dummy_input_cnn = torch.randn(4, 1, 28, 28, requires_grad=False)

# Perform a forward pass
output_cnn = cnn_model(dummy_input_cnn)

# Print input and output shapes
print(f"Input shape: {dummy_input_cnn.shape}")
print(f"Output shape: {output_cnn.shape}")
```

Running this should output:

```
Input shape: torch.Size([4, 1, 28, 28])
Output shape: torch.Size([4, 10])
```

This confirms our network takes a batch of 4 images and outputs predictions for 10 classes for each image. Notice how the `forward` method dictates the data flow, and how we need to calculate the flattened size for the linear layer based on the output shape of the final pooling layer. You can revisit the section "Understanding Input/Output Shapes for CNN Layers" to practice calculating these dimensions manually.

## Implementing a Basic RNN

RNNs are designed for sequential data. Let's build a simple RNN that could, for example, process sequences of characters or sensor readings.

### Defining the RNN Architecture

We'll use the `nn.RNN` layer.
Remember that, by default, RNN layers expect input in the format (Sequence Length, Batch Size, Input Features).

```python
import torch
import torch.nn as nn


class SimpleRNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_layers=1):
        super().__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers

        # RNN layer
        # batch_first=False by default, expects input: (Seq_len, Batch, Input_feature)
        self.rnn = nn.RNN(input_size, hidden_size, num_layers, batch_first=False)

        # Fully connected layer to map RNN output to final output size
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x, h0=None):
        # x shape: (Seq_len, Batch, Input_feature)

        # Initialize hidden state if not provided
        # Shape: (Num_layers * Num_directions, Batch, Hidden_size)
        # Num_directions is 1 here because the RNN is unidirectional
        if h0 is None:
            h0 = torch.zeros(self.num_layers, x.size(1), self.hidden_size,
                             device=x.device, dtype=x.dtype)

        # Pass data through RNN layer
        # out shape: (Seq_len, Batch, Hidden_size) -> contains output features for each time step
        # hn shape: (Num_layers * Num_directions, Batch, Hidden_size) -> contains final hidden state
        out, hn = self.rnn(x, h0)

        # We can choose to use the output of the last time step
        # out[-1] shape: (Batch, Hidden_size)
        # Alternatively, process the entire sequence 'out' if needed
        out_last_step = out[-1, :, :]

        # Pass the output of the last time step through the linear layer
        final_output = self.fc(out_last_step)
        # final_output shape: (Batch, Output_size)

        return final_output, hn  # Return final output and last hidden state
```

In this example:

- `input_size`: The number of features at each step in the sequence.
- `hidden_size`: The number of features in the hidden state.
- `num_layers`: The number of stacked RNN layers.
- `nn.RNN(...)`: The core RNN layer. `batch_first=False` is the default, meaning the sequence length dimension comes first.
- The `forward` method takes the input sequence `x` and an optional initial hidden state `h0`.
  If `h0` is not provided, it's initialized to zeros.
- The `nn.RNN` layer returns `out` (outputs for every time step) and `hn` (the final hidden state).
- We often use the output from the last time step (`out[-1, :, :]`) for sequence classification or prediction tasks, passing it through a final linear layer.

### Testing the RNN with Dummy Data

Let's create a dummy sequence and pass it through our RNN.

```python
# Define parameters
input_features = 10   # e.g., embedding dimension for characters/words
hidden_nodes = 20
output_classes = 5    # e.g., predict one of 5 categories based on the sequence
sequence_length = 15
batch_size = 4

# Instantiate the model
rnn_model = SimpleRNN(input_size=input_features, hidden_size=hidden_nodes, output_size=output_classes)

# Create a dummy input batch (Sequence Length, Batch Size, Input Features)
# requires_grad=False for demonstration
dummy_input_rnn = torch.randn(sequence_length, batch_size, input_features, requires_grad=False)

# Perform a forward pass (without providing h0, it will be initialized)
output_rnn, final_hidden_state = rnn_model(dummy_input_rnn)

# Print input and output shapes
print(f"Input sequence shape: {dummy_input_rnn.shape}")
print(f"Output prediction shape: {output_rnn.shape}")
print(f"Final hidden state shape: {final_hidden_state.shape}")
```

Running this should produce output similar to:

```
Input sequence shape: torch.Size([15, 4, 10])
Output prediction shape: torch.Size([4, 5])
Final hidden state shape: torch.Size([1, 4, 20])
```

This shows the model processes a batch of 4 sequences, each 15 steps long with 10 features per step. It outputs a final prediction vector of size 5 for each sequence in the batch, along with the final hidden state. The hidden state shape reflects (Num Layers, Batch Size, Hidden Size).

## Further Practice

Now that you've implemented basic versions of these architectures, try experimenting:

- CNN Variations:
  - Change `kernel_size`, `stride`, or `padding` in the `nn.Conv2d` layers. Predict the output shape before running the code.
    How does `padding='same'` (when `stride=1`) affect the output dimensions?
  - Add another convolutional/pooling block. Remember to recalculate the input size for the `nn.Linear` layer.
  - Change the number of `out_channels` in the convolutional layers.
- RNN Variations:
  - Increase `num_layers` in the `SimpleRNN`. Observe the shape of the initial hidden state `h0` and the final hidden state `hn`.
  - Change `hidden_size`.
  - Replace `nn.RNN` with `nn.LSTM` or `nn.GRU`. Note that `nn.LSTM` handles a tuple of hidden states (hidden state and cell state). You'll need to adjust the initialization and handling of hidden states accordingly. The input/output shapes largely follow the same pattern.
  - Modify the `forward` method to use the outputs from all time steps (`out`) instead of just the last one, perhaps by applying the linear layer to every step or using an aggregation method like averaging.

This practice provides a concrete foundation for building CNNs and RNNs. By understanding how to define these layers, connect them in a `forward` method, and manage their input/output shapes, you are well-equipped to construct and adapt these powerful architectures for various deep learning tasks using PyTorch.
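
For the shape-prediction exercises, the arithmetic used in the `SimpleCNN` comments can be sketched as a small pure-Python helper. The function name `conv_output_size` is hypothetical (it is not part of PyTorch); it assumes integer padding and, for pooling layers, the default `ceil_mode=False`. The formula is the standard output-size rule for convolution and max-pooling along one spatial dimension.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0, dilation=1):
    """Output length along one spatial dimension of a conv or max-pool layer.

    Hypothetical helper for illustration; assumes integer padding and
    floor rounding (ceil_mode=False for pooling).
    """
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


# Trace the SimpleCNN layers for a 28x28 input
size = 28
size = conv_output_size(size, kernel_size=3, stride=1, padding=1)  # conv1
size = conv_output_size(size, kernel_size=2, stride=2)             # pool1
size = conv_output_size(size, kernel_size=3, stride=1, padding=1)  # conv2
size = conv_output_size(size, kernel_size=2, stride=2)             # pool2
flattened = 32 * size * size
print(size, flattened)  # 7 1568

# A hypothetical third block (3x3 conv with padding=1, then 2x2 pooling)
third = conv_output_size(conv_output_size(size, 3, 1, 1), 2, 2)
print(third)  # 3, because the odd size 7 is floored during pooling
```

Running a check like this before adding a block makes it easier to pick the right input size for `nn.Linear`; for instance, a third block with 64 output channels would need `64 * 3 * 3` input features under these assumptions.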