Convolutional Autoencoders (ConvAEs) are well suited to image data: they use convolutional and pooling layers that respect spatial hierarchies. In this hands-on exercise you will build a ConvAE to extract features from images, using the popular MNIST dataset of grayscale handwritten digits. Working through it will solidify your understanding of the ConvAE architecture and its use in feature learning.

Our goal is to train a ConvAE to reconstruct MNIST images and then use its encoder to transform these images into a lower-dimensional feature representation.

Setting Up the Environment

First, ensure you have PyTorch and Torchvision installed. If you've been following along with the course, your environment should be ready. We'll also use NumPy for numerical operations and Matplotlib or Plotly for visualizations.

import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision.datasets import MNIST
from torchvision.transforms import ToTensor, Resize

# For visualizations if you run this in a notebook:
# import matplotlib.pyplot as plt
# For t-SNE:
# from sklearn.manifold import TSNE

1. Loading and Preprocessing the MNIST Dataset

MNIST images are 28x28 pixels. Convolutional layers in PyTorch expect the format (Channels, Height, Width). We also normalize pixel values to the range [0, 1], which is good practice for training neural networks. The torchvision library makes this easy.

# Load MNIST dataset and apply transformations
transform = ToTensor()  # Converts images to PyTorch tensors and scales pixel values to [0, 1]

train_dataset = MNIST(root='./data', train=True, download=True, transform=transform)
test_dataset = MNIST(root='./data', train=False, download=True, transform=transform)

train_loader = DataLoader(train_dataset, batch_size=128, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=128, shuffle=False)

# Get a sample batch to check the shape
sample_data, _ = next(iter(train_loader))
print(f"Sample batch shape: {sample_data.shape}")

You should see output like:

Sample batch shape: torch.Size([128, 1, 28, 28])

2. Building the Convolutional Encoder

The encoder's job is to compress the input image into a compact latent representation. It typically consists of a series of nn.Conv2d layers (to learn features) followed by nn.MaxPool2d layers (to downsample and reduce dimensionality).

Let's define an encoder that maps the 1x28x28 input image to a latent vector of 64 dimensions.

latent_dim = 64  # Dimensionality of the latent space

class Encoder(nn.Module):
    def __init__(self):
        super(Encoder, self).__init__()
        self.conv1 = nn.Conv2d(1, 16, kernel_size=3, padding=1)   # -> 16x28x28
        self.pool1 = nn.MaxPool2d(kernel_size=2, stride=2)        # -> 16x14x14
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, padding=1)  # -> 32x14x14
        self.pool2 = nn.MaxPool2d(kernel_size=2, stride=2)        # -> 32x7x7
        self.flatten = nn.Flatten()
        # The flattened size is 32 * 7 * 7 = 1568
        self.fc = nn.Linear(32 * 7 * 7, latent_dim)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.relu(self.conv1(x))
        x = self.pool1(x)
        x = self.relu(self.conv2(x))
        x = self.pool2(x)
        x = self.flatten(x)
        x = self.relu(self.fc(x))
        return x

# Instantiate and print the encoder
encoder = Encoder()
print(encoder)

Printing the encoder will show its architecture and layers.
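If you want to verify the shapes before moving on, a quick sanity check (not part of the original listing) is to push a batch of random tensors through the encoder:

# Optional sanity check: a random batch of 8 "images" should map to
# a (8, latent_dim) tensor of latent codes.
dummy = torch.randn(8, 1, 28, 28)
with torch.no_grad():
    z = encoder(dummy)
print(z.shape)  # expected: torch.Size([8, 64])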
Notice how the spatial dimensions decrease while the number of filters (features) can increase, capturing more complex patterns before being condensed into the latent_dim vector. The padding=1 with a kernel_size=3 ensures that each convolution's output feature map has the same spatial dimensions as its input (before pooling), making the architecture design a bit more straightforward.

3. Building the Convolutional Decoder

The decoder's task is the opposite of the encoder's: to reconstruct the original image from the latent representation. It often mirrors the encoder's architecture but uses nn.ConvTranspose2d layers to increase the spatial dimensions.

class Decoder(nn.Module):
    def __init__(self):
        super(Decoder, self).__init__()
        # Dense layer to upscale from the latent dim to the pre-flattened size
        self.fc = nn.Linear(latent_dim, 32 * 7 * 7)
        # Reshape is done in the forward pass using .view()
        self.convT1 = nn.ConvTranspose2d(32, 16, kernel_size=2, stride=2)  # -> 16x14x14
        self.convT2 = nn.ConvTranspose2d(16, 1, kernel_size=2, stride=2)   # -> 1x28x28
        self.relu = nn.ReLU()
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.relu(self.fc(x))
        x = x.view(-1, 32, 7, 7)          # Reshape to 32x7x7
        x = self.relu(self.convT1(x))
        x = self.sigmoid(self.convT2(x))  # Sigmoid for [0, 1] pixel values
        return x

# Instantiate and print the decoder
decoder = Decoder()
print(decoder)

The nn.ConvTranspose2d layers with stride=2 effectively double the spatial dimensions at each step. The final layer uses a sigmoid activation because our input images were normalized to lie between 0 and 1.

4. Assembling the Autoencoder and Defining Loss/Optimizer

Now we combine the encoder and decoder into a full autoencoder model. In PyTorch, this is simply another nn.Module that calls the encoder and decoder in sequence. We also define our loss function and optimizer.

class Autoencoder(nn.Module):
    def __init__(self, encoder, decoder):
        super(Autoencoder, self).__init__()
        self.encoder = encoder
        self.decoder = decoder

    def forward(self, x):
        encoded = self.encoder(x)
        decoded = self.decoder(encoded)
        return decoded

autoencoder = Autoencoder(encoder, decoder)
print(autoencoder)

# Define the loss function and optimizer
criterion = nn.BCELoss()  # Binary cross-entropy loss for pixel-wise comparison
optimizer = optim.Adam(autoencoder.parameters(), lr=1e-3)

We use BCELoss as the loss function, which is suitable for comparing pixel values that lie between 0 and 1 (thanks to the sigmoid activation in the decoder's last layer).
The Adam optimizer is a common and effective choice.

A diagram can help visualize this architecture:

digraph G {
    rankdir=TB;
    node [shape=box, style="filled", fillcolor="#e9ecef", fontname="sans-serif"];
    edge [fontname="sans-serif"];

    subgraph cluster_encoder {
        label = "Encoder";
        style="dashed";
        fillcolor="#f8f9fa";
        InputImg [label="Input Image\n(1x28x28)", fillcolor="#a5d8ff"];
        Conv1 [label="nn.Conv2d (16 filters, 3x3, ReLU)\n+ nn.MaxPool2d (2x2)", fillcolor="#74c0fc"];
        Conv2 [label="nn.Conv2d (32 filters, 3x3, ReLU)\n+ nn.MaxPool2d (2x2)", fillcolor="#4dabf7"];
        FlattenLayer [label="nn.Flatten\n(32x7x7 -> 1568)", fillcolor="#339af0"];
        LatentVec [label="nn.Linear (Bottleneck)\nLatent Vector (64 dim, ReLU)", fillcolor="#228be6"];
        InputImg -> Conv1 -> Conv2 -> FlattenLayer -> LatentVec;
    }

    subgraph cluster_decoder {
        label = "Decoder";
        style="dashed";
        fillcolor="#f8f9fa";
        DenseDecode [label="nn.Linear (1568, ReLU)\n+ Reshape (32x7x7)", fillcolor="#91a7ff"];
        ConvT1 [label="nn.ConvTranspose2d (16 filters, 2x2, stride=2, ReLU)", fillcolor="#748ffc"];
        ConvT2 [label="nn.ConvTranspose2d (1 filter, 2x2, stride=2, Sigmoid)\nReconstructed Image (1x28x28)", fillcolor="#5c7cfa"];
        DenseDecode -> ConvT1 -> ConvT2;
    }

    LatentVec -> DenseDecode [label="Latent Representation"];
}

The Convolutional Autoencoder architecture. The encoder maps the input image to a low-dimensional latent vector, and the decoder attempts to reconstruct the original image from this vector.

5. Training the Autoencoder

With the model, loss, and optimizer defined, we can write our training loop. The autoencoder learns to reconstruct its input, so the input images serve as both the input and the target.
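The loop below is a minimal sketch of standard PyTorch training under the definitions above; the epoch count and loss reporting are arbitrary choices rather than requirements of the exercise. It also defines the device variable that the evaluation code in the next step relies on.

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
autoencoder = autoencoder.to(device)  # Module.to() moves parameters in place,
                                      # so the optimizer created above still works

num_epochs = 10  # arbitrary; a handful of epochs already gives recognizable reconstructions
for epoch in range(num_epochs):
    autoencoder.train()
    running_loss = 0.0
    for imgs, _ in train_loader:         # labels are ignored: each image is its own target
        imgs = imgs.to(device)
        optimizer.zero_grad()
        outputs = autoencoder(imgs)      # reconstruction
        loss = criterion(outputs, imgs)  # pixel-wise BCE between reconstruction and input
        loss.backward()
        optimizer.step()
        running_loss += loss.item() * imgs.size(0)
    print(f"Epoch {epoch + 1}/{num_epochs} - loss: {running_loss / len(train_loader.dataset):.4f}")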
6. Visualizing Reconstructions

After training, we can check how well the autoencoder handles images it has never seen by comparing a few test images with their reconstructions.

import torch
import matplotlib.pyplot as plt
from torchvision.transforms import Resize
import numpy as np

# Assuming autoencoder, test_loader, and device are already defined from your previous setup

# Predict on test images
autoencoder.eval()  # Set model to evaluation mode
reconstructed_imgs = []
original_imgs = []

# Downsample images to 16x16 pixels for a compact side-by-side display
resize_transform = Resize((16, 16))

with torch.no_grad():
    for i, data in enumerate(test_loader):
        imgs, _ = data
        imgs = imgs.to(device)
        outputs = autoencoder(imgs)
        # Store the first batch only (we will display 5 images from it)
        if i == 0:
            # Resize original and reconstructed images to 16x16
            original_imgs = resize_transform(imgs).cpu().numpy()
            reconstructed_imgs = resize_transform(outputs).cpu().numpy()
            break

# Display with Matplotlib
n_display = 5
fig, axes = plt.subplots(2, n_display, figsize=(n_display * 2, 4))  # Adjust figsize as needed

for i in range(n_display):
    # Original image
    axes[0, i].imshow(original_imgs[i, 0], cmap='Greys')
    axes[0, i].axis('off')
    if i == 0:
        axes[0, i].set_title("Original Images", fontsize=12)

    # Reconstructed image
    axes[1, i].imshow(reconstructed_imgs[i, 0], cmap='Greys')
    axes[1, i].axis('off')
    if i == 0:
        axes[1, i].set_title("Reconstructed Images", fontsize=12)

plt.tight_layout(rect=[0, 0, 1, 0.95])  # Adjust layout to make space for titles
plt.suptitle("Original vs. Reconstructed Images", fontsize=16, y=1.0)  # Overall title
plt.show()

Comparison of original MNIST test images (top row) and their reconstructions by the ConvAE (bottom row). The reconstructions should be recognizable, though perhaps a bit blurrier than the originals.

The quality of the reconstructions depends on the model architecture, the latent dimension size, and the training duration. More complex models or longer training may yield sharper images.

7. Extracting Features

The primary goal of this exercise is feature extraction.
The encoder part of our trained autoencoder can now be used to transform input images into their latent_dim-dimensional feature vectors.

# Use the trained encoder to get latent representations (features)
encoder.eval()  # Set encoder to evaluation mode

all_features = []
all_labels = []

# For the full dataset, iterate through the loaders with a larger batch size for inference
full_train_loader = DataLoader(train_dataset, batch_size=1024)
full_test_loader = DataLoader(test_dataset, batch_size=1024)

with torch.no_grad():
    for data in full_train_loader:
        imgs, labels = data
        imgs = imgs.to(device)
        features = encoder(imgs)
        all_features.append(features.cpu().numpy())
        all_labels.append(labels.numpy())

encoded_features_train = np.concatenate(all_features, axis=0)
y_train = np.concatenate(all_labels, axis=0)

# Repeat for the test set
all_features = []
all_labels = []

with torch.no_grad():
    for data in full_test_loader:
        imgs, labels = data
        imgs = imgs.to(device)
        features = encoder(imgs)
        all_features.append(features.cpu().numpy())
        all_labels.append(labels.numpy())

encoded_features_test = np.concatenate(all_features, axis=0)
y_test = np.concatenate(all_labels, axis=0)

print(f"Shape of training features: {encoded_features_train.shape}")
print(f"Shape of test features: {encoded_features_test.shape}")

This will output:

Shape of training features: (60000, 64)
Shape of test features: (10000, 64)

Each image is now represented by a vector of 64 numbers. These features are learned by the autoencoder to capture the essential information needed to reconstruct the original image. They are often more semantically meaningful than raw pixel values and can be used for downstream tasks such as classification or clustering.

8. Visualizing the Latent Space (Optional)

To get a sense of how the autoencoder has organized the data in its latent space, we can use a dimensionality reduction technique such as t-SNE to project the 64-dimensional features down to 2 dimensions and plot them, colored by the original digit labels.

The following code uses scikit-learn for t-SNE and Plotly Express for plotting. t-SNE can be computationally intensive on the full dataset, so a subset is used for visualization.

from sklearn.manifold import TSNE
import plotly.express as px

# Use a subset of test features for t-SNE (e.g., the first 5000 samples)
num_samples_tsne = 5000
tsne = TSNE(n_components=2, random_state=42, perplexity=30,
            n_iter=300)  # note: newer scikit-learn versions call this parameter max_iter
latent_2d = tsne.fit_transform(encoded_features_test[:num_samples_tsne])

# Create a Plotly scatter plot, colored by the true digit label
fig = px.scatter(
    x=latent_2d[:, 0],
    y=latent_2d[:, 1],
    color=y_test[:num_samples_tsne].astype(str),
    labels={'color': 'Digit'},
    title="t-SNE visualization of MNIST latent space (ConvAE features)",
)
fig.show()  # In a notebook, this opens an interactive figure

The figure below gives an illustrative impression of the kind of structure you might see.
An illustrative t-SNE visualization of the learned latent features from the ConvAE. Ideally, points corresponding to the same digit would cluster together, and different digits would form distinct (or at least somewhat separated) clusters.

If the autoencoder has learned well, you should see some separation between clusters of different digits.
This indicates that the latent features capture discriminative information about the digit classes, even though the autoencoder was trained purely on reconstruction, without any label information.

Summary of this Hands-on Session

In this session, you've successfully:

- Loaded and preprocessed the MNIST image dataset using PyTorch's torchvision.
- Designed and built a convolutional encoder in PyTorch to map images to a latent space.
- Designed and built a convolutional decoder in PyTorch to reconstruct images from their latent representations.
- Combined these into a Convolutional Autoencoder and trained it using a PyTorch training loop.
- Visualized the quality of the image reconstructions.
- Used the trained encoder to extract feature vectors from images.
- Optionally, visualized the structure of these features in the latent space.

These extracted features, encoded_features_train and encoded_features_test, are now ready to be used in downstream machine learning tasks such as classification, which we will explore further in Chapter 7. This exercise demonstrates the power of ConvAEs for learning compact, useful representations from image data.
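As a small, optional preview of that claim (a sketch only; it assumes scikit-learn is available, and Chapter 7 covers classification on learned features in depth), a simple linear classifier can be fit directly on the extracted feature vectors:

from sklearn.linear_model import LogisticRegression

# Fit a linear classifier on the ConvAE features (illustrative only)
clf = LogisticRegression(max_iter=1000)
clf.fit(encoded_features_train, y_train)
print(f"Test accuracy on ConvAE features: {clf.score(encoded_features_test, y_test):.3f}")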