Build a simple, fully-connected autoencoder using TensorFlow/Keras and train it on the MNIST dataset of handwritten digits. This practical implementation of the classic autoencoder architecture demonstrates how the encoder, bottleneck, and decoder work together to learn a compressed representation of the data.

Our goal is to train a network that takes a 784-dimensional vector (a flattened 28x28 MNIST image) as input $x$, encodes it into a much lower-dimensional latent representation $z$, and then decodes $z$ back into a 784-dimensional vector $\hat{x}$ that closely resembles the original input $x$.

## Setup and Data Preparation

First, we need to import the necessary libraries and load the MNIST dataset. We'll normalize the pixel values to the range [0, 1], which is a standard practice for image data that helps with model training stability. We will also flatten the 28x28 images into vectors of size 784.

```python
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.datasets import mnist

# Load the MNIST dataset
(x_train, _), (x_test, _) = mnist.load_data()  # We only need the images, not the labels

# Normalize pixel values to [0, 1] and flatten images
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_train = x_train.reshape((len(x_train), np.prod(x_train.shape[1:])))
x_test = x_test.reshape((len(x_test), np.prod(x_test.shape[1:])))

print(f"Training data shape: {x_train.shape}")
print(f"Test data shape: {x_test.shape}")
```

```
Training data shape: (60000, 784)
Test data shape: (10000, 784)
```

## Defining the Autoencoder Model

We'll construct our autoencoder using Keras' Functional API or Sequential API. For this simple example, the Sequential API is sufficient.

- Encoder: A sequence of `Dense` layers that progressively reduce the dimensionality. We'll use ReLU activation functions.
- Bottleneck: The final layer of the encoder, representing our compressed latent space $z$. Its size determines the degree of compression. Let's choose a latent dimension of 32.
- Decoder: A sequence of `Dense` layers that mirror the encoder, progressively increasing the dimensionality back to the original 784. The final layer uses a sigmoid activation function because our input pixels are normalized between 0 and 1; sigmoid outputs values in this range, making it suitable for reconstructing the input.

```python
# Define input shape and latent dimension
input_dim = 784
latent_dim = 32

# --- Encoder ---
encoder = keras.Sequential(
    [
        keras.Input(shape=(input_dim,)),
        layers.Dense(128, activation="relu"),
        layers.Dense(64, activation="relu"),
        layers.Dense(latent_dim, activation="relu", name="bottleneck"),  # Bottleneck layer
    ],
    name="encoder",
)

# --- Decoder ---
decoder = keras.Sequential(
    [
        keras.Input(shape=(latent_dim,)),
        layers.Dense(64, activation="relu"),
        layers.Dense(128, activation="relu"),
        layers.Dense(input_dim, activation="sigmoid"),  # Output layer
    ],
    name="decoder",
)

# --- Autoencoder (Encoder + Decoder) ---
autoencoder = keras.Sequential(
    [
        encoder,
        decoder,
    ],
    name="autoencoder",
)

# Display model summaries
encoder.summary()
decoder.summary()
autoencoder.summary()
```
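For comparison, here is a minimal sketch of how the same architecture could be expressed with the Functional API mentioned above. This is illustrative only and is not the model we train below; the names `functional_autoencoder` and `functional_encoder` are placeholders, and the layer sizes simply mirror the Sequential version.

```python
# Sketch: the same architecture built with the Keras Functional API.
inputs = keras.Input(shape=(input_dim,))
h = layers.Dense(128, activation="relu")(inputs)
h = layers.Dense(64, activation="relu")(h)
z = layers.Dense(latent_dim, activation="relu", name="bottleneck")(h)  # latent code
h = layers.Dense(64, activation="relu")(z)
h = layers.Dense(128, activation="relu")(h)
outputs = layers.Dense(input_dim, activation="sigmoid")(h)

functional_autoencoder = keras.Model(inputs, outputs, name="autoencoder_functional")

# A standalone encoder can be sliced out of the same graph, which is convenient
# when you want to inspect latent codes directly.
functional_encoder = keras.Model(inputs, z, name="encoder_functional")
```

One advantage of the Functional API is exactly this kind of model surgery; with the Sequential approach we get the same benefit by defining `encoder` and `decoder` as separate models and composing them.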
Here's a structural representation of our simple autoencoder:

*[Figure: A simple fully-connected autoencoder architecture. The encoder maps the 784-dimensional input through Dense(128, relu) and Dense(64, relu) to a 32-dimensional bottleneck, and the decoder mirrors it through Dense(64, relu) and Dense(128, relu) to reconstruct the 784-dimensional output with a sigmoid layer.]*

## Compiling and Training the Model

Before training, we need to compile the autoencoder model. We specify the optimizer and the loss function.

- Optimizer: Adam is a common and effective choice.
- Loss Function: Since our input pixel values are normalized between 0 and 1, and the final layer uses a sigmoid activation, Binary Cross-Entropy (BCE) is a suitable choice. However, Mean Squared Error (MSE) is also frequently used and works well, measuring the average squared difference between input pixels $x_i$ and reconstructed pixels $\hat{x}_i$. Let's use MSE here for simplicity, as discussed in the chapter introduction:

$$L_{MSE} = \frac{1}{N}\sum_{i=1}^{N}(x_i - \hat{x}_i)^2$$

We train the model to minimize this reconstruction loss, using the input data `x_train` as both the input and the target output.

```python
# Compile the autoencoder
autoencoder.compile(optimizer='adam', loss='mse')  # Using Mean Squared Error loss

# Train the autoencoder
epochs = 20
batch_size = 256

history = autoencoder.fit(x_train, x_train,  # Input and target are the same
                          epochs=epochs,
                          batch_size=batch_size,
                          shuffle=True,
                          validation_data=(x_test, x_test))  # Evaluate reconstruction on the test set
```

During training, Keras will output the loss on the training set and the validation set for each epoch. We expect to see the loss decrease over time, indicating that the model is learning to reconstruct the input images more accurately. We can visualize the training progress by plotting the loss curves:
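A minimal way to plot these curves is to read them from the `History` object returned by `fit` above; this is a sketch, and the styling choices (log scale, labels) simply match the figure shown below.

```python
# Plot training and validation loss recorded by model.fit().
# history.history is a dict with one list of per-epoch values per metric.
plt.figure(figsize=(8, 5))
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.yscale('log')  # log scale makes the smaller late-training improvements visible
plt.xlabel('Epoch')
plt.ylabel('Mean Squared Error Loss')
plt.title('Autoencoder Training & Validation Loss (MSE)')
plt.legend()
plt.show()
```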
*[Figure: Training and validation loss (MSE, logarithmic scale) over 20 epochs for the simple autoencoder on MNIST. Both losses decrease steadily, indicating successful learning; in this run the training loss falls from roughly 0.065 after the first epoch to about 0.012 by epoch 20, with the validation loss tracking it closely.]*

## Evaluating the Reconstruction Quality

After training, we can evaluate the autoencoder's performance by visually comparing original test images with their reconstructions. We use the trained autoencoder model to predict the reconstructions for the test set `x_test`.

```python
# Predict reconstructions for the test set
reconstructed_imgs = autoencoder.predict(x_test)

# --- Visualization ---
n = 10  # Number of digits to display
plt.figure(figsize=(20, 4))
for i in range(n):
    # Display original images
    ax = plt.subplot(2, n, i + 1)
    plt.imshow(x_test[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
    if i == 0:
        ax.set_title("Original", loc='left', fontsize=12, pad=10)

    # Display reconstructed images
    ax = plt.subplot(2, n, i + 1 + n)
    plt.imshow(reconstructed_imgs[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
    if i == 0:
        ax.set_title("Reconstructed", loc='left', fontsize=12, pad=10)

plt.suptitle("Original vs. Reconstructed MNIST Digits", fontsize=16)
plt.show()
```

You should observe that the reconstructed digits are recognizable but slightly blurry compared to the originals. This loss of detail is expected because the information had to be compressed through the 32-dimensional bottleneck layer. The autoencoder learned to retain the most salient features necessary for reconstruction while discarding some finer details or noise.
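Because we defined the encoder as a standalone model, we can also look directly at the 32-dimensional codes and at the per-image reconstruction error. The following is a short sketch of such an inspection (the exact numbers will vary from run to run).

```python
# Inspect the learned latent representation using the standalone encoder.
latent_codes = encoder.predict(x_test)
print(f"Latent codes shape: {latent_codes.shape}")  # expected: (10000, 32)

# Per-image reconstruction error (MSE), matching the training objective.
per_image_mse = np.mean((x_test - reconstructed_imgs) ** 2, axis=1)
print(f"Mean test reconstruction MSE: {per_image_mse.mean():.4f}")
print(f"Worst-reconstructed test image index: {per_image_mse.argmax()}")
```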
## Summary

In this practical section, we successfully implemented and trained a basic fully-connected autoencoder on the MNIST dataset. We covered:

- Preparing the data (normalization, flattening).
- Defining the encoder, bottleneck, and decoder using Keras.
- Compiling the model with an appropriate loss function (MSE) and optimizer (Adam).
- Training the model to minimize reconstruction error.
- Visualizing the training progress and the quality of reconstructions.

This example demonstrates the core functionality of an autoencoder: learning a compressed representation (encoding) and reconstructing the input from that representation (decoding). While effective for simple data, this basic architecture has limitations, particularly concerning overfitting and the structure of the learned latent space. In the next chapter, we will look at regularized autoencoders designed to address these issues and learn more robust representations.