Let's consolidate the concepts we've covered by building and training a model for a common sequence modeling task: sentiment analysis. We'll use either an LSTM or a GRU layer to classify text reviews as positive or negative. This exercise demonstrates how to apply the framework APIs, handle sequence data, and construct a complete model.
We assume you have a basic understanding of text preprocessing steps like tokenization and padding, which are covered in detail in Chapter 8. Here, we'll focus on integrating these steps with LSTM/GRU model implementation.
We'll use the popular IMDB dataset, which contains 50,000 movie reviews labeled as either positive (1) or negative (0). This dataset is often included directly within deep learning frameworks, making it convenient to access.
# Example using TensorFlow/Keras
import tensorflow as tf
from tensorflow import keras
# Load the dataset, keeping only the top N most frequent words
VOCAB_SIZE = 10000
(train_data, train_labels), (test_data, test_labels) = keras.datasets.imdb.load_data(num_words=VOCAB_SIZE)
print(f"Training entries: {len(train_data)}, labels: {len(train_labels)}")
print(f"Sample review (integer encoded): {train_data[0][:20]}...")
The data is already integer-encoded, where each integer represents a specific word in the dataset's vocabulary.
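To sanity-check the encoding, you can map the integers back to words. The short sketch below assumes the default load_data settings, where indices 0, 1, and 2 are reserved for padding, start-of-sequence, and unknown tokens, so word indices are offset by 3.
# Rebuild an index-to-word mapping to decode a review (assumes default load_data settings)
word_index = keras.datasets.imdb.get_word_index()
reverse_word_index = {index + 3: word for word, index in word_index.items()}
reverse_word_index[0] = "<PAD>"
reverse_word_index[1] = "<START>"
reverse_word_index[2] = "<UNK>"
decoded_review = " ".join(reverse_word_index.get(i, "?") for i in train_data[0])
print(decoded_review[:200])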
Recurrent networks require inputs of uniform length. Since movie reviews vary in length, we need to pad or truncate them to a fixed size. We'll use post-padding, meaning we add zeros at the end of shorter sequences. Masking (often handled automatically by framework layers) will ensure these padded values are ignored during computation.
# Pad sequences to a maximum length
MAX_SEQUENCE_LENGTH = 256
train_data_padded = keras.preprocessing.sequence.pad_sequences(
    train_data,
    value=0,                    # Pad value
    padding='post',             # Pad at the end
    maxlen=MAX_SEQUENCE_LENGTH
)
test_data_padded = keras.preprocessing.sequence.pad_sequences(
    test_data,
    value=0,
    padding='post',
    maxlen=MAX_SEQUENCE_LENGTH
)
print(f"Sample padded review length: {len(train_data_padded[0])}")
print(f"Sample padded review: {train_data_padded[0][:30]}...")
Now, let's define our model architecture using the Keras Sequential API. The first layer is an Embedding layer, which maps each integer word index to a dense vector: it takes input of shape (batch_size, sequence_length) and outputs (batch_size, sequence_length, embedding_dim).
EMBEDDING_DIM = 16
RNN_UNITS = 32 # Number of units in the LSTM/GRU layer
model = keras.Sequential([
    keras.layers.Embedding(input_dim=VOCAB_SIZE,
                           output_dim=EMBEDDING_DIM,
                           mask_zero=True,  # Important: enables masking for padded values
                           input_length=MAX_SEQUENCE_LENGTH),
    keras.layers.LSTM(RNN_UNITS),                # You could replace LSTM with GRU here
    keras.layers.Dense(1, activation='sigmoid')  # Output layer for binary classification
])
model.summary()
The mask_zero=True argument in the Embedding layer is significant. It tells downstream layers (like the LSTM) to ignore time steps where the input was 0 (our padding value).
Basic model structure: Input -> Embedding -> LSTM -> Dense Output.
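If you want to confirm that masking behaves as expected, you can ask the Embedding layer for the mask it produces on a padded batch. This is an optional check: compute_mask returns True for real tokens and False for padded positions.
# Optional check: inspect the mask produced by the Embedding layer
embedding_layer = model.layers[0]
sample_batch = train_data_padded[:1]              # Shape: (1, MAX_SEQUENCE_LENGTH)
mask = embedding_layer.compute_mask(sample_batch)
print(mask.numpy()[0][-10:])                      # Padded positions (if any) show up as False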
Before training, we need to configure the learning process using compile. We specify the optimizer, the loss function, and the metrics to monitor. The adam optimizer is a generally good starting choice, binary_crossentropy is the appropriate loss for binary (0/1) classification with a sigmoid output, and accuracy lets us track the percentage of correctly classified reviews during training and evaluation.
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
We can now train the model using the fit method, providing the padded training data and labels. We also set aside a portion of the training data for validation during training to monitor performance on unseen data and check for overfitting.
EPOCHS = 10
BATCH_SIZE = 512
# Create a validation set from the training data
validation_split = 0.2
num_validation_samples = int(validation_split * len(train_data_padded))
x_val = train_data_padded[:num_validation_samples]
partial_x_train = train_data_padded[num_validation_samples:]
y_val = train_labels[:num_validation_samples]
partial_y_train = train_labels[num_validation_samples:]
print("Training the model...")
history = model.fit(partial_x_train,
                    partial_y_train,
                    epochs=EPOCHS,
                    batch_size=BATCH_SIZE,
                    validation_data=(x_val, y_val),
                    verbose=1)  # Set verbose=1 or 2 to see progress per epoch
print("Training complete.")
After training, we evaluate the model's performance on the held-out test set. We can also visualize the training and validation accuracy and loss over epochs to understand the learning dynamics.
print("\nEvaluating on test data...")
results = model.evaluate(test_data_padded, test_labels, verbose=0)
print(f"Test Loss: {results[0]:.4f}")
print(f"Test Accuracy: {results[1]:.4f}")
# Plotting training history (requires Plotly)
import plotly.graph_objects as go
from plotly.subplots import make_subplots
history_dict = history.history
acc = history_dict['accuracy']
val_acc = history_dict['val_accuracy']
loss = history_dict['loss']
val_loss = history_dict['val_loss']
epochs_range = range(1, EPOCHS + 1)
fig = make_subplots(rows=1, cols=2, subplot_titles=("Training and Validation Loss", "Training and Validation Accuracy"))
fig.add_trace(go.Scatter(x=list(epochs_range), y=loss, name='Training Loss', mode='lines+markers', line=dict(color='#4263eb')), row=1, col=1)
fig.add_trace(go.Scatter(x=list(epochs_range), y=val_loss, name='Validation Loss', mode='lines+markers', line=dict(color='#f76707')), row=1, col=1)
fig.add_trace(go.Scatter(x=list(epochs_range), y=acc, name='Training Accuracy', mode='lines+markers', line=dict(color='#12b886')), row=1, col=2)
fig.add_trace(go.Scatter(x=list(epochs_range), y=val_acc, name='Validation Accuracy', mode='lines+markers', line=dict(color='#ae3ec9')), row=1, col=2)
fig.update_layout(height=400, width=800, title_text="Model Training History")
fig.update_xaxes(title_text="Epochs", row=1, col=1)
fig.update_xaxes(title_text="Epochs", row=1, col=2)
fig.update_yaxes(title_text="Loss", row=1, col=1)
fig.update_yaxes(title_text="Accuracy", row=1, col=2)
fig.show()  # Or use fig.to_json() to export the figure as Plotly JSON
Example training history showing loss and accuracy curves for training and validation sets over epochs. (Note: Actual curve values are illustrative and depend on the specific training run).
The plot helps identify potential overfitting (where training accuracy keeps improving, but validation accuracy plateaus or decreases) and determine if more training epochs are needed.
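If the validation loss starts rising while the training loss keeps falling, one common remedy is to stop training automatically. Keras provides an EarlyStopping callback for this; a minimal sketch, assuming you retrain from a freshly built model:
# Stop training when the validation loss stops improving
early_stopping = keras.callbacks.EarlyStopping(monitor='val_loss',
                                               patience=2,
                                               restore_best_weights=True)
# Pass it to fit via the callbacks argument, e.g.:
# model.fit(partial_x_train, partial_y_train,
#           epochs=EPOCHS, batch_size=BATCH_SIZE,
#           validation_data=(x_val, y_val),
#           callbacks=[early_stopping])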
To extend this exercise, try a few variations. First, swap the recurrent layer: replace keras.layers.LSTM(RNN_UNITS) with keras.layers.GRU(RNN_UNITS) and retrain, then compare the performance and training time. Second, stack recurrent layers: every recurrent layer except the last must set return_sequences=True so that the next layer receives one hidden state per time step.
model_stacked = keras.Sequential([
    keras.layers.Embedding(VOCAB_SIZE, EMBEDDING_DIM, mask_zero=True, input_length=MAX_SEQUENCE_LENGTH),
    keras.layers.LSTM(RNN_UNITS, return_sequences=True),  # Returns the hidden state for each time step
    keras.layers.LSTM(RNN_UNITS),                          # This layer receives the full sequence
    keras.layers.Dense(1, activation='sigmoid')
])
Third, wrap the recurrent layer in keras.layers.Bidirectional to process the input sequence in both forward and backward directions, potentially capturing context more effectively.
model_bidirectional = keras.Sequential([
    keras.layers.Embedding(VOCAB_SIZE, EMBEDDING_DIM, mask_zero=True, input_length=MAX_SEQUENCE_LENGTH),
    keras.layers.Bidirectional(keras.layers.LSTM(RNN_UNITS)),  # Wrap the LSTM layer
    keras.layers.Dense(1, activation='sigmoid')
])
Note that a Bidirectional layer typically doubles the output feature dimension (one set of features for forward, one for backward), unless configured otherwise. Finally, experiment with the hyperparameters: EMBEDDING_DIM, RNN_UNITS, the optimizer choice, the learning rate, and BATCH_SIZE. You can also add Dropout for regularization (covered later); a brief sketch follows.
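As a preview of the regularization techniques covered later, here is one way to add dropout, using the LSTM layer's dropout arguments plus a standalone Dropout layer before the output. The specific rates are illustrative, not tuned.
model_regularized = keras.Sequential([
    keras.layers.Embedding(VOCAB_SIZE, EMBEDDING_DIM, mask_zero=True, input_length=MAX_SEQUENCE_LENGTH),
    keras.layers.LSTM(RNN_UNITS,
                      dropout=0.2,             # Dropout on the layer inputs
                      recurrent_dropout=0.2),  # Dropout on the recurrent state
    keras.layers.Dropout(0.2),                 # Dropout before the output layer
    keras.layers.Dense(1, activation='sigmoid')
])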
This practical example provides a concrete foundation for implementing LSTM and GRU models for sequence classification. You can adapt this structure for various other sequence-based tasks by modifying the input data preparation and the final output layer(s) of the model.