Building sophisticated deep learning models from scratch requires substantial data, computational resources, and time. Fortunately, you don't always need to start from zero. Pre-trained models, which are networks already trained on large, standard datasets (like ImageNet for images or large text corpora for language), offer a fantastic starting point. This section will guide you through using these models within the Julia ecosystem, focusing on how they can accelerate your projects and improve performance, especially when your own dataset is limited.
Using a model that has already learned to recognize general features from a large dataset can provide a significant head start. For instance, a model trained on ImageNet has learned to identify edges, textures, patterns, and even complex object parts. These learned features are often useful for a wide range of computer vision tasks. The main benefits include reduced training time, lower data requirements, and often better performance than training a comparable model from scratch.
For computer vision tasks in Julia, Metalhead.jl is the go-to package. It provides a collection of popular architectures such as ResNet, VGG, DenseNet, and MobileNet, with weights pre-trained on the ImageNet dataset.
To start using Metalhead.jl, you first need to add it to your Julia environment:
using Pkg
Pkg.add("Metalhead")
Once installed, you can load a pre-trained model easily. For example, to load a ResNet-18 model:
using Flux, Metalhead
# Load a ResNet-18 model with pre-trained weights
model = ResNet(18; pretrain=true)
Setting pretrain=true ensures that the model is initialized with weights learned from ImageNet. With pretrain=false (the default), the model architecture is created with randomly initialized weights.
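The same constructor pattern applies to the other architectures. Whether pre-trained ImageNet weights are published for a particular variant depends on your Metalhead.jl release, so check its documentation if loading fails; the lines below are illustrative.
vgg16 = VGG(16; pretrain=true)        # VGG-16 with ImageNet weights, if available for this variant
resnet50 = ResNet(50; pretrain=true)  # a deeper ResNet variant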
The most straightforward use of a pre-trained model is for inference: making predictions on new data using the knowledge it already possesses. For instance, an ImageNet-trained model can classify images into one of the 1000 ImageNet categories.
Here's a simplified workflow for image classification:
Metalhead.jl models typically expect input as a 4D array in WHCN format (Width, Height, Channels, Batch size), with pixel values scaled to [0, 1]. For specific normalization details, it's always good to check the Metalhead.jl documentation or the source of the model weights.
using Flux, Metalhead, Images, CUDA  # Images.jl for image loading, CUDA.jl for optional GPU use
# 1. Load the model
model = ResNet(18; pretrain=true)
# Check for GPU availability and move model if possible
if CUDA.functional()
    model = model |> gpu
    @info "Model moved to GPU"
else
    @info "GPU not available, using CPU"
end
# 2. Prepare Input Data (simplified; real pipelines may also apply ImageNet mean/std normalization)
# Replace with your own image loading and preprocessing logic
function preprocess_image(img_path)
    img = Images.load(img_path)
    # Resize to the 224x224 spatial size expected by ImageNet-trained models
    img_resized = Images.imresize(img, (224, 224))
    # channelview yields a Channels x Height x Width (CHW) array; convert to Float32 in [0, 1]
    img_float = Float32.(channelview(img_resized))
    # Flux's Conv layers expect WHCN, so permute CHW -> WHC
    img_whc = permutedims(img_float, (3, 2, 1))
    # Add the batch dimension to obtain WHCN
    img_batch_cpu = Flux.unsqueeze(img_whc; dims = 4)
    return img_batch_cpu
end
img_data_cpu = preprocess_image("path/to/your/image.jpg") # Replace with actual path
# Move data to GPU if model is on GPU
img_data = CUDA.functional() ? (img_data_cpu |> gpu) : img_data_cpu
# 3. Make Predictions
output = model(img_data)
# 4. Interpret Output
probabilities = softmax(vec(cpu(output))) # Move output to CPU for post-processing
# Metalhead.jl provides imagenet_labels() for class names
# class_labels = Metalhead.imagenet_labels()
# top_class_idx = argmax(probabilities)
# println("Predicted class: $(class_labels[top_class_idx]) with probability $(probabilities[top_class_idx])")
Remember that input preprocessing is critical. Mismatches in input size, normalization, or channel order will lead to poor or meaningless results. Always consult the documentation for the specific model or library you are using.
While direct inference is useful, the real strength of pre-trained models often comes from transfer learning. This involves adapting a pre-trained model to a new task that is different from, but related to, the task it was originally trained on. There are two main strategies for transfer learning:
In this approach, you treat the pre-trained model (or a part of it, typically the convolutional base) as a fixed feature extractor. The idea is that the early layers of a CNN learn general features (edges, textures), while later layers learn more task-specific features. For a new task, these general features can still be very informative.
Workflow:
1. Load a pre-trained model, for example ResNet(18; pretrain=true).
2. Select the feature extractor and freeze it. Metalhead.jl models, like ResNet, are structured as a Chain where the first element contains the convolutional base (feature extractor) and the second element is the classifier. You can select just the feature extractor part:
base_model = ResNet(18; pretrain=true)
feature_extractor = base_model.layers[1]  # the convolutional base for ResNet
# With Flux's explicit-style API (Flux 0.14+), freezing is applied to the
# optimiser state rather than the model itself, e.g. Flux.freeze!(opt_state.layers[1])
# after Flux.setup(...); see the training sketch after this list.
3. Add a new classifier (for example, one or more Dense layers) on top of the frozen feature extractor. This new "head" will be trained from scratch on your custom dataset:
# Example: ResNet-18's feature extractor outputs 512 features
# Let's say our new task has 10 classes
num_new_classes = 10
new_classifier_head = Chain(
    AdaptiveMeanPool((1, 1)),    # pools the conv base's feature maps to 1x1
    Flux.flatten,                # flattens the output for the Dense layer
    Dense(512, num_new_classes)  # 512 is the output width of ResNet-18's base
)
# Combine into a new model
transfer_model = Chain(feature_extractor, new_classifier_head)
4. Train transfer_model on your dataset. Only the weights of new_classifier_head will be updated (see the training sketch after this list).
Feature extraction is particularly effective when your dataset is small and reasonably similar to the original dataset the model was trained on. Since you're only training a small, new classifier, you can often get good results with limited data and avoid overfitting.
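To make the training step concrete, here is a minimal sketch using Flux's explicit-style API (Flux 0.14 or newer). It assumes a placeholder iterator train_loader that yields (x, y) batches with one-hot encoded labels and uses an arbitrary epoch count; adjust both to your project.
# Set up optimiser state for the whole model, then freeze the convolutional base
# so that only the new classifier head receives updates
opt_state = Flux.setup(Adam(1e-3), transfer_model)
Flux.freeze!(opt_state.layers[1])

for epoch in 1:5
    for (x, y) in train_loader  # train_loader is a placeholder data iterator
        loss, grads = Flux.withgradient(transfer_model) do m
            Flux.logitcrossentropy(m(x), y)
        end
        Flux.update!(opt_state, transfer_model, grads[1])
    end
end
The frozen base keeps its ImageNet weights throughout training; only the parameters of new_classifier_head change.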
Fine-tuning takes transfer learning a step further. Instead of keeping the pre-trained model's weights entirely frozen, you allow some of them, typically in the later layers, to be updated during training on your new dataset, usually with a very small learning rate.
Workflow:
model_to_finetune = ResNet(18; pretrain=true)
feature_extractor_base = model_to_finetune.layers[1]
num_new_classes = 10 # Your number of classes
# Assuming ResNet-18 output before original fc is 512 features
new_head = Chain(
    AdaptiveMeanPool((1, 1)),
    Flux.flatten,
    Dense(512, num_new_classes)
)
finetune_model = Chain(feature_extractor_base, new_head)
# As with freezing, unfreezing ("thawing") is applied to the optimiser state in
# Flux's explicit-style API. The exact structure depends on the model; inspect
# feature_extractor_base.layers. For ResNet, the base is a Chain of stages, so you
# can thaw just the last stage after freezing the rest:
# opt_state = Flux.setup(Adam(1e-4), finetune_model)
# Flux.freeze!(opt_state.layers[1])            # start with the whole base frozen
# Flux.thaw!(opt_state.layers[1].layers[end])  # then thaw only its last stage
# Or skip freezing the base entirely to fine-tune all of its parameters.
Fine-tuning is often beneficial when your dataset is larger and somewhat similar to the original dataset. It allows the model to adapt its learned features more closely to your specific task.
GPU Acceleration in Transfer Learning:
Remember to move your transfer_model or finetune_model and your data to the GPU using model |> gpu and data |> gpu if CUDA is available. This is especially important for larger pre-trained models and fine-tuning, which can be computationally intensive.
if CUDA.functional()
    finetune_model = finetune_model |> gpu
    # Ensure your data loaders also provide data on the GPU
end
# Example optimizer setup for fine-tuning: use a small learning rate
# opt_state = Flux.setup(Adam(1e-4), finetune_model)
# Proceed with your training loop (as covered in Chapter 4); a sketch follows below.
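For reference, here is a minimal fine-tuning loop using Flux's explicit-style train! (Flux 0.14 or newer). The name train_dataloader is a placeholder for your own data iterator yielding (x, y) batches, and the loss assumes one-hot encoded labels; adapt both to your setup.
# Set up optimiser state after the model has been moved to its final device
opt_state = Flux.setup(Adam(1e-4), finetune_model)  # small learning rate for fine-tuning
# Optionally freeze the base and thaw selected stages here, as discussed above

Flux.train!(finetune_model, train_dataloader, opt_state) do m, x, y
    Flux.logitcrossentropy(m(x), y)
end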
While Metalhead.jl is prominent for vision, the principles of using pre-trained models apply to other domains as well. For Natural Language Processing (NLP), libraries like Transformers.jl are emerging in the Julia ecosystem, providing access to pre-trained language models such as BERT or GPT. The general workflow of loading models, preparing data, and adapting them via feature extraction or fine-tuning remains similar, though the specifics of model architectures and data preprocessing will differ.
Once you've adapted a pre-trained model and trained it on your task, you'll want to save it. As discussed in Chapter 3, BSON.jl is a common choice for serializing Julia objects, including Flux models:
using BSON
# Assuming 'my_custom_model' is your trained transfer learning model
BSON.@save "my_custom_model.bson" my_custom_model
# To load it back (the variable name must match the one used when saving):
BSON.@load "my_custom_model.bson" my_custom_model
# my_custom_model now contains your model architecture and trained weights.
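One practical note: if the model was trained on the GPU, it is usually safest to move it back to the CPU before saving, so the serialized file does not contain GPU array types.
# Move parameters back to ordinary CPU arrays before serializing
my_custom_model = cpu(my_custom_model)
BSON.@save "my_custom_model.bson" my_custom_model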
Working with pre-trained models is a powerful technique that can significantly boost your productivity and model performance in deep learning. By understanding how to access, use, and adapt these models in Julia, you can build upon the collective knowledge encoded in these large networks to tackle your own specific challenges more effectively. As you progress, you'll find these skills invaluable for a wide range of applications.