Once you've dedicated effort to constructing and training your neural network architectures, whether they are MLPs, CNNs, or RNNs, you'll naturally want to preserve your work. Model serialization is the process of saving your model's learned parameters, and often its structure, to a file. This allows you to load it back later for inference, to continue training, or to share it with others. Without serialization, your trained model would exist only in your computer's memory and be lost when your Julia session ends.
Saving a trained model is a standard practice in machine learning workflows for several important reasons: you can reuse the model for inference without repeating expensive training, resume training from a checkpoint after an interruption, share the model with collaborators, and keep versioned snapshots of your experiments for reproducibility.
Essentially, serialization makes your models tangible assets that can be managed, versioned, and utilized over time.
For Flux models, the recommended package for serialization is BSON.jl. BSON stands for Binary JSON, a binary-encoded serialization format for JSON-like documents that is designed to be lightweight, traversable, and efficient. BSON.jl is well-suited for Julia objects, including the complex structures and functions that can make up a Flux model.
To use BSON.jl, you'll first need to add it to your Julia environment and import it:
# If not already installed:
# import Pkg; Pkg.add("BSON")
using Flux
using BSON
The primary macro for saving objects with BSON.jl is BSON.@save. Let's say you have a trained Flux model named my_cnn_model. To save it to a file, you call BSON.@save with the filename followed by the variable containing your model:
# Assume 'my_cnn_model' is a trained Flux model
# For example:
# my_cnn_model = Chain(
# Conv((3, 3), 1=>16, relu),
# MaxPool((2,2)),
# Conv((3, 3), 16=>32, relu),
# MaxPool((2,2)),
# Flux.flatten,
# Dense(32*5*5, 10), # Assuming 28x28 input, adjust size accordingly
# softmax
# )
# ... training happens here ...
# Save the model
BSON.@save "my_trained_cnn.bson" my_cnn_model
println("Model saved to my_trained_cnn.bson")
This command saves the my_cnn_model object, including its architecture and learned weights, into a file named my_trained_cnn.bson.
Often, you'll want to save more than just the model. For instance, you might want to save the state of your optimizer, the current epoch number, or performance metrics. BSON.@save allows you to save multiple variables into the same BSON file:
# Assume 'model', 'optimizer_state', 'epoch', and 'history' are defined
# model = Chain(...)
# opt = Adam()
# optimizer_state = Flux.setup(opt, model) # Get optimizer state
# epoch = 100
# history = Dict("loss" => [0.5, 0.4, ...], "accuracy" => [0.8, 0.85, ...])
# BSON.@save "training_checkpoint.bson" model optimizer_state epoch history
# Or, using keyword arguments for clarity when loading:
BSON.@save "training_checkpoint.bson" trained_model=model opt_state=optimizer_state current_epoch=epoch training_history=history
println("Checkpoint saved to training_checkpoint.bson")
Saving with keyword arguments (e.g., trained_model=model) is good practice because it makes loading more explicit, as you'll see next.
To load a saved model (or other objects) from a BSON file, you use the BSON.@load macro. It loads the objects from the specified file into your current Julia workspace. The variables will be named as they were when saved, or by the keywords used during saving.
using Flux, BSON
# Load the model saved previously
BSON.@load "my_trained_cnn.bson" my_cnn_model # Assumes it was saved as 'my_cnn_model'
# If saved with a keyword: BSON.@load "my_trained_cnn.bson" loaded_model_alias
# Then 'loaded_model_alias' would contain the model.
# 'my_cnn_model' is now available in your workspace
# You can inspect it or use it for predictions
# For example:
# dummy_input = rand(Float32, 28, 28, 1, 1) # Example input for a CNN
# predictions = my_cnn_model(dummy_input)
# println("Model loaded and predictions made.")
# If you saved multiple objects using keywords:
BSON.@load "training_checkpoint.bson" trained_model opt_state current_epoch training_history
# Now 'trained_model', 'opt_state', 'current_epoch', and 'training_history' are available.
println("Model and associated data loaded successfully.")
The basic save and load cycle: a Flux model is saved to a BSON file and subsequently loaded back into a Julia environment.
When you load a model, Julia needs to understand the structure of the objects being loaded. For standard Flux layers (such as Dense, Conv, and Chain), BSON.jl and Flux handle this smoothly. For custom layer types or other custom structs, their definitions must be available in your session (for example, via using or include) before you call BSON.@load.
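For illustration, here is a minimal hypothetical custom layer; its definition must be evaluated before BSON.@load can reconstruct any saved instances of it:
# Hypothetical custom layer, shown only to illustrate the point above.
struct MyScale
    s::Vector{Float32}
end
Flux.@functor MyScale          # let Flux treat 's' as a trainable parameter
(m::MyScale)(x) = m.s .* x     # forward pass: elementwise scaling
# Only after the definitions above are in scope:
# BSON.@load "model_with_myscale.bson" saved_model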
Sometimes, you might prefer to save only the learned parameters (weights and biases) of the model, rather than the entire model object. This approach can offer more flexibility, especially if you want to load weights into a slightly modified architecture or if you're concerned about compatibility issues with the model structure itself across different Flux versions.
Flux provides Flux.params(model) to extract all trainable parameters and Flux.loadparams!(model, params_array) to load them back.
using Flux, BSON
# Assume 'model' is your trained Flux model
# model = Chain(Dense(10, 5, relu), Dense(5, 1))
# ... training ...
# Extract parameters
model_parameters = Flux.params(model)
# Note: Flux.params(model) returns a Zygote.Params object.
# To save them effectively with BSON, it's often better to collect them into a standard array of arrays.
# However, BSON can often handle Zygote.Params directly. For more robust saving, especially
# if you want to inspect or manipulate weights outside Flux, converting to regular arrays is safer.
# For direct BSON saving and loading into another Flux model, saving Zygote.Params often works.
# To save the raw weight arrays:
# weights_arrays = [copy(p) for p in model_parameters] # copy() to get them off GPU if needed and make them plain arrays
# BSON.@save "model_just_weights.bson" trained_weights=weights_arrays
# For simplicity, if BSON handles Zygote.Params well in your setup:
BSON.@save "model_params_object.bson" model_ps=model_parameters
println("Model parameters saved.")
To load these parameters, you first need to construct an instance of your model with the same architecture. Then, you load the saved parameters and populate your new model instance.
using Flux, BSON
# 1. Reconstruct the model architecture
# This MUST match the architecture whose parameters were saved.
new_model_instance = Chain(Dense(10, 5, relu), Dense(5, 1))
# 2. Load the saved parameters
BSON.@load "model_params_object.bson" model_ps # Loads 'model_ps'
# 3. Load parameters into the model
Flux.loadparams!(new_model_instance, model_ps)
println("Parameters loaded into new model instance.")
# 'new_model_instance' is now ready with the trained weights.
This method decouples the model's structure (which you define in code) from its learned state (the parameters), offering a cleaner separation.
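As a small illustration of that flexibility, the same saved parameters can be loaded into any variant with matching parameter shapes, for example the same layers with a different activation (the tanh swap here is just an assumption for the sketch):
using Flux, BSON
# Same parameter shapes as Chain(Dense(10, 5, relu), Dense(5, 1)), different activation.
variant_model = Chain(Dense(10, 5, tanh), Dense(5, 1))
BSON.@load "model_params_object.bson" model_ps
Flux.loadparams!(variant_model, model_ps)  # succeeds because shapes match layer-for-layer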
If your model was moved to a GPU using model |> gpu during training, its parameters will likely be CuArrays (from CUDA.jl). BSON.jl can save these CuArrays.
When loading, the environment matters: on a machine with a functional GPU and CUDA.jl, the model might load directly as a GPU model; on a machine without one, loading GPU arrays can fail, so manage the device transfer explicitly.
using Flux, BSON, CUDA # Assuming CUDA is available
# Assume "gpu_trained_model.bson" contains a model trained on GPU
BSON.@load "gpu_trained_model.bson" loaded_model
# Check if model is on GPU (its parameters would be CuArrays)
# You might need to inspect e.g., first(Flux.params(loaded_model)) isa CuArray
# To ensure it's on CPU:
cpu_model = loaded_model |> cpu
println("Model moved to CPU.")
# To ensure it's on GPU (if a GPU is available):
# if CUDA.functional()
# gpu_model = loaded_model |> gpu
# println("Model moved to GPU.")
# else
# println("No functional GPU available, model remains on CPU.")
# end
It's good practice to explicitly manage the device transfer (|> cpu or |> gpu) after loading, especially if you plan to deploy the model in an environment different from where it was trained.
Treat your saved models (.bson files) as important artifacts. If you're using Git, consider Git LFS (Large File Storage) for model files, as they can become large. Naming conventions that include version numbers or timestamps can also be helpful (e.g., my_model_v1.2_epoch50.bson). You can also store metadata alongside the model in the same file:
# BSON.@save "model_with_metadata.bson" model=my_model version="1.1" dataset="CIFAR10" epoch=50 acc=0.85
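Loading such a file retrieves the metadata alongside the model; a quick sketch, assuming the commented save above was actually run:
# BSON.@load "model_with_metadata.bson" model version dataset epoch acc
# println("Loaded $dataset model v$version (epoch $epoch, accuracy $acc)")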
While BSON.jl is the common choice for Flux models, another general-purpose serialization package in Julia is JLD2.jl, which saves data in a format based on HDF5. For some Julia objects, JLD2.jl can be very effective. However, BSON.jl generally has better support for the closures and custom structs frequently encountered within Flux models, making it the more straightforward option for this use case. If you run into issues with BSON.jl for very specific custom types, JLD2.jl may be worth investigating as an alternative, though it can require more care in how models are saved and loaded.
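For reference, a minimal JLD2.jl sketch might save only the raw parameter arrays, which sidesteps the custom-struct concerns mentioned above (the filename is arbitrary, and Flux.params/Flux.loadparams! are reused from the earlier examples):
using Flux, JLD2
# model = Chain(Dense(10, 5, relu), Dense(5, 1))  # ... trained ...
weights = [copy(p) for p in Flux.params(model)]   # plain arrays serialize cleanly
jldsave("model_weights.jld2"; weights)
# Later: rebuild the same architecture, then restore the arrays.
restored = load("model_weights.jld2", "weights")
Flux.loadparams!(model, restored)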
By mastering model serialization, you ensure that the neural network architectures you construct and train are preserved, shareable, and ready for future use, whether that's making predictions on new data or serving as a foundation for further experimentation. This capability will be particularly useful as you work through practical exercises, such as the upcoming one where you'll build a CNN for image classification; saving your trained classifier will be a natural final step.