Once you've dedicated effort to constructing and training your neural network architectures, whether they are MLPs, CNNs, or RNNs, you'll naturally want to preserve your work. Model serialization is the process of saving your model's learned parameters, and often its structure, to a file. This allows you to load it back later for inference, to continue training, or to share it with others. Without serialization, your trained model would exist only in your computer's memory and be lost when your Julia session ends.
Saving a trained model is a standard practice in machine learning workflows for several important reasons: you can reuse the model for inference without repeating expensive training, resume training from a checkpoint after an interruption, share the model with collaborators, and keep versioned snapshots of your experiments for reproducibility.
Essentially, serialization makes your models tangible assets that can be managed, versioned, and utilized over time.
For Flux models, the recommended package for serialization is BSON.jl. BSON stands for Binary JSON, a binary-encoded serialization format for JSON-like documents that is designed to be lightweight, traversable, and efficient. BSON.jl is well-suited for Julia objects, including the complex structures and functions that can make up a Flux model.
To use BSON.jl, you'll first need to add it to your Julia environment and import it:
# If not already installed:
# import Pkg; Pkg.add("BSON")
using Flux
using BSON
The primary macro for saving objects with BSON.jl is BSON.@save. Let's say you have a trained Flux model named my_cnn_model. To save it to a file, you call BSON.@save with the filename followed by the variable containing your model:
# Assume 'my_cnn_model' is a trained Flux model
# For example:
# my_cnn_model = Chain(
# Conv((3, 3), 1=>16, relu),
# MaxPool((2,2)),
# Conv((3, 3), 16=>32, relu),
# MaxPool((2,2)),
# Flux.flatten,
# Dense(32*5*5, 10), # Assuming 28x28 input, adjust size accordingly
# softmax
# )
# ... training happens here ...
# Save the model
BSON.@save "my_trained_cnn.bson" my_cnn_model
println("Model saved to my_trained_cnn.bson")
This command saves the my_cnn_model object, including its architecture and learned weights, into a file named my_trained_cnn.bson.
Often, you'll want to save more than just the model. For instance, you might want to save the state of your optimizer, the current epoch number, or performance metrics. BSON.@save allows you to save multiple variables into the same BSON file:
# Assume 'model', 'optimizer_state', 'epoch', and 'history' are defined
# model = Chain(...)
# opt = Adam()
# optimizer_state = Flux.setup(opt, model) # Get optimizer state
# epoch = 100
# history = Dict("loss" => [0.5, 0.4, ...], "accuracy" => [0.8, 0.85, ...])
# BSON.@save "training_checkpoint.bson" model optimizer_state epoch history
# Or, using keyword arguments for clarity when loading:
BSON.@save "training_checkpoint.bson" trained_model=model opt_state=optimizer_state current_epoch=epoch training_history=history
println("Checkpoint saved to training_checkpoint.bson")
Saving with keyword arguments (e.g., trained_model=model) is good practice because it makes loading more explicit, as you'll see next.
To load a saved model (or other objects) from a BSON file, you use the BSON.@load macro. It loads the objects from the specified file into your current Julia workspace. The variables will be named as they were when saved, or by the keywords used during saving.
using Flux, BSON
# Load the model saved previously
BSON.@load "my_trained_cnn.bson" my_cnn_model # Assumes it was saved as 'my_cnn_model'
# If saved with a keyword: BSON.@load "my_trained_cnn.bson" loaded_model_alias
# Then 'loaded_model_alias' would contain the model.
# 'my_cnn_model' is now available in your workspace
# You can inspect it or use it for predictions
# For example:
# dummy_input = rand(Float32, 28, 28, 1, 1) # Example input for a CNN
# predictions = my_cnn_model(dummy_input)
# println("Model loaded and predictions made.")
# If you saved multiple objects using keywords:
BSON.@load "training_checkpoint.bson" trained_model opt_state current_epoch training_history
# Now 'trained_model', 'opt_state', 'current_epoch', and 'training_history' are available.
println("Model and associated data loaded successfully.")
The basic save and load cycle: a Flux model is saved to a BSON file and subsequently loaded back into a Julia environment.
When you load a model, Julia needs to understand the structure of the objects being loaded. For standard Flux layers (such as Dense, Conv, and Chain), BSON.jl and Flux handle this smoothly. For custom layer types or other custom structs, their definitions must be available in your session (for example, via using or include) before you call BSON.@load.
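For illustration, here is a minimal hypothetical custom layer; its definition must be evaluated before BSON.@load can reconstruct any saved instances of it:
# Hypothetical custom layer, shown only to illustrate the point above.
struct MyScale
    s::Vector{Float32}
end
Flux.@functor MyScale          # let Flux treat 's' as a trainable parameter
(m::MyScale)(x) = m.s .* x     # forward pass: elementwise scaling
# Only after the definitions above are in scope:
# BSON.@load "model_with_myscale.bson" saved_model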
Sometimes, you might prefer to save only the learned parameters (weights and biases) of the model, rather than the entire model object. This approach can offer more flexibility, especially if you want to load weights into a slightly modified architecture or if you're concerned about compatibility issues with the model structure itself across different Flux versions.
Flux provides Flux.params(model) to extract all trainable parameters and Flux.loadparams!(model, params_array) to load them back.
using Flux, BSON
# Assume 'model' is your trained Flux model
# model = Chain(Dense(10, 5, relu), Dense(5, 1))
# ... training ...
# Extract parameters
model_parameters = Flux.params(model)
# Note: Flux.params(model) returns a Zygote.Params object.
# To save them effectively with BSON, it's often better to collect them into a standard array of arrays.
# However, BSON can often handle Zygote.Params directly. For more robust saving, especially
# if you want to inspect or manipulate weights outside Flux, converting to regular arrays is safer.
# For direct BSON saving and loading into another Flux model, saving Zygote.Params often works.
# To save the raw weight arrays:
# weights_arrays = [copy(p) for p in model_parameters] # copy() to get them off GPU if needed and make them plain arrays
# BSON.@save "model_just_weights.bson" trained_weights=weights_arrays
# For simplicity, if BSON handles Zygote.Params well in your setup:
BSON.@save "model_params_object.bson" model_ps=model_parameters
println("Model parameters saved.")
To load these parameters, you first need to construct an instance of your model with the same architecture. Then, you load the saved parameters and populate your new model instance.
using Flux, BSON
# 1. Reconstruct the model architecture
# This MUST match the architecture whose parameters were saved.
new_model_instance = Chain(Dense(10, 5, relu), Dense(5, 1))
# 2. Load the saved parameters
BSON.@load "model_params_object.bson" model_ps # Loads 'model_ps'
# 3. Load parameters into the model
Flux.loadparams!(new_model_instance, model_ps)
println("Parameters loaded into new model instance.")
# 'new_model_instance' is now ready with the trained weights.
This method decouples the model's structure (which you define in code) from its learned state (the parameters), offering a cleaner separation.
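As a small illustration of that flexibility, the same saved parameters can be loaded into any variant with matching parameter shapes, for example the same layers with a different activation (the tanh swap here is just an assumption for the sketch):
using Flux, BSON
# Same parameter shapes as Chain(Dense(10, 5, relu), Dense(5, 1)), different activation.
variant_model = Chain(Dense(10, 5, tanh), Dense(5, 1))
BSON.@load "model_params_object.bson" model_ps
Flux.loadparams!(variant_model, model_ps)  # succeeds because shapes match layer-for-layer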
If your model was moved to a GPU using model |> gpu during training, its parameters will likely be CuArrays (from CUDA.jl). BSON.jl can save these CuArrays.
When loading, the environment matters: on a machine with a functional GPU and CUDA.jl, the model might load directly as a GPU model; on a machine without one, loading GPU arrays can fail, so manage the device transfer explicitly.
using Flux, BSON, CUDA # Assuming CUDA is available
# Assume "gpu_trained_model.bson" contains a model trained on GPU
BSON.@load "gpu_trained_model.bson" loaded_model
# Check if model is on GPU (its parameters would be CuArrays)
# You might need to inspect e.g., first(Flux.params(loaded_model)) isa CuArray
# To ensure it's on CPU:
cpu_model = loaded_model |> cpu
println("Model moved to CPU.")
# To ensure it's on GPU (if a GPU is available):
# if CUDA.functional()
# gpu_model = loaded_model |> gpu
# println("Model moved to GPU.")
# else
# println("No functional GPU available, model remains on CPU.")
# end
It's good practice to explicitly manage the device transfer (|> cpu or |> gpu) after loading, especially if you plan to deploy the model in an environment different from where it was trained.
Treat your saved models (.bson files) as important artifacts. If you're using Git, consider Git LFS (Large File Storage) for model files, as they can become large. Naming conventions that include version numbers or timestamps can also be helpful (e.g., my_model_v1.2_epoch50.bson). You can also store metadata alongside the model in the same file:
# BSON.@save "model_with_metadata.bson" model=my_model version="1.1" dataset="CIFAR10" epoch=50 acc=0.85
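Loading such a file retrieves the metadata alongside the model; a quick sketch, assuming the commented save above was actually run:
# BSON.@load "model_with_metadata.bson" model version dataset epoch acc
# println("Loaded $dataset model v$version (epoch $epoch, accuracy $acc)")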
While BSON.jl is the common choice for Flux models, another general-purpose serialization package in Julia is JLD2.jl, which saves data in a format based on HDF5. For some Julia objects, JLD2.jl can be very effective. However, BSON.jl generally has better support for the closures and custom structs frequently encountered within Flux models, making it the more straightforward option for this use case. If you run into issues with BSON.jl for very specific custom types, JLD2.jl may be worth investigating as an alternative, though it can require more care in how models are saved and loaded.
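For reference, a minimal JLD2.jl sketch might save only the raw parameter arrays, which sidesteps the custom-struct concerns mentioned above (the filename is arbitrary, and Flux.params/Flux.loadparams! are reused from the earlier examples):
using Flux, JLD2
# model = Chain(Dense(10, 5, relu), Dense(5, 1))  # ... trained ...
weights = [copy(p) for p in Flux.params(model)]   # plain arrays serialize cleanly
jldsave("model_weights.jld2"; weights)
# Later: rebuild the same architecture, then restore the arrays.
restored = load("model_weights.jld2", "weights")
Flux.loadparams!(model, restored)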
By mastering model serialization, you ensure that the neural network architectures you construct and train are preserved, shareable, and ready for future use, whether that's making predictions on new data or serving as a foundation for further experimentation. This capability will be particularly useful as you work through practical exercises, such as the upcoming one where you'll build a CNN for image classification; saving your trained classifier will be a natural final step.