When your machine learning training script runs inside a container, it generates output that is essential for monitoring progress, debugging issues, and ensuring the reproducibility of your experiments. Effectively managing these logs is a fundamental part of containerized ML workflows. Without proper management, valuable information can be lost when the container stops.
stdout and stderr
By default, applications running inside a Docker container send their standard output (stdout) and standard error (stderr) streams to the container's logging driver. Docker captures these streams, allowing you to inspect them later.
The most direct way to access these logs is the docker logs command followed by the container ID or name:
# Start a training container in the background
docker run -d --name training_job my_ml_image python train.py --epochs 10
# Fetch the logs from the container
docker logs training_job
# Follow the log output in real-time (like tail -f)
docker logs -f training_job
This method is straightforward for simple cases or quick debugging. However, relying solely on the default logging driver has limitations:
- If a container is removed (docker rm), the logs associated with it are usually lost unless a different logging driver is configured (a quick demonstration follows this list).
- docker logs becomes cumbersome with many containers or in distributed environments.
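The first limitation is easy to demonstrate. Assuming the training_job container started above has finished and uses Docker's default json-file logging driver, removing it also removes its captured output:
# Remove the finished container
docker rm training_job
# The output captured by the default driver is deleted with the container,
# so this command now fails because the container no longer exists
docker logs training_job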
A common and more robust practice is to configure your training script to write logs to specific files inside the container, rather than just printing to stdout/stderr. Most programming languages provide standard logging libraries for this purpose.
For instance, in Python, you can use the built-in logging module:
# Example snippet within train.py
import logging
import os
import sys

# Configure logging to both a file and the console
log_file = '/app/logs/training.log'
os.makedirs(os.path.dirname(log_file), exist_ok=True)  # FileHandler requires the directory to exist

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler(log_file),
        logging.StreamHandler(sys.stdout)  # Also log to console
    ]
)

logging.info("Starting training process...")
# ... training code ...
logging.info("Training finished.")
This script now writes timestamped log messages to /app/logs/training.log inside the container's filesystem and duplicates them to stdout.
Writing logs to a file inside the container gives you control over the log format and content, but it does not solve the persistence problem. If the container is removed, the /app/logs/training.log file disappears with it. To preserve these logs, you need to store them outside the container's ephemeral filesystem using Docker volumes or bind mounts, which you encountered in the previous chapter.
Using Bind Mounts:
Bind mounts map a directory from your host machine directly into the container. This is often convenient during development.
# Create a directory on the host to store logs
mkdir -p /path/on/host/training_logs
# Run the container, mounting the host directory to the container's log directory
docker run -d \
--name training_job \
-v /path/on/host/training_logs:/app/logs \
my_ml_image \
python train.py --epochs 10
Now, the training.log file generated inside the container at /app/logs/training.log will actually be written to /path/on/host/training_logs/training.log on your host machine, persisting even after the container is stopped and removed.
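Because the file now lives on the host, you can inspect it with ordinary tools without entering the container. A small sketch, assuming the host path used above:
# List the persisted log files on the host
ls -lh /path/on/host/training_logs/
# Follow training progress from the host shell while the job runs
tail -f /path/on/host/training_logs/training.log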
Using Docker Volumes:
Volumes are managed by Docker and are the preferred method for persisting data in production or when you don't want to tie the data storage to a specific host directory structure.
# Create a Docker volume (optional, Docker creates it if it doesn't exist)
docker volume create ml_training_logs
# Run the container, mounting the named volume to the container's log directory
docker run -d \
--name training_job \
-v ml_training_logs:/app/logs \
my_ml_image \
python train.py --epochs 10
Logs are now stored in the ml_training_logs volume, managed by Docker. You can inspect the volume's contents or back it up independently of the container or host filesystem specifics.
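To find where Docker keeps the volume, or to read the log file it contains, standard Docker commands are enough. The sketch below uses a throwaway alpine container purely as a convenient way to read files from the volume:
# Show volume metadata, including its mount point on the Docker host
docker volume inspect ml_training_logs
# Read the last lines of the persisted log through a temporary helper container
docker run --rm -v ml_training_logs:/logs alpine tail -n 20 /logs/training.log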
Diagram illustrating how bind mounts and Docker volumes connect the log file inside the container to storage on the Docker host.
For easier automated processing and analysis, consider formatting your logs in a structured way, such as JSON. Many logging libraries support custom formatters.
# Example using Python's logging with a JSON formatter (requires python-json-logger)
# pip install python-json-logger
import logging
from pythonjsonlogger import jsonlogger
log_handler = logging.FileHandler('/app/logs/training.log')
formatter = jsonlogger.JsonFormatter('%(asctime)s %(levelname)s %(message)s')
log_handler.setFormatter(formatter)
logger = logging.getLogger()
logger.addHandler(log_handler)
logger.setLevel(logging.INFO)
logger.info("Training started", extra={'learning_rate': 0.01, 'epochs': 10})
# Output in training.log: {"asctime": "...", "levelname": "INFO", "message": "Training started", "learning_rate": 0.01, "epochs": 10}
Structured logs are much easier to ingest into log analysis platforms.
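For example, because each entry is a self-contained JSON object, you can filter and slice the file with a command-line JSON tool such as jq. The field names below match the formatter configured above, and the path assumes the log was persisted with the earlier bind mount:
# Show only entries logged at ERROR level
jq 'select(.levelname == "ERROR")' /path/on/host/training_logs/training.log
# Print just the message text of every entry
jq -r '.message' /path/on/host/training_logs/training.log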
While persisting logs to host files via volumes or bind mounts works well for single experiments or small setups, managing logs across numerous containers or in production environments often involves centralized logging systems.
Tools like the ELK Stack (Elasticsearch, Logstash, Kibana), Splunk, Graylog, Grafana Loki, or cloud-provider services (AWS CloudWatch Logs, Google Cloud Logging, Azure Monitor Logs) are designed to aggregate logs from multiple sources (including Docker containers). They provide powerful searching, visualization, and alerting capabilities.
Configuring Docker to forward logs directly to these systems usually involves setting up a specific logging driver, either globally for the Docker daemon or per container using the --log-driver and --log-opt flags with docker run. While configuring these systems is beyond the scope of this intermediate course, it's important to know they exist as the standard solution for large-scale log management.
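As a minimal illustration of the per-container approach, the default json-file driver accepts --log-opt settings that cap and rotate the captured logs; other drivers (fluentd, awslogs, and so on) take their own options:
# Keep at most three 10 MB log files for this container
docker run -d \
  --name training_job \
  --log-driver json-file \
  --log-opt max-size=10m \
  --log-opt max-file=3 \
  my_ml_image \
  python train.py --epochs 10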
In summary, managing logs from containerized training jobs involves moving beyond the basic docker logs command. Configuring your application to log to files, and then using Docker volumes or bind mounts to persist those files outside the container, provides a reliable way to capture and retain training history for debugging, analysis, and reproducibility. For more complex setups, consider adopting structured logging and exploring centralized logging solutions.