Now that you understand how to serialize your machine learning models and load them within your application environment, the next question is: how do you make these loaded models available to your API endpoint functions where predictions actually happen?
You could load the model into a global variable and access it directly from your endpoint functions. While simple, this approach can make your application harder to test and maintain, especially as it grows. Global state can lead to tight coupling between different parts of your code and make it difficult to swap implementations, for instance, using a mock model during testing.
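For context, the global-variable approach typically looks something like the sketch below. The joblib call and the "model.joblib" path are illustrative assumptions; the point is that the endpoint reaches directly into module-level state.

import joblib
from fastapi import FastAPI

app = FastAPI()

# Module-level global: every endpoint accesses this object directly.
model = joblib.load("model.joblib")

@app.post("/predict/")
async def predict(feature1: float, feature2: float):
    # The endpoint is tightly coupled to the global above,
    # which makes swapping in a mock for tests awkward.
    return {"prediction": float(model.predict([[feature1, feature2]])[0])}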
FastAPI provides a powerful and elegant solution through its Dependency Injection system. Dependency Injection is a design pattern in which the components an object needs (its dependencies) are "injected" into it from an external source, rather than the object creating them itself. In FastAPI, this system allows you to declare dependencies for your path operation functions (the functions handling requests, such as those decorated with @app.post("/predict")), and FastAPI takes care of providing these dependencies when the endpoint is called.
At its core, FastAPI's dependency injection uses the Depends function. You declare a dependency in your path operation function's parameters using Depends, passing it a callable (like another function). When a request comes in for that endpoint, FastAPI will:

1. Detect the parameter declared with Depends.
2. Call the dependency function (the callable you passed to Depends).
3. Pass the returned value to your endpoint function as the argument for that parameter (the one annotated with Depends).

This mechanism allows you to separate the logic for providing a resource (like an ML model) from the logic of using that resource within your endpoint.
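As a minimal illustration of the mechanism before we turn to models, the sketch below uses a hypothetical get_settings dependency; any callable works, and FastAPI hands its return value to the endpoint:

from fastapi import Depends, FastAPI

app = FastAPI()

def get_settings() -> dict:
    # A trivial dependency: FastAPI calls this for each request that needs it.
    return {"greeting": "Hello"}

@app.get("/hello/{name}")
async def say_hello(name: str, settings: dict = Depends(get_settings)):
    # 'settings' is whatever get_settings() returned; the endpoint never calls it itself.
    return {"message": f"{settings['greeting']}, {name}!"}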
Using Depends for Model Access

Let's see how this applies to accessing our machine learning model. Assume you have already loaded your model into memory when the application starts, perhaps storing it in a variable accessible within your application module (let's call it loaded_model).
First, define a simple dependency function whose only job is to return the loaded model:
# Assume 'model_loader.py' handles loading the model at startup
from .model_loader import loaded_model

# Dependency function
def get_model():
    """Returns the pre-loaded machine learning model."""
    return loaded_model
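For completeness, model_loader.py might look something like the sketch below. The joblib format and the artifacts path are assumptions; adapt them to however you serialized your model.

# model_loader.py (illustrative sketch)
from pathlib import Path
import joblib

# Assumed location of the serialized model artifact.
MODEL_PATH = Path(__file__).parent / "artifacts" / "model.joblib"

# Loaded once at import time, so every request reuses the same object.
loaded_model = joblib.load(MODEL_PATH)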
Now, in your API endpoint function where you need to perform predictions, you can declare that it depends on the result of get_model:
from fastapi import FastAPI, Depends, HTTPException
from pydantic import BaseModel

# Assume 'get_model' is defined as above
from .dependencies import get_model

# Assume 'ml_model_type' is the type hint for your loaded model
from typing import Any  # Replace Any with your actual model type if possible

app = FastAPI()

class PredictionInput(BaseModel):
    feature1: float
    feature2: int
    # ... other features

class PredictionOutput(BaseModel):
    prediction: float  # Or appropriate type

# Inject the model using Depends
@app.post("/predict/", response_model=PredictionOutput)
async def make_prediction(
    input_data: PredictionInput,
    model: Any = Depends(get_model)  # Inject the model here
):
    """
    Receives input data, uses the injected model to make a prediction,
    and returns the result.
    """
    try:
        # Convert the Pydantic model to the format expected by the model.
        # This step depends heavily on your specific model.
        model_input = [[input_data.feature1, input_data.feature2]]  # Example conversion
        # Use the injected model to make a prediction
        prediction_result = model.predict(model_input)
        # Assuming the prediction result is a single value or easily extractable
        return PredictionOutput(prediction=float(prediction_result[0]))
    except Exception as e:
        # Handle potential errors during prediction
        raise HTTPException(status_code=500, detail=f"Prediction error: {e}")
In this example, before make_prediction is executed for an incoming request, FastAPI calls get_model(). The returned value (our loaded_model) is then passed as the model argument to make_prediction. The endpoint function itself doesn't need to know how or where the model came from; it just receives it as a parameter.
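If you want to try the endpoint out, you can exercise it with FastAPI's TestClient. The sketch below assumes the application above is importable as main and that the model accepts two numeric features; the printed prediction value depends entirely on your model.

from fastapi.testclient import TestClient

# 'main' is an assumed module name for the application defined above.
from main import app

client = TestClient(app)

response = client.post(
    "/predict/",
    json={"feature1": 5.1, "feature2": 3},
)
print(response.status_code)  # 200 if the prediction succeeded
print(response.json())       # e.g. {"prediction": 0.87}, depending on your model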
This arrangement brings several practical benefits:

- Separation of concerns: the prediction logic (make_prediction) is decoupled from the model loading logic (get_model). The endpoint focuses solely on validation, prediction, and response formatting.
- Testability: when testing the make_prediction endpoint, you can easily provide a different dependency function (a "mock" or "fake") that returns a predictable dummy model instead of the real one. This allows you to test the endpoint's logic in isolation without needing the actual (potentially slow or complex) model. FastAPI's testing utilities (TestClient) integrate smoothly with overriding dependencies, as shown in the sketch after this list.
- Reusability: the same get_model dependency function can be reused by multiple endpoints if they all need access to the same model instance.
- Maintainability: if the way the model is loaded or stored changes, you only need to update the get_model function. None of the endpoint functions that depend on it need to change, as long as the dependency function continues to return a compatible model object.
- Explicit dependencies: the function signature makes it clear that the make_prediction function requires a model object to perform its task.
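As a sketch of what such a test might look like, FastAPI's dependency_overrides mapping lets you swap get_model for a fake during tests. The DummyModel class and the main/dependencies module names below are illustrative assumptions.

from fastapi.testclient import TestClient

# Assumed module names for the application and dependency shown earlier.
from main import app
from dependencies import get_model

class DummyModel:
    """A stand-in model that always returns the same value."""
    def predict(self, model_input):
        return [0.5 for _ in model_input]

def override_get_model():
    return DummyModel()

# Replace the real dependency with the dummy for the duration of the tests.
app.dependency_overrides[get_model] = override_get_model

client = TestClient(app)

def test_make_prediction_returns_dummy_value():
    response = client.post("/predict/", json={"feature1": 1.0, "feature2": 2})
    assert response.status_code == 200
    assert response.json() == {"prediction": 0.5}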
While loading a model into a simple global variable might seem sufficient for very small applications, using FastAPI's dependency injection system provides a much cleaner, more scalable, and significantly more testable architecture for integrating your machine learning models into production-ready APIs. It's a standard practice in FastAPI development for managing resources like database connections, external service clients, and, importantly for us, machine learning models.