With your machine learning model loaded and ready (as discussed in the previous sections), the next logical step is to create the web interface for it. This interface is the API endpoint, a specific URL within your FastAPI application that external clients or services can interact with to get predictions. We'll focus on creating endpoints that receive input data, pass it to the model, and return the model's output.
Typically, predictions involve sending data to the server to be processed, making the HTTP POST method the appropriate choice for prediction endpoints. We'll also leverage the Pydantic models defined in Chapter 2 to ensure the incoming data has the correct structure and types, and to define the format of the response.
Let's start by outlining a basic prediction endpoint. Assume we have a Pydantic model InputFeatures representing the expected input data and a PredictionOutput model for the response. We also assume our trained ML model object is available, perhaps loaded into a global variable model for simplicity at this stage (more advanced loading techniques like dependency injection will be covered later in this chapter).
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import joblib  # Or your preferred library (pickle, etc.)
import numpy as np

# --- Pydantic Models (from Chapter 2) ---
class InputFeatures(BaseModel):
    # Example features - adjust to your model's needs
    feature1: float
    feature2: int
    feature3: float
    category_feature: str  # Example categorical feature

class PredictionOutput(BaseModel):
    prediction: float  # Or int, str, list, depending on your model

# --- Application Setup ---
app = FastAPI()

# --- Load the Model (Simplified Example) ---
# In a real application, manage loading more carefully (e.g., on startup)
try:
    # Replace 'your_model.joblib' with your model file
    model = joblib.load('your_model.joblib')
    # If your model requires a preprocessor (e.g., for scaling, encoding):
    # preprocessor = joblib.load('your_preprocessor.joblib')
except FileNotFoundError:
    print("Error: Model file not found. Ensure 'your_model.joblib' exists.")
    model = None  # Handle the case where the model isn't loaded
    # preprocessor = None

# --- Prediction Endpoint ---
@app.post("/predict", response_model=PredictionOutput)
async def make_prediction(input_data: InputFeatures):
    """
    Receives input features, uses the loaded model to make a prediction,
    and returns the prediction.
    """
    if model is None:
        raise HTTPException(status_code=503, detail="Model not loaded")

    # 1. Prepare input data for the model
    # Convert the Pydantic model into the format the model expects
    # (e.g., a DataFrame or NumPy array). This often means selecting
    # features in the right order and applying the same preprocessing
    # used during training (such as one-hot encoding 'category_feature').
    # For simplicity, we assume the model accepts the numerical features
    # directly, or that preprocessing is integrated within the loaded
    # 'model' object (e.g., a scikit-learn Pipeline).
    # You MUST adapt this part to your specific model's input requirements!
    # If preprocessing is a separate object:
    # processed_features = preprocessor.transform(features)
    # prediction_result = model.predict(processed_features)
    try:
        # Example: assuming the model expects a 2D array-like structure
        model_input = np.array([[
            input_data.feature1,
            input_data.feature2,
            input_data.feature3,
            # Add preprocessed categorical features here if needed
        ]])

        # 2. Perform inference
        prediction_value = model.predict(model_input)

        # 3. Format the response
        # The model may return a NumPy array or list; extract the relevant
        # value and cast it to match PredictionOutput.prediction
        result = float(prediction_value[0])
        return PredictionOutput(prediction=result)
    except Exception as e:
        # Handle potential errors during preprocessing or prediction
        raise HTTPException(status_code=500, detail=f"Prediction error: {str(e)}")
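With this code saved as, say, main.py and served with uvicorn main:app, a client can request a prediction by POSTing JSON that matches InputFeatures. The following sketch uses the requests library; the URL, field values, and example output are illustrative assumptions:

import requests

payload = {
    "feature1": 1.5,
    "feature2": 3,
    "feature3": 0.25,
    "category_feature": "A",
}
# Assumes the API is running locally on the default uvicorn port
response = requests.post("http://127.0.0.1:8000/predict", json=payload)
print(response.status_code)  # 200 on success
print(response.json())       # e.g. {"prediction": 0.87}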
Let's break down the make_prediction function:
Function Signature:

@app.post("/predict", response_model=PredictionOutput): This decorator registers the function to handle POST requests at the /predict path. response_model=PredictionOutput tells FastAPI to validate the return value against the PredictionOutput model and automatically document it.

async def make_prediction(input_data: InputFeatures): This defines an asynchronous function. The input_data: InputFeatures parameter declaration is significant. FastAPI uses this type hint to:

- Validate the incoming request body against the InputFeatures Pydantic model.
- Convert the validated data into an InputFeatures object and pass it as the input_data argument.

If validation fails, FastAPI automatically returns a detailed HTTP 422 Unprocessable Entity error response, as in the sketch below.
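For instance, a request with feature2 missing never reaches the function body; FastAPI rejects it first. A minimal sketch, again assuming the server is running locally (the exact error wording varies with the FastAPI and Pydantic versions):

import requests

bad_payload = {"feature1": 1.5, "feature3": 0.25, "category_feature": "A"}  # 'feature2' is missing
response = requests.post("http://127.0.0.1:8000/predict", json=bad_payload)
print(response.status_code)  # 422
# Body is roughly: {"detail": [{"loc": ["body", "feature2"], "msg": "field required", ...}]}
print(response.json())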
Preparing Input Data:

model_input = np.array(...): This is a critical step where you transform the validated input_data (an InputFeatures object) into the precise format your machine learning model expects for its predict() method. This might involve selecting features in the order used during training, scaling numerical values, or encoding categorical variables such as category_feature. Often, these preprocessing steps are bundled with the model in a scikit-learn Pipeline object, simplifying this step considerably (see the sketch below). Failure to match the training preprocessing is a common source of errors.
Performing Inference:

prediction_value = model.predict(model_input): Here, the prepared input is passed to the loaded model's predict() method (or predict_proba() if probabilities are needed; the snippet below shows the difference). This is where the actual machine learning computation happens. Note that if your model's prediction step is computationally intensive and blocks the CPU, it can hinder the performance of your asynchronous application. Techniques to handle this are discussed in Chapter 5. For now, we assume inference is reasonably fast.
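A toy classifier, purely to illustrate the shape of the two outputs; the data and estimator here are invented:

import numpy as np
from sklearn.linear_model import LogisticRegression

# Tiny synthetic training set, just enough to fit a classifier
X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([0, 0, 1, 1])
clf = LogisticRegression().fit(X_train, y_train)

sample = np.array([[2.5]])
print(clf.predict(sample))        # class label, e.g. [1]
print(clf.predict_proba(sample))  # per-class probabilities, e.g. [[0.2 0.8]]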
Formatting the Response:

result = float(prediction_value[0]): Model prediction methods often return NumPy arrays or lists, even for single predictions. You need to extract the relevant value and potentially convert its type to match your PredictionOutput model.

return PredictionOutput(prediction=result): You create an instance of your PredictionOutput model using the prediction result. FastAPI automatically serializes this Pydantic object into a JSON response for the client.

This structure provides a robust way to serve predictions: FastAPI handles the web server mechanics and data validation, while your code focuses on the ML-specific tasks of data preparation, inference, and response formatting. Remember to adapt the data preparation and response formatting steps precisely to the requirements of your specific model and the desired output.
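Before wiring up a real client, you can also exercise the endpoint in-process with FastAPI's TestClient. A minimal sketch, assuming the endpoint code above lives in main.py and a model file is present so that model is not None:

from fastapi.testclient import TestClient
from main import app  # assumes the endpoint code above is saved as main.py

client = TestClient(app)
response = client.post("/predict", json={
    "feature1": 1.5,
    "feature2": 3,
    "feature3": 0.25,
    "category_feature": "A",
})
assert response.status_code == 200
print(response.json())  # e.g. {"prediction": 0.87}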