Machine learning models are often trained on data represented in specific formats, such as NumPy arrays, Pandas DataFrames, or tensors with precise shapes and data types. However, clients interacting with your API typically send data in more web-friendly formats, most commonly JSON payloads or occasionally as uploaded files. Your FastAPI application acts as the bridge, receiving data in one format and transforming it into the structure your loaded model expects.
Building upon the data validation structures defined using Pydantic (covered in Chapter 2), this section focuses on the practical steps within your endpoint functions to handle these incoming formats and perform the necessary preprocessing before invoking your model's prediction method.
The most frequent scenario involves receiving input features as a JSON object or an array of objects. Pydantic models excel at validating the structure and types of this incoming JSON.
For predicting on a single instance, the client usually sends a JSON object where keys represent feature names and values represent the corresponding feature values.
# schemas.py
from pydantic import BaseModel

class IrisFeatures(BaseModel):
    sepal_length: float
    sepal_width: float
    petal_length: float
    petal_width: float
    # Example of a categorical feature if needed:
    # species_guess: str | None = None
In your endpoint, you'll receive an instance of this Pydantic model. Your task is then to convert this object into the numerical format your model requires, often a 2D NumPy array where each row is a sample (even if it's just one sample).
# main.py
from fastapi import FastAPI, Depends
import numpy as np

# Assume schemas.py contains IrisFeatures
from .schemas import IrisFeatures
# Assume model_loader.py provides a loaded model
from .model_loader import get_model

app = FastAPI()

# Placeholder for a loaded model object (e.g., scikit-learn)
# In a real app, use dependency injection as shown later
# model = load_my_sklearn_model('path/to/model.joblib')

@app.post("/predict/single")
async def predict_single(
    features: IrisFeatures,
    model = Depends(get_model)  # Use dependency injection
):
    """Receives single feature set via JSON, returns prediction."""
    # Convert Pydantic model fields to a list of feature values
    feature_values = [
        features.sepal_length,
        features.sepal_width,
        features.petal_length,
        features.petal_width,
    ]

    # Convert to NumPy array, ensuring the shape is (1, num_features).
    # Models often expect a 2D array, even for a single sample.
    input_array = np.array(feature_values).reshape(1, -1)

    # Perform prediction
    prediction = model.predict(input_array)
    probability = model.predict_proba(input_array)  # If applicable

    # Convert NumPy types to standard Python types for JSON response
    return {
        "prediction": prediction[0].item(),
        "probability": probability[0].tolist()  # Convert array to list
    }
Notice the conversion to a NumPy array and the reshape(1, -1) call. This explicitly creates a 2D array with one row, which is the standard input format for many libraries such as scikit-learn, even when predicting for a single instance. Also note the conversion of the NumPy results (prediction[0], probability[0]) back to standard Python types with item() and tolist() before returning the JSON response.
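As a quick check, you can exercise this endpoint from any HTTP client. The snippet below is a minimal sketch using the requests library; the host, port, and example feature values are placeholders, assuming the application is running locally on port 8000.
# client_single.py (illustrative)
import requests

payload = {
    "sepal_length": 5.1,
    "sepal_width": 3.5,
    "petal_length": 1.4,
    "petal_width": 0.2,
}

# Assumes the FastAPI app is served at http://localhost:8000
response = requests.post("http://localhost:8000/predict/single", json=payload)
print(response.status_code)
print(response.json())  # e.g. {"prediction": 0, "probability": [...]}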
For efficiency, you might want to allow clients to send multiple instances for prediction in a single API call. This is typically done by sending a JSON array of objects. Pydantic handles this using List[YourModel].
# schemas.py
from pydantic import BaseModel
from typing import List

class IrisFeatures(BaseModel):
    sepal_length: float
    sepal_width: float
    petal_length: float
    petal_width: float

class BatchPredictionRequest(BaseModel):
    instances: List[IrisFeatures]

class PredictionResult(BaseModel):
    prediction: int  # Or float, depending on the model output
    probability: List[float] | None = None

class BatchPredictionResponse(BaseModel):
    predictions: List[PredictionResult]
The endpoint then iterates through the list, prepares each instance, collects them into a batch (usually a 2D NumPy array), and feeds the entire batch to the model if it supports batch inference (most do).
# main.py
# ... other imports
from typing import List

from .schemas import BatchPredictionRequest, BatchPredictionResponse, PredictionResult

@app.post("/predict/batch", response_model=BatchPredictionResponse)
async def predict_batch(
    request: BatchPredictionRequest,
    model = Depends(get_model)
):
    """Receives batch of features via JSON array, returns batch predictions."""
    batch_features = []
    for features in request.instances:
        feature_values = [
            features.sepal_length,
            features.sepal_width,
            features.petal_length,
            features.petal_width,
        ]
        batch_features.append(feature_values)

    # Convert the list of lists into a 2D NumPy array
    input_array = np.array(batch_features)

    # Perform batch prediction
    predictions = model.predict(input_array)
    probabilities = model.predict_proba(input_array)  # If applicable

    # Format the results
    results = []
    for i in range(len(predictions)):
        results.append(
            PredictionResult(
                prediction=predictions[i].item(),
                probability=probabilities[i].tolist()
            )
        )

    return BatchPredictionResponse(predictions=results)
This batch processing approach is generally more efficient than making multiple individual API calls, as it reduces network overhead and can leverage optimized batch inference capabilities of the underlying ML library.
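Listing each field by hand works, but it duplicates the schema and is easy to get out of sync. As a sketch, you could instead build the feature rows from the Pydantic model itself. This assumes Pydantic v2's model_dump() (use .dict() on v1) and that the declared field order matches the feature order the model was trained on.
# feature_extraction_example.py (illustrative)
import numpy as np
from pydantic import BaseModel

class IrisFeatures(BaseModel):
    sepal_length: float
    sepal_width: float
    petal_length: float
    petal_width: float

instances = [
    IrisFeatures(sepal_length=5.1, sepal_width=3.5, petal_length=1.4, petal_width=0.2),
    IrisFeatures(sepal_length=6.7, sepal_width=3.1, petal_length=4.7, petal_width=1.5),
]

# model_dump() returns fields in declaration order, so the column order
# follows the schema, which must match the training feature order.
batch_features = [list(features.model_dump().values()) for features in instances]
input_array = np.array(batch_features)  # shape: (2, 4)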
For models that operate on non-tabular data like images, audio, or documents, clients might need to upload files directly instead of embedding data within JSON. FastAPI handles this using File and UploadFile.
# main.py
from fastapi import FastAPI, File, UploadFile, Depends
from PIL import Image  # Pillow library for image processing
import io
import numpy as np

# Assume model_loader provides a loaded image classification model
from .model_loader import get_image_model

app = FastAPI()

def preprocess_image(image_bytes: bytes) -> np.ndarray:
    """Loads image bytes, preprocesses for the model."""
    # Example preprocessing: open, resize, convert to NumPy array, normalize
    try:
        img = Image.open(io.BytesIO(image_bytes))
        # Common preprocessing steps (adjust based on your model)
        img = img.resize((224, 224))  # Example resize
        img_array = np.array(img)
        if img_array.ndim == 2:  # Handle grayscale
            img_array = np.stack((img_array,) * 3, axis=-1)  # Convert to 3 channels
        img_array = img_array / 255.0  # Normalize to [0, 1]
        # Add batch dimension (1, height, width, channels)
        img_array = np.expand_dims(img_array, axis=0)
        return img_array.astype(np.float32)  # Ensure correct dtype
    except Exception as e:
        # Handle errors like invalid image format
        print(f"Error preprocessing image: {e}")
        raise ValueError("Invalid image file or format") from e

@app.post("/predict/image")
async def predict_image(
    image_file: UploadFile = File(...),
    model = Depends(get_image_model)  # Dependency injection for image model
):
    """Receives an image file, returns prediction."""
    contents = await image_file.read()
    try:
        input_tensor = preprocess_image(contents)
    except ValueError as e:
        return {"error": str(e)}
    finally:
        await image_file.close()  # Important to close the file

    # Perform prediction (model expects preprocessed input)
    prediction = model.predict(input_tensor)  # Or model.forward(input_tensor) etc.

    # Process prediction output (e.g., get class label)
    # This depends heavily on your model's output format
    predicted_class_index = np.argmax(prediction[0])
    # Assume class_names is a list available to the endpoint
    # predicted_label = class_names[predicted_class_index]

    return {
        "filename": image_file.filename,
        "content_type": image_file.content_type,
        "prediction_index": predicted_class_index.item(),
        # "predicted_label": predicted_label
    }
In this example:
- The endpoint declares image_file using UploadFile = File(...).
- It calls await image_file.read() to get the file contents as bytes; UploadFile exposes asynchronous methods, so the read is awaited.
- preprocess_image encapsulates the steps needed to convert the raw bytes into the specific tensor format the model requires (resizing, normalization, adding a batch dimension, ensuring the correct data type). This function uses the Pillow library (PIL) for image manipulation.
- await image_file.close() runs in a finally block to ensure resources are released.

An alternative, sometimes seen for smaller images, is to encode the image data as a Base64 string within a JSON payload. This avoids multipart form data but increases the payload size. The API endpoint then decodes the Base64 string back into bytes before proceeding with preprocessing.
# schemas.py
from pydantic import BaseModel

class ImageBase64(BaseModel):
    image_b64: str
    filename: str | None = None

# main.py (endpoint snippet)
import base64
import binascii

@app.post("/predict/image_base64")
async def predict_image_base64(
    request: ImageBase64,
    model = Depends(get_image_model)
):
    """Receives Base64 encoded image via JSON, returns prediction."""
    try:
        image_bytes = base64.b64decode(request.image_b64)
        input_tensor = preprocess_image(image_bytes)  # Use the same preprocessor
    except (ValueError, binascii.Error) as e:
        return {"error": f"Invalid Base64 data or image format: {e}"}

    # ... (prediction and response formatting as before) ...
    prediction = model.predict(input_tensor)
    predicted_class_index = np.argmax(prediction[0])

    return {
        "filename": request.filename or "unknown",
        "prediction_index": predicted_class_index.item(),
    }
Choose the input method (direct file upload vs. Base64 in JSON) based on client requirements, expected file sizes, and API design preferences. File uploads are generally better for larger binary data.
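For reference, here is how a client might call each endpoint. This is a sketch using the requests library; the host, port, and cat.jpg filename are placeholders, assuming the service runs locally and the image file exists.
# client_image.py (illustrative)
import base64
import requests

BASE_URL = "http://localhost:8000"  # placeholder host/port

# 1. Direct file upload (multipart/form-data); the field name must match
#    the endpoint parameter name, image_file.
with open("cat.jpg", "rb") as f:
    files = {"image_file": ("cat.jpg", f, "image/jpeg")}
    response = requests.post(f"{BASE_URL}/predict/image", files=files)
print(response.json())

# 2. Base64-encoded image inside a JSON payload
with open("cat.jpg", "rb") as f:
    payload = {
        "image_b64": base64.b64encode(f.read()).decode("utf-8"),
        "filename": "cat.jpg",
    }
response = requests.post(f"{BASE_URL}/predict/image_base64", json=payload)
print(response.json())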
For Natural Language Processing (NLP) models, the input is typically text, sent as strings within a JSON payload.
# schemas.py
from pydantic import BaseModel
from typing import List

class TextItem(BaseModel):
    text: str

class TextBatchRequest(BaseModel):
    texts: List[str]

# main.py (endpoint snippet)
# Assume model_loader provides a loaded NLP model/pipeline
from .model_loader import get_nlp_model

@app.post("/predict/text")
async def predict_text(
    request: TextItem,  # Or TextBatchRequest for batching
    model = Depends(get_nlp_model)
):
    """Receives text input, returns NLP model prediction (e.g., sentiment)."""
    input_text = request.text  # Or request.texts for batch

    # Preprocessing might involve tokenization, vectorization, etc.
    # *IF* this is not part of the loaded model pipeline:
    # processed_input = preprocess_text(input_text)
    # prediction = model.predict(processed_input)

    # *IF* the model (e.g., scikit-learn Pipeline) handles preprocessing:
    prediction = model.predict([input_text])  # Pass raw text(s)

    # Format the prediction result.
    # Example for sentiment analysis, assuming the model outputs a score
    # in [0, 1] or a 0/1 label:
    sentiment_score = prediction[0].item()
    sentiment_label = "positive" if sentiment_score > 0.5 else "negative"

    return {
        "input_text": input_text,
        "sentiment_score": sentiment_score,
        "sentiment_label": sentiment_label
    }
A significant consideration for text (and sometimes other data types) is where preprocessing such as tokenization or vectorization occurs. If you saved a scikit-learn Pipeline that includes components like TfidfVectorizer, your loaded model artifact already contains the necessary preprocessing steps, and you can often pass raw text directly to the pipeline's predict method. If preprocessing is not part of the saved model, you must implement those exact steps within your FastAPI endpoint (or helper functions) before calling predict. Consistency between training-time and inference-time preprocessing is mandatory for correct results.
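To make the first case concrete, here is a minimal sketch of training and saving a scikit-learn Pipeline whose preprocessing travels with the model artifact. The file name, toy data, and classifier choice are illustrative placeholders.
# train_pipeline.py (illustrative)
import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Toy training data (placeholder)
texts = ["great product, loved it", "terrible, would not buy again"]
labels = [1, 0]

# The vectorizer is part of the pipeline, so the saved artifact contains
# the exact preprocessing used at training time.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression()),
])
pipeline.fit(texts, labels)

joblib.dump(pipeline, "sentiment_pipeline.joblib")

# At inference time, the FastAPI endpoint can pass raw strings directly:
# model = joblib.load("sentiment_pipeline.joblib")
# model.predict(["this api is easy to use"])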
The following diagram illustrates the typical data flow when handling different input formats:
Data flow from client request through FastAPI processing to model inference and response.
Regardless of the input format (JSON, file upload), the pattern remains similar: receive the request, validate or read the data, preprocess it into the exact format the underlying ML model library expects, perform the prediction, and format the results into a JSON response. Carefully managing this transformation is essential for building functional and reliable ML prediction APIs.