While Pydantic's type hints provide a strong foundation for data validation, automatically checking if input data matches expected types like `int`, `float`, or `str`, real-world applications often require more granular control. You might need to ensure a numerical input falls within a specific range, a string meets a certain length requirement, or a list contains a precise number of elements. This is where Pydantic's data conversion capabilities and constraint definitions become invaluable.
One of Pydantic's helpful features is its attempt to automatically convert input data to the types you've specified in your models. For instance, if your API expects an integer but receives the string `"123"`, Pydantic will often successfully convert the string to the integer `123`.
Consider this simple model:
```python
from pydantic import BaseModel

class Item(BaseModel):
    item_id: int
    name: str
    price: float
    is_offer: bool | None = None  # Allows boolean or None
```
If your FastAPI endpoint receives a JSON payload like:
```json
{
  "item_id": "42",
  "name": "Example Item",
  "price": "99.95",
  "is_offer": "true"
}
```
Pydantic will parse this and convert the values:
"42"
becomes the integer 42
."99.95"
becomes the float 99.95
."true"
becomes the boolean True
.This automatic conversion simplifies handling common data formats found in web requests. However, rely on it judiciously. If conversion fails (e.g., trying to convert "abc"
to an integer), Pydantic raises a validation error, which FastAPI automatically translates into an informative HTTP error response for the client.
For more precise validation rules beyond basic type checking, Pydantic provides the `Field` function. You use `Field` as the default value when defining an attribute in your `BaseModel`, allowing you to specify various constraints.
Let's explore common constraints useful for machine learning inputs.
When dealing with numerical features, you often need to restrict their range. `Field` offers several parameters for this:

- `gt`: Greater than
- `ge`: Greater than or equal to
- `lt`: Less than
- `le`: Less than or equal to

```python
from pydantic import BaseModel, Field

class ModelInput(BaseModel):
    # Feature must be positive
    feature1: float = Field(..., gt=0)
    # Probability must be between 0 and 1 (inclusive)
    probability_threshold: float = Field(..., ge=0, le=1)
    # Integer feature within a specific range
    count_feature: int = Field(..., ge=0, le=100)
```
Here, `...` (Ellipsis) indicates that the field is required. If you provide a default value instead of `...`, the field becomes optional. Using these constraints ensures that nonsensical values (like negative probabilities or counts) are rejected before they reach your model prediction logic.
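A short sketch (Pydantic v2 assumed) shows how violations of these bounds surface; note that all failing fields are collected into a single `ValidationError`:

```python
from pydantic import BaseModel, Field, ValidationError

class ModelInput(BaseModel):
    feature1: float = Field(..., gt=0)
    probability_threshold: float = Field(..., ge=0, le=1)
    count_feature: int = Field(..., ge=0, le=100)

# Boundary values satisfying ge/le are accepted
valid = ModelInput(feature1=0.5, probability_threshold=1.0, count_feature=100)

# Two constraints violated at once: both are reported together
try:
    ModelInput(feature1=-2.0, probability_threshold=1.5, count_feature=100)
except ValidationError as exc:
    errors = exc.errors()
    print(len(errors))  # 2
```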
Text inputs also frequently benefit from constraints:
- `min_length`: Minimum string length
- `max_length`: Maximum string length
- `pattern`: A regular expression the string must match

```python
from pydantic import BaseModel, Field

class UserProfile(BaseModel):
    username: str = Field(..., min_length=3, max_length=50)
    # Ensure user ID follows a specific format (e.g., starts with 'UID' followed by digits)
    user_id: str = Field(..., pattern=r"^UID\d+$")
    # Optional bio with max length
    bio: str | None = Field(default=None, max_length=250)
```

Note that `pattern` accepts the regular expression as a plain string, so no `import re` is needed.
Regular expression patterns (`pattern`) are particularly useful for validating structured identifiers or specific text formats often encountered in data preprocessing steps.
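To see the pattern constraint in action, this sketch (Pydantic v2 assumed) rebuilds a trimmed-down `UserProfile` and probes it with a malformed ID:

```python
from pydantic import BaseModel, Field, ValidationError

class UserProfile(BaseModel):
    username: str = Field(..., min_length=3, max_length=50)
    user_id: str = Field(..., pattern=r"^UID\d+$")

# A well-formed ID passes validation
profile = UserProfile(username="alice", user_id="UID1007")

# An ID that does not match the pattern is rejected
try:
    UserProfile(username="bob", user_id="user-1007")
except ValidationError as exc:
    err_type = exc.errors()[0]["type"]
    print(err_type)  # string_pattern_mismatch
```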
For features represented as lists or other collections (like feature vectors), you might need to enforce size limits:
- `min_length`: Minimum number of items in the collection
- `max_length`: Maximum number of items in the collection

```python
from pydantic import BaseModel, Field

class EmbeddingInput(BaseModel):
    # Expecting a fixed-size vector of 128 floats
    vector: list[float] = Field(..., min_length=128, max_length=128)
    # Tags list, must have at least one tag, max 10
    tags: list[str] = Field(..., min_length=1, max_length=10)
```

In Pydantic v2, `min_length` and `max_length` apply to collections as well as strings; the older `min_items` and `max_items` names are deprecated.
This is essential for many ML models that expect input vectors of a specific dimensionality. Validating the size early prevents runtime errors during model inference.
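The effect is easy to demonstrate on a smaller vector; the sketch below (Pydantic v2, where collection sizes are constrained with `min_length`/`max_length`) uses a hypothetical 4-dimensional embedding for brevity:

```python
from pydantic import BaseModel, Field, ValidationError

class SmallEmbedding(BaseModel):
    # Exactly four floats required
    vector: list[float] = Field(..., min_length=4, max_length=4)

ok = SmallEmbedding(vector=[0.1, 0.2, 0.3, 0.4])  # exactly 4 items: accepted

try:
    SmallEmbedding(vector=[0.1, 0.2])  # too few items: rejected
except ValidationError as exc:
    err_type = exc.errors()[0]["type"]
    print(err_type)  # too_short
```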
Let's combine these concepts into a Pydantic model designed to validate input for a hypothetical house price prediction model.
```python
from pydantic import BaseModel, Field

class HouseFeatures(BaseModel):
    area_sqft: float = Field(
        ...,
        gt=0,
        description="Surface area of the house in square feet.",
        examples=[1500.50],
    )
    bedrooms: int = Field(
        ...,
        ge=1,
        le=10,
        description="Number of bedrooms.",
        examples=[3],
    )
    year_built: int = Field(
        ...,
        gt=1800,
        lt=2025,  # Assuming current year context
        description="Year the house was built.",
        examples=[1995],
    )
    zip_code: str = Field(
        ...,
        pattern=r"^\d{5}$",  # US 5-digit zip code format
        description="5-digit US zip code.",
        examples=["90210"],
    )
```

Note that in Pydantic v2, example values are passed as a list via the `examples` parameter rather than the older `example` keyword.
```python
# Example usage in a FastAPI endpoint
from fastapi import FastAPI

app = FastAPI()

@app.post("/predict_price/")
async def predict_house_price(features: HouseFeatures):
    # At this point, 'features' is guaranteed to be valid
    # according to the constraints defined in HouseFeatures.
    # Proceed with model prediction...
    prediction = ...  # your_model.predict([[features.area_sqft, ...]])
    return {"predicted_price": prediction}
```
If a client sends a request to `/predict_price/` with data that violates any of these constraints (e.g., `bedrooms: 0`, `area_sqft: -100`, or `zip_code: "abcde"`), FastAPI, powered by Pydantic, will automatically return a `422 Unprocessable Entity` error response. This response details exactly which fields failed validation and why, providing clear feedback to the API consumer.
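The detail FastAPI serializes into that 422 body comes straight from Pydantic's `ValidationError`, which you can inspect directly. This sketch (Pydantic v2 assumed, with a trimmed-down `HouseFeatures`) shows the per-field failure report for the invalid payload above:

```python
from pydantic import BaseModel, Field, ValidationError

class HouseFeatures(BaseModel):
    area_sqft: float = Field(..., gt=0)
    bedrooms: int = Field(..., ge=1, le=10)
    zip_code: str = Field(..., pattern=r"^\d{5}$")

try:
    HouseFeatures(area_sqft=-100, bedrooms=0, zip_code="abcde")
except ValidationError as exc:
    # Each entry names the offending field and the constraint it violated
    failures = [(err["loc"][0], err["type"]) for err in exc.errors()]
    for loc, err_type in failures:
        print(loc, err_type)
```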
By leveraging Pydantic's automatic type conversion together with the `Field` function's constraints, you move validation logic out of your endpoint functions and into declarative, reusable models. This leads to cleaner API code, improved robustness against invalid data, and better adherence to the principle of "fail fast" by catching errors at the earliest possible stage. This ensures that the data reaching your ML model inference code is already vetted for correctness according to the rules you've defined.
© 2025 ApX Machine Learning