Okay, your Flask application now has a dedicated endpoint ready to serve predictions. But how does it actually get the data it needs to make a prediction? When a user or another application wants a prediction, they need to send the input features (like measurements, text, or image data) to your API. This section explains how your Flask service can receive and handle this incoming data, typically formatted as JSON.
Imagine you need to send structured information over the internet. You need a format that's easy for both humans to read and write, and easy for machines to parse and generate. JSON (JavaScript Object Notation) fits this description perfectly.
JSON represents data as key-value pairs, similar to Python dictionaries, and ordered lists, similar to Python lists. Its simplicity and text-based nature make it the de facto standard for transferring data in web APIs.
Here's an example of how input features for a simple model might look in JSON format:
```json
{
  "sepal_length": 5.1,
  "sepal_width": 3.5,
  "petal_length": 1.4,
  "petal_width": 0.2
}
```
Or, if you expect multiple features as a list:
```json
{
  "features": [5.1, 3.5, 1.4, 0.2]
}
```
Using JSON provides a common language for the client sending the request and your Flask API receiving it.
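As a quick sketch of the client side (the URL below is a placeholder, not part of this service), the `features` payload above can be serialized with Python's standard `json` module and sent with a library such as `requests`:

```python
import json

# The same payload as the "features" example above.
payload = {"features": [5.1, 3.5, 1.4, 0.2]}

# json.dumps produces the JSON text that travels in the request body.
body = json.dumps(payload)
print(body)

# A client could then send it, for example with the requests library:
# import requests  # third-party
# response = requests.post("http://localhost:5000/predict", json=payload)
# print(response.json())
```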
Flask makes it straightforward to access data sent with an incoming web request. When a client sends data (like the JSON above) in the body of an HTTP request (typically using a POST method), Flask provides a global `request` object. You need to import it from the `flask` library:
```python
from flask import Flask, request, jsonify
import joblib  # Or pickle

# Assuming your Flask app is initialized
app = Flask(__name__)

# Load your model (adjust path as needed)
# model = joblib.load('your_model.pkl')
# preprocessor = joblib.load('your_preprocessor.pkl')  # If you have one

@app.route('/predict', methods=['POST'])
def predict():
    # Access incoming request data here
    pass  # We'll fill this in
```
Inside your route function (`predict` in this example), the `request` object holds all information about the incoming request, including headers, arguments, and, significantly, the data sent in the request body.

Since we expect the client to send data in JSON format, Flask provides a convenient method: `request.get_json()`. This method attempts to parse the request body as JSON and returns a Python dictionary (or list, depending on the JSON structure).
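To see the kind of Python object this hands back, note that the parsing step behaves like the standard library's `json.loads` applied to the request body text:

```python
import json

# A JSON object becomes a Python dict...
obj = json.loads('{"features": [5.1, 3.5, 1.4, 0.2]}')
print(type(obj).__name__)

# ...while a top-level JSON array becomes a Python list.
arr = json.loads('[1, 2, 3]')
print(type(arr).__name__)
```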
Let's update our `predict` function to use it:
```python
# (Flask app setup and model loading above)

@app.route('/predict', methods=['POST'])
def predict():
    # Check if the request content type is JSON
    if not request.is_json:
        return jsonify({"error": "Request must be JSON"}), 400

    # Parse the JSON data from the request body
    try:
        data = request.get_json()
        app.logger.info(f"Received data: {data}")  # Log received data
    except Exception as e:
        app.logger.error(f"Error parsing JSON: {e}")
        return jsonify({"error": "Could not parse JSON data"}), 400

    # --- Next steps: Validate and use the data ---
    # (Placeholder for validation and prediction)
    return jsonify({"message": "Data received, processing..."})  # Temporary response
```
Here's what's happening:

1. First, we check `request.is_json`. This looks at the `Content-Type` header of the request (e.g., `application/json`) to give us confidence that the client intended to send JSON. If not, we return an error response with HTTP status code 400 (Bad Request).
2. Next, we call `request.get_json()` to parse the data. This method can raise an exception if the data isn't valid JSON, so we wrap it in a `try...except` block for basic error handling.
3. We log the received data using `app.logger.info`. Logging is very useful for debugging.

Just receiving JSON isn't enough. Your model expects data in a specific structure and format. Before feeding the data to your model, you should perform some validation:

- If you expect a key named `features`, you should check if that key exists.
- If `features` should be a list of numbers, check if it's actually a list and potentially if its elements look like numbers. More complex type validation is possible but might be overkill for a simple service.

Let's add a simple structure check:
```python
# (Flask app setup and model loading above)

@app.route('/predict', methods=['POST'])
def predict():
    if not request.is_json:
        return jsonify({"error": "Request must be JSON"}), 400

    try:
        data = request.get_json()
        app.logger.info(f"Received data: {data}")
    except Exception as e:
        app.logger.error(f"Error parsing JSON: {e}")
        return jsonify({"error": "Could not parse JSON data"}), 400

    # --- Basic Validation ---
    # Example: Check if 'features' key exists and is a list
    if 'features' not in data or not isinstance(data['features'], list):
        app.logger.error("Invalid input: 'features' key missing or not a list")
        return jsonify({"error": "Missing or invalid 'features' key. Expected a list."}), 400

    input_features = data['features']
    # (Potentially add more checks, e.g., number of features)
    app.logger.info(f"Extracted features: {input_features}")

    # --- Next: Prepare data for the model and predict ---
    # (Placeholder for prediction logic)
    return jsonify({"message": f"Received features: {input_features}"})  # Temporary response
```
This validation step helps prevent errors later when you try to use the data with your model and provides clearer error messages to the client if they send improperly formatted data.
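One way to extend the checks, sketched here as a standalone helper (the name `validate_features` and the expected count of four features are assumptions for illustration, not part of the service above):

```python
EXPECTED_NUM_FEATURES = 4  # assumption: e.g. the four iris measurements

def validate_features(data):
    """Hypothetical helper: return (features, None) on success,
    or (None, error_message) on failure."""
    if not isinstance(data, dict) or 'features' not in data:
        return None, "Missing 'features' key."
    features = data['features']
    if not isinstance(features, list):
        return None, "'features' must be a list."
    if len(features) != EXPECTED_NUM_FEATURES:
        return None, f"Expected {EXPECTED_NUM_FEATURES} features, got {len(features)}."
    # bool is a subclass of int in Python, so exclude it explicitly
    if not all(isinstance(x, (int, float)) and not isinstance(x, bool) for x in features):
        return None, "All features must be numbers."
    return features, None

print(validate_features({"features": [5.1, 3.5, 1.4, 0.2]}))
print(validate_features({"features": "oops"}))
```

In the route, you would call this right after `request.get_json()` and return the error message with a 400 status when validation fails.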
Often, the data format received via JSON isn't exactly what your machine learning model expects. For example, scikit-learn models typically require input as a NumPy array or a similar structure (like a list of lists, where each inner list is a sample).
You'll need to convert the data extracted from the JSON into the required format. If you also saved preprocessing steps (like scalers or encoders), you'll apply them here as well.
```python
# (Flask app setup, model loading, imports like numpy)
import numpy as np  # Make sure to import numpy

@app.route('/predict', methods=['POST'])
def predict():
    # (JSON parsing and validation as above...)

    # --- Prepare data for the model ---
    try:
        # Assuming 'features' is a list of numbers [f1, f2, f3, ...]
        input_features = data['features']

        # Example: Convert to a NumPy array suitable for scikit-learn
        # Model expects a 2D array: [[f1, f2, f3, ...]] for a single prediction
        model_input = np.array(input_features).reshape(1, -1)
        app.logger.info(f"Prepared model input shape: {model_input.shape}")

        # If you have a preprocessor:
        # model_input = preprocessor.transform(model_input)
        # app.logger.info("Applied preprocessing")
    except Exception as e:
        app.logger.error(f"Error preparing data for model: {e}")
        return jsonify({"error": "Invalid data format for model processing."}), 400

    # --- Next: Make the prediction ---
    # prediction = model.predict(model_input)
    # (Placeholder for prediction logic and returning results)
    return jsonify({"message": f"Model input prepared: {model_input.tolist()}"})  # Temporary response
```
In this step, we convert the feature list to a NumPy array and call `reshape(1, -1)` because most scikit-learn models expect a 2D array where each row is a sample, even if we're predicting for just one sample.

Now your Flask application is equipped to receive JSON data, perform basic validation, and transform it into a format ready for your machine learning model. The next step is to actually use this prepared data to make a prediction and return the result.
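As a standalone illustration of the reshaping step (the values are the sample measurements used earlier):

```python
import numpy as np

input_features = [5.1, 3.5, 1.4, 0.2]

# A flat list becomes a 1D array of shape (4,)...
flat = np.array(input_features)
print(flat.shape)

# ...but scikit-learn estimators expect shape (n_samples, n_features),
# so reshape(1, -1) wraps the single sample in a 2D array of shape (1, 4).
# The -1 tells NumPy to infer that dimension from the array's size.
model_input = flat.reshape(1, -1)
print(model_input.shape)
print(model_input.tolist())
```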
© 2025 ApX Machine Learning