Once your LLM application is packaged, perhaps within a Docker container, the next step is to make its functionality accessible to users or other services. Typically, this is done by exposing it as a web Application Programming Interface (API). An API acts as a contract, defining how external clients can interact with your application over a network, usually via HTTP. This approach decouples your core LLM logic from any specific user interface and allows various clients (web apps, mobile apps, other backend services) to utilize its capabilities.
Python offers several excellent web frameworks for building APIs. We will focus on two popular choices: FastAPI and Flask. Both provide the tools needed to define API endpoints (specific URLs), handle incoming requests, process data, interact with your LLM workflow components, and send back responses.
FastAPI is a modern, high-performance Python web framework built on standard Python type hints. It's known for its speed (comparable to NodeJS and Go), automatic data validation using Pydantic, dependency injection features, and automatic interactive API documentation (Swagger UI and ReDoc). Its native support for asynchronous operations (async/await) makes it particularly well-suited for I/O-bound tasks like making requests to external LLM APIs, preventing your application from blocking while waiting for the LLM response.
Let's create a simple FastAPI endpoint to interact with a hypothetical LLM query function.
First, ensure you have FastAPI and an ASGI server like Uvicorn installed:
pip install fastapi uvicorn pydantic openai # Or your specific LLM client library
Now, create a Python file (e.g., main.py):
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import os

# Assume we have a function setup_llm_chain() from a previous module
# that returns a configured LangChain chain or similar callable
# from my_llm_logic import setup_llm_chain

# Placeholder for demonstration if setup_llm_chain isn't available
def dummy_llm_call(query: str) -> str:
    print(f"Simulating LLM call for: {query}")
    # In a real app, this calls your chain, agent, or direct API
    # e.g., return llm_chain.invoke({"input": query})
    if "hello" in query.lower():
        return "Hello there! How can I help you today?"
    else:
        return f"I received your query: '{query}'. Processing..."

# Define request body structure using Pydantic
class QueryRequest(BaseModel):
    text: str
    user_id: str | None = None  # Example optional field

# Define response body structure
class QueryResponse(BaseModel):
    answer: str
    request_text: str

# Initialize FastAPI app
app = FastAPI(
    title="LLM Query Service",
    description="API endpoint to interact with our LLM workflow."
)

# Load or initialize your LLM interaction logic (e.g., LangChain chain)
# In a real application, manage this object's lifecycle appropriately.
# llm_chain = setup_llm_chain()

@app.post("/query", response_model=QueryResponse)
async def process_query(request: QueryRequest):
    """
    Accepts a user query, processes it through the LLM workflow,
    and returns the response.
    """
    print(f"Received query from user: {request.user_id or 'anonymous'}")
    try:
        # Replace dummy_llm_call with your actual LLM interaction
        # result = await llm_chain.ainvoke({"input": request.text})  # If using async LangChain
        result = dummy_llm_call(request.text)  # Synchronous example

        # Assuming the result is a string or can be accessed like result['output_key']
        llm_answer = result  # Adjust based on your actual return structure

        return QueryResponse(answer=llm_answer, request_text=request.text)
    except Exception as e:
        # Log the exception details here
        print(f"Error processing query: {e}")
        raise HTTPException(status_code=500, detail="Internal server error processing the query.")

# Optional: Add a simple root endpoint for health checks
@app.get("/")
def read_root():
    return {"status": "LLM API is running"}
To run this application, use Uvicorn:
uvicorn main:app --reload --host 0.0.0.0 --port 8000
The --reload flag automatically restarts the server when code changes are detected, which is useful during development. Pointing your browser to http://localhost:8000/docs will show the interactive Swagger UI documentation automatically generated by FastAPI. You can test the /query endpoint directly from there.
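You can also exercise the endpoint from code. Below is a minimal sketch of a client-side check using the requests library (an assumed extra dependency, installable with pip install requests) against the locally running server:

import requests

# Assumes the Uvicorn server above is running on localhost:8000
resp = requests.post(
    "http://localhost:8000/query",
    json={"text": "hello", "user_id": "demo-user"},
    timeout=30,
)
print(resp.status_code)  # 200 on success, 422 if the body fails validation
print(resp.json())       # e.g. {"answer": "...", "request_text": "hello"}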
Key benefits demonstrated:
- request: QueryRequest clearly defines the expected input structure.
- Incoming requests are automatically validated against the QueryRequest model. If the text field is missing or not a string, FastAPI returns a 422 Unprocessable Entity error.
- The response is serialized and documented according to the QueryResponse model.
- async def allows using await for non-blocking calls (like await llm_chain.ainvoke(...) if your chain supports async); see the sketch after this list.
- The /docs endpoint provides invaluable interactive documentation.
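To illustrate the non-blocking pattern, here is a minimal sketch of an async variant of the endpoint. It assumes llm_chain has been initialized (for example via setup_llm_chain()) and exposes an asynchronous ainvoke() method, as async-capable LangChain chains do; the /query-async route name is illustrative, and the result handling should be adjusted to your chain's actual output.

# Sketch only: assumes llm_chain is initialized and supports ainvoke()
@app.post("/query-async", response_model=QueryResponse)
async def process_query_async(request: QueryRequest):
    # Awaiting frees the event loop to handle other requests while the LLM responds
    result = await llm_chain.ainvoke({"input": request.text})
    # Adjust this if your chain returns a dict or message object instead of a string
    return QueryResponse(answer=str(result), request_text=request.text)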
Flask is another widely used Python web framework. It's often considered simpler and more explicit than FastAPI, following a micro-framework philosophy. It provides the basics for routing and request handling, leaving choices about data validation, asynchronous support (possible via extensions or ASGI servers), and other features up to the developer.
Here's a similar example using Flask:
First, install Flask and potentially a production-ready WSGI server like Gunicorn:
pip install Flask gunicorn openai # Or your specific LLM client library
Create a Python file (e.g., app.py):
from flask import Flask, request, jsonify
import os

# Assume we have a function setup_llm_chain() from a previous module
# from my_llm_logic import setup_llm_chain

# Placeholder for demonstration
def dummy_llm_call(query: str) -> str:
    print(f"Simulating LLM call for: {query}")
    # In a real app, this calls your chain, agent, or direct API
    # e.g., return llm_chain.invoke({"input": query})
    if "hello" in query.lower():
        return "Hello there! How can I help you today?"
    else:
        return f"I received your query: '{query}'. Processing..."

# Initialize Flask app
app = Flask(__name__)

# Load or initialize your LLM interaction logic
# llm_chain = setup_llm_chain()

@app.route("/query", methods=['POST'])
def process_query():
    """
    Accepts a user query via JSON, processes it, and returns a JSON response.
    """
    if not request.is_json:
        return jsonify({"error": "Request must be JSON"}), 400

    data = request.get_json()
    query_text = data.get('text')
    user_id = data.get('user_id', 'anonymous')  # Example optional field

    if not query_text:
        return jsonify({"error": "Missing 'text' field in request body"}), 400

    print(f"Received query from user: {user_id}")
    try:
        # Replace dummy_llm_call with your actual LLM interaction
        llm_answer = dummy_llm_call(query_text)  # Synchronous example
        response_data = {
            "answer": llm_answer,
            "request_text": query_text
        }
        return jsonify(response_data), 200
    except Exception as e:
        # Log the exception details here
        print(f"Error processing query: {e}")
        return jsonify({"error": "Internal server error processing the query."}), 500

# Optional: Add a simple root endpoint for health checks
@app.route("/")
def index():
    return jsonify({"status": "LLM API is running"}), 200

if __name__ == '__main__':
    # For development server only
    app.run(host='0.0.0.0', port=8000, debug=True)
To run this in development:
python app.py
For production, you would typically use a WSGI server like Gunicorn:
gunicorn -w 4 -b 0.0.0.0:8000 app:app
Here, -w 4 starts 4 worker processes.
Key aspects of the Flask example:
- We check request.is_json and use request.get_json() to access the data.
- Checking for required fields (data.get('text')) is done explicitly within the route function. More complex validation often involves libraries like Marshmallow or Cerberus; a sketch follows after this list.
- jsonify() is used to convert the Python dictionary into a JSON response.
- Request handling here is synchronous. Since LLM calls can be slow, consider non-blocking approaches (async/await in FastAPI, or async support in Flask via ASGI) to prevent your API server from being blocked while waiting for the LLM, allowing it to handle other incoming requests concurrently.
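For instance, schema-based validation could replace the manual checks. The following is a minimal sketch using Marshmallow (assuming it is installed via pip install marshmallow); the /query-validated route name is purely illustrative:

from marshmallow import Schema, fields, ValidationError

class QuerySchema(Schema):
    text = fields.Str(required=True)
    user_id = fields.Str(required=False, allow_none=True)

query_schema = QuerySchema()

@app.route("/query-validated", methods=['POST'])
def process_query_validated():
    try:
        # Raises ValidationError if 'text' is missing or not a string
        data = query_schema.load(request.get_json(silent=True) or {})
    except ValidationError as err:
        return jsonify({"errors": err.messages}), 400
    llm_answer = dummy_llm_call(data["text"])
    return jsonify({"answer": llm_answer, "request_text": data["text"]}), 200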
Choosing between FastAPI and Flask often depends on project needs. FastAPI's built-in features for data validation, async support, and automatic documentation are compelling for complex APIs, especially those heavily reliant on I/O operations like LLM calls. Flask's simplicity and flexibility make it a great choice for smaller services or when you prefer to select and integrate components manually. Both provide solid foundations for creating the API layer that makes your deployed LLM application usable.