Many standard Python programs execute synchronously. This means when the program encounters an operation, it waits for that operation to complete before moving to the next line of code. Imagine a chef meticulously preparing one dish from start to finish before even considering the next order. While simple to reason about, this approach can be inefficient, especially for applications that spend a lot of time waiting for external events, like network requests or disk reads.
Web applications, including APIs serving machine learning models, frequently perform such I/O-bound operations. A request might require fetching user data from a database, calling another microservice, or loading model features from storage. In a synchronous world, while the server waits for the database response, it cannot process any other incoming requests, leading to poor performance and resource utilization under load.
This is where asynchronous programming comes in. Instead of waiting idly, an asynchronous application can switch to handling other tasks while one task is waiting for an I/O operation to complete. When the operation finishes, the application can resume the original task where it left off. Think of our chef starting to boil water for pasta, then switching to chop vegetables while the water heats up, returning to the pasta only when the water is boiling. This allows for much higher concurrency, meaning the ability to handle many operations seemingly simultaneously with fewer resources.
Python provides native support for asynchronous programming through the async and await syntax, built upon the concept of coroutines.
async def
You define an asynchronous function, or coroutine, using async def instead of just def:
import asyncio

async def get_data_from_network():
    # Code to fetch data...
    print("Fetching data...")
    # Simulate waiting for a network response
    await asyncio.sleep(1)  # This is where the magic happens
    print("Data received!")
    return {"data": "some result"}
A function defined with async def doesn't run like a regular function when called. Instead, calling it returns a coroutine object. This object represents the work defined in the function but doesn't execute it immediately.
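A minimal sketch makes this behavior concrete: calling the coroutine function produces a coroutine object, and nothing executes until the event loop runs it (here via asyncio.run, available since Python 3.7). The trivial await asyncio.sleep(0) stands in for real I/O.

```python
import asyncio

async def get_data_from_network():
    await asyncio.sleep(0)  # stand-in for real network I/O
    return {"data": "some result"}

coro = get_data_from_network()  # nothing runs yet; this is a coroutine object
print(type(coro).__name__)      # the object's type is 'coroutine'

result = asyncio.run(coro)      # the event loop actually executes the work
print(result)
```

Note that a coroutine object can only be run once; awaiting or running it a second time raises an error.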
await
The await keyword is used inside an async def function. It tells Python that the operation being called (which must itself be awaitable, typically another coroutine or an object designed for async operations) might take some time. When the program encounters await, it can suspend the execution of the current coroutine, allowing the application to perform other tasks. Once the awaited operation completes, execution of the suspended coroutine resumes from that point.

Crucially, await can only be used inside functions defined with async def.
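As a short illustration of one coroutine awaiting another, consider this hedged sketch: handle_request suspends at the await until the hypothetical load_features coroutine (a stand-in for a real database call) finishes, then resumes with its result.

```python
import asyncio

async def load_features(user_id: int) -> dict:
    # Hypothetical helper simulating an I/O-bound feature lookup.
    await asyncio.sleep(0.01)
    return {"user_id": user_id}

async def handle_request(user_id: int) -> dict:
    # Execution of handle_request suspends here until load_features completes.
    features = await load_features(user_id)
    return {"status": "ok", **features}

response = asyncio.run(handle_request(7))
print(response)
```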
How does the application know when to resume a suspended coroutine? This is managed by the event loop, the core of any asynchronous application. It keeps track of all running and suspended tasks. When a task uses await to pause for an I/O operation, it yields control back to the event loop. The event loop can then run other ready tasks. When the I/O operation completes, the event loop schedules the original task to resume its execution. Python's built-in asyncio library provides the event loop and tools for managing asynchronous tasks.
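The concurrency payoff is easiest to see with a timing experiment. In this sketch, two coroutines each "wait" for 0.2 seconds; run concurrently with asyncio.gather under one event loop, the total elapsed time is roughly 0.2 seconds rather than the 0.4 seconds a synchronous version would take.

```python
import asyncio
import time

async def fetch(name: str, delay: float) -> str:
    # While this coroutine sleeps, the event loop runs other ready tasks.
    await asyncio.sleep(delay)
    return f"{name} done"

async def main() -> list:
    # gather schedules both coroutines concurrently on the event loop.
    return await asyncio.gather(fetch("a", 0.2), fetch("b", 0.2))

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(results)  # both complete
print(f"elapsed: {elapsed:.2f}s")  # close to 0.2s, not 0.4s
```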
Basic flow of control in an asynchronous operation involving an event loop.
FastAPI is built directly on top of these asynchronous capabilities (specifically, using the ASGI standard and the Starlette toolkit, which relies on asyncio). This allows you to write standard Python code using async and await in your API route handlers. When FastAPI receives a request, it runs the corresponding route handler within the event loop. If your handler performs an await on an I/O-bound operation (like interacting with a database asynchronously or calling an external API), FastAPI automatically handles the pausing and resuming, allowing the server to efficiently handle many concurrent requests. This inherent support for asynchronous operations is a primary reason for FastAPI's high performance, especially beneficial for ML APIs that might need to fetch data or preprocess inputs before inference.
While the core inference step of many ML models is CPU-bound (and requires special handling in async code, which we'll cover in Chapter 5), the surrounding tasks like data validation, logging, fetching features, or saving results often involve I/O and benefit greatly from the async model. Understanding these fundamentals prepares you for writing efficient and responsive API endpoints. We will start building simple synchronous and asynchronous endpoints in the upcoming sections.
© 2025 ApX Machine Learning