Decorators provide a powerful and Pythonic way to modify or enhance functions and methods. They allow you to wrap additional functionality around existing code without permanently altering the original function's definition. This promotes code reusability and separation of concerns, which are valuable practices when building data processing pipelines or machine learning workflows.
At its core, a decorator is a callable (usually a function) that takes another function as input and returns a new function. The `@decorator_name` syntax placed directly above a function definition is syntactic sugar that simplifies this process.
Consider this basic structure:
```python
import functools

def my_decorator(func):
    @functools.wraps(func)  # Preserves original function metadata
    def wrapper(*args, **kwargs):
        # Code to execute BEFORE the original function runs
        print(f"Something is happening before {func.__name__} is called.")
        result = func(*args, **kwargs)  # Call the original function
        # Code to execute AFTER the original function runs
        print(f"Something is happening after {func.__name__} has finished.")
        return result
    return wrapper

@my_decorator
def say_hello(name):
    """Greets the user."""
    print(f"Hello, {name}!")

# Calling the decorated function
say_hello("Data Scientist")

# Output:
# Something is happening before say_hello is called.
# Hello, Data Scientist!
# Something is happening after say_hello has finished.
```
Here, `my_decorator` is the decorator function. It defines an inner function, `wrapper`, which contains the additional logic. The `wrapper` function calls the original function (`func`) passed to the decorator. The `@my_decorator` syntax above `say_hello` is equivalent to writing `say_hello = my_decorator(say_hello)` after the function definition.
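To make the equivalence concrete, here is the same decoration performed manually, without the `@` syntax:

```python
def say_hello(name):
    """Greets the user."""
    print(f"Hello, {name}!")

# Manual decoration: exactly what @my_decorator does behind the scenes
say_hello = my_decorator(say_hello)

say_hello("Data Scientist")  # Prints the same three lines as before
```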
Notice the use of `@functools.wraps(func)` inside the decorator. This is a helper decorator that updates the `wrapper` function to look like the original function (`func`) by copying attributes such as `__name__` and `__doc__` (the original parameter signature also remains discoverable through introspection). Without `@functools.wraps`, introspection tools (and potentially other code) would see information about the `wrapper` function instead of the `say_hello` function.
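A quick check confirms this. `functools.wraps` also stores a reference to the original function in a `__wrapped__` attribute:

```python
print(say_hello.__name__)     # 'say_hello' (would be 'wrapper' without functools.wraps)
print(say_hello.__doc__)      # 'Greets the user.' (would be wrapper's docstring, or None)
print(say_hello.__wrapped__)  # The original, undecorated function object
```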
You can think of the decorator as applying a layer around the original function: the decorator (`my_decorator`) defines a `wrapper`; when the decorated function (`say_hello`) is called, the `wrapper` executes, running code before and after calling the original function (`func`).
Decorators are particularly useful for adding cross-cutting functionality relevant to data analysis and machine learning tasks:
**Timing Function Execution:** Measuring how long specific data processing steps or calculations take is important for optimization.
```python
import time
import functools
import pandas as pd
import numpy as np

def timer(func):
    @functools.wraps(func)
    def wrapper_timer(*args, **kwargs):
        start_time = time.perf_counter()  # More precise than time.time()
        value = func(*args, **kwargs)
        end_time = time.perf_counter()
        run_time = end_time - start_time
        print(f"Finished {func.__name__!r} in {run_time:.4f} secs")
        return value
    return wrapper_timer

@timer
def simulate_data_processing(rows=1000000):
    """Simulates a potentially time-consuming data operation."""
    df = pd.DataFrame(np.random.rand(rows, 5), columns=list('ABCDE'))
    # Simulate some calculation
    result = df['A'] * np.sin(df['B']) - df['C'] * np.cos(df['D'])
    time.sleep(0.5)  # Simulate I/O or other delay
    return result.mean()

mean_value = simulate_data_processing(rows=500000)
print(f"Mean result: {mean_value}")

# Example Output:
# Finished 'simulate_data_processing' in 0.6123 secs
# Mean result: -0.19...
```
**Logging:** Tracking function calls, arguments, or results can be invaluable for debugging complex pipelines.
```python
import logging
import functools
import pandas as pd

logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

def logger(func):
    @functools.wraps(func)
    def wrapper_logger(*args, **kwargs):
        logging.info(f"Calling {func.__name__} with args: {args}, kwargs: {kwargs}")
        try:
            result = func(*args, **kwargs)
            logging.info(f"{func.__name__} returned: {type(result)}")
            return result
        except Exception as e:
            logging.error(f"Exception in {func.__name__}: {e}", exc_info=True)
            raise  # Re-raise the exception after logging
    return wrapper_logger

@logger
def load_data(filepath):
    """Loads data, potentially raising an error."""
    if not filepath.endswith(".csv"):
        raise ValueError("Invalid file type, expected .csv")
    # Simulate loading data
    print(f"Loading data from {filepath}...")
    return pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})  # Dummy DataFrame

try:
    df = load_data("my_data.csv")
    # df_error = load_data("my_data.txt")  # Uncomment to see error logging
except ValueError as e:
    print(f"Caught expected error: {e}")

# Example Log Output:
# 2023-10-27 10:30:00,123 - INFO - Calling load_data with args: ('my_data.csv',), kwargs: {}
# Loading data from my_data.csv...
# 2023-10-27 10:30:00,124 - INFO - load_data returned: <class 'pandas.core.frame.DataFrame'>
# (If error case uncommented)
# 2023-10-27 10:30:00,125 - INFO - Calling load_data with args: ('my_data.txt',), kwargs: {}
# 2023-10-27 10:30:00,126 - ERROR - Exception in load_data: Invalid file type, expected .csv
# Traceback (most recent call last): ...
# Caught expected error: Invalid file type, expected .csv
```
**Input Validation:** Ensuring functions receive data in the expected format (e.g., a DataFrame with specific columns) before proceeding.
```python
import functools
import pandas as pd

def requires_columns(required_cols):
    def decorator(func):
        @functools.wraps(func)
        def wrapper_validator(*args, **kwargs):
            # Assume the DataFrame is the first positional argument
            if args and isinstance(args[0], pd.DataFrame):
                df = args[0]
                missing_cols = set(required_cols) - set(df.columns)
                if missing_cols:
                    raise ValueError(
                        f"Missing required columns in DataFrame for {func.__name__}: {missing_cols}"
                    )
            else:
                # Could add more sophisticated checks for kwargs or other positions
                pass  # Or raise an error if a DataFrame is not found where expected
            return func(*args, **kwargs)
        return wrapper_validator
    return decorator

@requires_columns(['feature1', 'target'])
def process_features(df):
    """Processes specific features in a DataFrame."""
    print("Processing features...")
    # Actual processing logic here
    return df['feature1'] * 2

data_ok = pd.DataFrame({'feature1': [1, 2, 3], 'target': [0, 1, 0], 'extra': [5, 6, 7]})
data_bad = pd.DataFrame({'feature_typo': [1, 2, 3], 'target': [0, 1, 0]})

result = process_features(data_ok)  # Runs fine
try:
    process_features(data_bad)  # Raises ValueError
except ValueError as e:
    print(f"Validation failed: {e}")

# Output:
# Processing features...
# Validation failed: Missing required columns in DataFrame for process_features: {'feature1'}
```
This example also demonstrates a decorator with arguments. `requires_columns` is a factory function that takes the list of required columns and returns the actual decorator, which allows you to customize the decorator's behavior.
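The `@requires_columns([...])` line therefore expands to two calls, which you could also write out manually:

```python
# Step 1: the factory call returns a decorator configured with the column list
decorator = requires_columns(['feature1', 'target'])

# Step 2: that decorator wraps the function, just like a plain decorator would
process_features = decorator(process_features)
```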
**Memoization (Caching):** Storing the results of computationally expensive function calls and returning the cached result when the same inputs occur again. Python's `functools` module provides `lru_cache` (Least Recently Used cache) for this.
```python
import functools
import time

@functools.lru_cache(maxsize=None)  # None means unlimited cache size
def expensive_calculation(a, b):
    """Simulates an expensive computation."""
    print(f"Performing expensive calculation for ({a}, {b})...")
    time.sleep(1)  # Simulate work
    return a + b * b

print(expensive_calculation(2, 3))  # Runs calculation
print(expensive_calculation(5, 2))  # Runs calculation
print(expensive_calculation(2, 3))  # Returns cached result instantly
print(expensive_calculation(5, 2))  # Returns cached result instantly

# Output:
# Performing expensive calculation for (2, 3)...
# 11
# Performing expensive calculation for (5, 2)...
# 9
# 11
# 9
```
While NumPy and Pandas operations are often highly optimized internally, `lru_cache` can be beneficial for custom Python functions in your workflow that perform heavy computations on the same inputs repeatedly. Note that all arguments must be hashable, so it cannot cache calls that take DataFrames or other mutable objects directly.
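To see how well the cache is working, `lru_cache` attaches a `cache_info()` method to the decorated function:

```python
# After the four calls above: two cache misses, then two cache hits
print(expensive_calculation.cache_info())
# CacheInfo(hits=2, misses=2, maxsize=None, currsize=2)
```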
You can apply multiple decorators to a single function. They are applied from bottom to top when the function is defined, but at call time the wrappers execute from top to bottom: the outermost (topmost) wrapper runs first.
```python
@timer
@logger
# @requires_columns(['input'])  # Example: add validation as a third layer
def complex_step(data):
    # ... processing logic ...
    print("Executing complex step...")
    time.sleep(0.2)
    return "Done"

complex_step("Some Input Data")

# The logger messages appear first; the timer's "Finished ..." line prints
# last, because the timer wraps the logger.
# Execution order: timer wrapper -> logger wrapper -> original complex_step
```
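Stacking decorators is equivalent to nesting the calls manually, which makes the ordering explicit:

```python
def complex_step(data):
    print("Executing complex step...")
    time.sleep(0.2)
    return "Done"

# Equivalent to the stacked @timer / @logger syntax above:
# logger wraps the original function first, then timer wraps the result.
complex_step = timer(logger(complex_step))
```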
Decorators are a flexible tool for adding behavior like logging, timing, validation, or caching to your functions without cluttering the core logic. Mastering them allows you to write more modular, reusable, and maintainable Python code, which is highly advantageous in data science and machine learning projects where workflows can become complex.