While methods like Standardization and Normalization adjust the scale of features, and transformations like Log or Box-Cox attempt to make distributions more Gaussian-like, Quantile Transformation takes a different approach. It's a non-linear transformation that maps the probability distribution of a feature to another specific distribution (either uniform or normal), regardless of the original distribution's shape. This is achieved by leveraging the ranks, or quantiles, of the data points.
The core idea behind quantile transformation is to estimate the empirical cumulative distribution function (CDF) of a feature and then use this CDF to map the original values to the desired output distribution.
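To make that two-step mapping concrete, here is a minimal, rank-based sketch: estimate each point's empirical CDF value from its rank, then (for a normal target) pass it through the inverse normal CDF. This is only an illustration of the idea; scikit-learn's actual implementation interpolates between a fixed number of estimated quantiles rather than using raw ranks, and the variable names here are illustrative.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=1000)  # a skewed sample

# Step 1: empirical CDF value for each point via its rank,
# offset by 0.5 so results stay strictly inside (0, 1).
ranks = np.argsort(np.argsort(x))   # rank of each value, 0 .. n-1
u = (ranks + 0.5) / len(x)          # approximately Uniform(0, 1)

# Step 2: the inverse normal CDF (percent-point function) maps the
# uniform values to an approximately standard normal distribution.
z = norm.ppf(u)

print(round(z.mean(), 3), round(z.std(), 3))  # close to 0 and 1
```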
Because this method relies on the rank order of the data points rather than their absolute values, it is inherently robust to outliers. Outliers are simply mapped to the extreme ends of the target distribution (e.g., close to 0 or 1 for uniform output, or large negative/positive values for normal output), but they do not disproportionately affect the transformation of the remaining points, unlike `StandardScaler` or `MinMaxScaler`.
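A quick sketch of this robustness, using an illustrative dataset with one extreme outlier: `MinMaxScaler` squashes the inliers into a narrow sliver of the output range, while `QuantileTransformer` keeps them spread across nearly all of it.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, QuantileTransformer

rng = np.random.default_rng(42)
# 99 well-behaved values plus one extreme outlier
x = np.append(rng.normal(loc=10, scale=2, size=99), 1000.0).reshape(-1, 1)

x_mm = MinMaxScaler().fit_transform(x)
x_qt = QuantileTransformer(output_distribution='uniform',
                           n_quantiles=100).fit_transform(x)

# Spread of the 99 inliers after each transformation:
print(np.ptp(x_mm[:-1]))  # tiny (~0.01): the outlier dominates the scale
print(np.ptp(x_qt[:-1]))  # ~0.99: inliers still span almost all of [0, 1]
```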
Scikit-learn provides the `sklearn.preprocessing.QuantileTransformer` class for this purpose. Let's see how to use it.
```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import QuantileTransformer
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Generate some skewed data
np.random.seed(42)
data_original = np.random.exponential(scale=2, size=1000).reshape(-1, 1) + 1  # add 1 to avoid zeros if using log later

# Initialize transformers
qt_uniform = QuantileTransformer(output_distribution='uniform', n_quantiles=1000, random_state=42)
qt_normal = QuantileTransformer(output_distribution='normal', n_quantiles=1000, random_state=42)

# Apply transformations
data_uniform = qt_uniform.fit_transform(data_original)
data_normal = qt_normal.fit_transform(data_original)

# Create DataFrame for easier plotting
df = pd.DataFrame({
    'Original': data_original.flatten(),
    'Uniform Quantile': data_uniform.flatten(),
    'Normal Quantile': data_normal.flatten()
})

# --- Visualization ---
fig = make_subplots(rows=1, cols=3,
                    subplot_titles=('Original Exponential Data',
                                    'Uniform Quantile Transformed',
                                    'Normal Quantile Transformed'))
fig.add_trace(go.Histogram(x=df['Original'], name='Original', marker_color='#4dabf7'), row=1, col=1)
fig.add_trace(go.Histogram(x=df['Uniform Quantile'], name='Uniform', marker_color='#38d9a9'), row=1, col=2)
fig.add_trace(go.Histogram(x=df['Normal Quantile'], name='Normal', marker_color='#be4bdb'), row=1, col=3)
fig.update_layout(
    title_text='Effect of Quantile Transformation on Skewed Data',
    bargap=0.1,
    showlegend=False,
    height=350,
    margin=dict(l=20, r=20, t=60, b=20)
)

fig.show()  # or print(fig.to_json()) to export the chart JSON
```
Histograms of the three distributions (left to right): the original exponential data; the uniform quantile transformed values, spread evenly across the output range; and the normal quantile transformed values, which now resemble a Gaussian shape.
Quantile transformation offers several advantages and important considerations:

- Fit on training data only: as with other scalers, apply `fit` only on the training data, then use `transform` on both the training and test sets. Calling `fit_transform` on the entire dataset before splitting causes data leakage, where information from the test set implicitly influences the training process and yields overly optimistic performance estimates (see the sketch after this list).

Consider using quantile transformation when:

- A feature's distribution is heavily skewed and you want a uniform or Gaussian-shaped input.
- The data contains outliers that would distort scalers based on the mean and variance or the minimum and maximum.
- A downstream model is sensitive to the shape of the input feature distributions.
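A minimal sketch of the leakage-safe workflow referenced above (the variable names and split parameters here are illustrative):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import QuantileTransformer

rng = np.random.default_rng(42)
X = rng.exponential(scale=2.0, size=(1000, 1))

X_train, X_test = train_test_split(X, test_size=0.2, random_state=42)

qt = QuantileTransformer(output_distribution='normal', n_quantiles=800)
X_train_t = qt.fit_transform(X_train)  # quantiles estimated from training data only
X_test_t = qt.transform(X_test)        # test data mapped with those same quantiles
```

Wrapping the transformer in a `sklearn.pipeline.Pipeline` enforces the same discipline automatically during cross-validation.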
While a powerful tool, quantile transformation can make models harder to interpret, since the transformed values no longer have a direct, linear relationship to the original scale. For many predictive modeling tasks, however, the improved model performance outweighs this drawback.
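One mitigation worth knowing: `QuantileTransformer` implements `inverse_transform`, so transformed values can be mapped back to approximately the original scale when you need to report them in their natural units. A brief sketch:

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer

rng = np.random.default_rng(42)
x = rng.exponential(scale=2.0, size=(1000, 1))

qt = QuantileTransformer(output_distribution='normal', n_quantiles=1000)
z = qt.fit_transform(x)

# Map the Gaussian-shaped values back toward the original scale.
x_back = qt.inverse_transform(z)
print(np.abs(x_back - x).max())  # small round-trip error from interpolation
```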