Having explored the reasons for dimensionality reduction and surveyed various methods, it's time to get practical. Principal Component Analysis (PCA) is a widely used linear technique for dimensionality reduction, and understanding its application will provide a solid foundation before we address more advanced autoencoder models. This hands-on exercise will walk you through implementing PCA on a common dataset using Python and Scikit-learn.
Our goal is to reduce the number of features in a dataset while trying to preserve as much of the original information (variance) as possible. We'll also visualize the results to get an intuitive sense of what PCA achieves.
We'll use the well-known Iris dataset, which is conveniently available in Scikit-learn. This dataset consists of 150 samples of Iris flowers, each with four features: sepal length, sepal width, petal length, and petal width. There are three species of Iris: Setosa, Versicolor, and Virginica.
First, let's load the dataset:
from sklearn.datasets import load_iris
import numpy as np
iris = load_iris()
X = iris.data # Features
y = iris.target # Target labels (species)
print(f"Original data shape: {X.shape}")
# Expected output: Original data shape: (150, 4)
The output shows 150 samples and 4 features, as expected.
PCA is sensitive to the scale of the features. If one feature has a much larger range of values than others, it will dominate the PCA calculation. To prevent this, we should standardize the features to have zero mean and unit variance. Scikit-learn's StandardScaler is perfect for this.
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# You can verify the mean and standard deviation of scaled data
# print(f"Mean of scaled data (approx): {np.mean(X_scaled, axis=0)}")
# print(f"Std dev of scaled data (approx): {np.std(X_scaled, axis=0)}")
With our data scaled, we are ready to apply PCA.
We'll use Scikit-learn's PCA class. Its key parameter is n_components, which specifies the number of principal components (the new dimensions) we want to keep. Let's start by reducing the 4 features of the Iris dataset to 2 principal components, which will allow us to visualize the data in a 2D plot.
from sklearn.decomposition import PCA
# Initialize PCA with 2 components
pca = PCA(n_components=2)
# Fit PCA on the scaled data and transform it
X_pca = pca.fit_transform(X_scaled)
print(f"Shape after PCA: {X_pca.shape}")
# Expected output: Shape after PCA: (150, 2)
As you can see, the data now has 150 samples and 2 features (our principal components). These two components are new features constructed by PCA, representing linear combinations of the original four features.
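If you're curious which original features drive each component, you can inspect the fitted model's components_ attribute, which holds the weights (loadings) of those linear combinations. The snippet below is a small illustrative sketch; the loop and the printed format are not part of the original walkthrough.
# Each row of pca.components_ holds the weights that combine the four
# standardized features into one principal component
for i, component in enumerate(pca.components_):
    weights = ", ".join(
        f"{name}: {w:.2f}" for name, w in zip(iris.feature_names, component)
    )
    print(f"PC{i + 1} loadings -> {weights}")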
Now that we've reduced the dimensionality to two, we can create a scatter plot to see how the different Iris species are distributed in this new 2D space. We'll use the original target labels y to color-code the points, as sketched below.
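A minimal Matplotlib sketch for this kind of plot (the figure shown in this section may have been produced with different styling) could look like:
import matplotlib.pyplot as plt

plt.figure(figsize=(8, 6))
# Plot each species separately so it gets its own color and legend entry
for class_idx, class_name in enumerate(iris.target_names):
    mask = y == class_idx
    plt.scatter(X_pca[mask, 0], X_pca[mask, 1], label=class_name)
plt.xlabel("First principal component")
plt.ylabel("Second principal component")
plt.title("Iris samples projected onto the first two principal components")
plt.legend()
plt.show()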
The Iris species are plotted in the 2D space defined by the first two principal components. Notice how well the classes are separated, even after reducing the dimensionality.
This visualization is quite informative. It shows that the first two principal components capture enough information to distinguish between the three Iris species quite well. The Setosa species (blue) is clearly separated from Versicolor (green) and Virginica (orange).
How much information did we retain by reducing to 2 components? PCA helps us quantify this with the explained_variance_ratio_ attribute, which returns an array where each value is the proportion of variance explained by the corresponding component.
print(f"Explained variance ratio per component: {pca.explained_variance_ratio_}")
# Expected output (approx): Explained variance ratio per component: [0.72962445 0.22850762]
print(f"Total explained variance by 2 components: {np.sum(pca.explained_variance_ratio_):.4f}")
# Expected output (approx): Total explained variance by 2 components: 0.9581
The first principal component explains about 73% of the variance, and the second explains about 23%. Together, these two components capture approximately 95.81% of the total variance in the original 4-dimensional data. This is quite good; we've reduced the dimensionality by half while retaining most of the data's variability.
Choosing the right number of components often involves a trade-off between dimensionality reduction and information loss. A common way to decide is to plot the cumulative explained variance against the number of components.
# Fit PCA with all components to see the full spectrum
pca_full = PCA().fit(X_scaled)
cumulative_explained_variance = np.cumsum(pca_full.explained_variance_ratio_)
Now, let's visualize this cumulative explained variance.
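One way to produce such a plot is the short Matplotlib sketch below (the figure in this section may have been generated differently):
import matplotlib.pyplot as plt

plt.figure(figsize=(8, 6))
# x-axis: number of components kept (1 to 4); y-axis: cumulative variance retained
n_components_range = range(1, len(cumulative_explained_variance) + 1)
plt.plot(n_components_range, cumulative_explained_variance, marker="o")
plt.xticks(n_components_range)
plt.xlabel("Number of principal components")
plt.ylabel("Cumulative explained variance")
plt.title("Cumulative explained variance for the Iris dataset")
plt.show()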
This chart shows the cumulative variance explained by an increasing number of principal components. You can see that with 2 components, over 95% of the variance is captured.
This plot helps you decide how many components to keep. For example, if you aim to retain 95% of the variance, two components are sufficient for the Iris dataset. If you needed 99%, you would choose three components.
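As a side note, Scikit-learn can make this choice for you: passing a float between 0 and 1 as n_components keeps the smallest number of components that reaches that fraction of explained variance. A quick sketch (the variable names here are illustrative):
# Keep just enough components to explain at least 95% of the variance
pca_95 = PCA(n_components=0.95)
X_pca_95 = pca_95.fit_transform(X_scaled)
print(f"Components needed for 95% variance: {pca_95.n_components_}")
# Expected output for the Iris dataset: Components needed for 95% variance: 2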
PCA also allows you to transform the reduced data back to the original high-dimensional space using the inverse_transform method. The reconstructed data won't be identical to the original (unless you use all components), as some information is lost during dimensionality reduction. This idea of reconstruction is central to autoencoders, as we will see later.
X_reconstructed = pca.inverse_transform(X_pca)
print(f"Shape of reconstructed data: {X_reconstructed.shape}")
# Expected output: Shape of reconstructed data: (150, 4)
# Note: X_reconstructed is an approximation of X_scaled
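To quantify the information lost, one simple check (a small sketch, not part of the original walkthrough) is the mean squared error between the scaled data and its reconstruction. For standardized data this should roughly equal the unexplained variance fraction, i.e. about 1 minus the total explained variance ratio.
# Mean squared reconstruction error over all samples and features;
# 0 would mean a perfect (lossless) reconstruction
reconstruction_error = np.mean((X_scaled - X_reconstructed) ** 2)
print(f"Mean squared reconstruction error: {reconstruction_error:.4f}")
# Expected output (approx): Mean squared reconstruction error: 0.0419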
This hands-on exercise demonstrated how PCA can effectively reduce dimensionality while preserving significant variance in the data. You learned to load and standardize a dataset, fit PCA and project the data onto a smaller number of components, visualize the projection, quantify the retained information with the explained variance ratio, and reconstruct an approximation of the original data.
PCA performs a linear transformation. While powerful, it might not capture complex, non-linear relationships in data as effectively. This is where autoencoders, the main topic of this course, come into play. Autoencoders are neural networks that can learn non-linear dimensionality reductions. The principles of encoding data into a lower-dimensional space (like PCA's principal components) and then decoding it back are fundamental to autoencoders as well. Having practiced with PCA, you are now better prepared to understand the mechanisms and benefits of autoencoders for feature extraction.