Eigenvalues and eigenvectors reveal directions in space that remain unchanged (except for scaling) when a linear transformation is applied. This property is remarkably useful for uncovering the underlying structure within data, forming the foundation of Principal Component Analysis (PCA).
PCA is a cornerstone technique in machine learning and data analysis, primarily used for dimensionality reduction. The main objective is to identify a new set of coordinate axes, called principal components, for the data. These new axes are chosen such that the maximum possible variance in the data is captured along the first axis, the maximum remaining variance along the second axis (orthogonal to the first), and so on. This allows us to represent the data using fewer dimensions while minimizing information loss.
To find these directions of maximum variance, we first need to quantify how the different features in our dataset vary together. This is precisely what the covariance matrix (C) does. For a dataset with p features, the covariance matrix is a p×p symmetric matrix where the entry in row i and column j is the covariance between feature i and feature j, and each diagonal entry is the variance of the corresponding feature.
This matrix summarizes the spread and linear relationships within the data.
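As a quick illustration, here is a minimal sketch of computing a covariance matrix with NumPy. The dataset X is a made-up example with 200 samples and 3 features, used purely for demonstration:

```python
import numpy as np

# Hypothetical dataset: 200 observations of p = 3 features (rows are samples).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))

# np.cov treats rows as variables by default, so rowvar=False tells it
# that columns are the features; the result is the p x p covariance matrix.
C = np.cov(X, rowvar=False)

print(C.shape)              # (3, 3)
print(np.allclose(C, C.T))  # True: the covariance matrix is symmetric
```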
Here's the critical connection: The eigenvectors of the data's covariance matrix C point along the directions of maximum variance in the data. These directions are the principal components. Furthermore, the eigenvalue λ associated with each eigenvector quantifies the amount of variance along that specific direction.
Think back to the definition Ax=λx. In the context of PCA, our matrix A is the covariance matrix C. The eigenvectors x (often denoted as u or v in PCA literature) are the principal components. When we apply the transformation represented by C to one of its eigenvectors u, the result is simply the same vector scaled by its eigenvalue λ: Cu=λu.
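As a sketch of this relationship (the correlated 2D dataset below is an assumption for illustration), we can verify numerically that applying C to one of its eigenvectors only rescales it:

```python
import numpy as np

# Synthetic, correlated 2D data (purely illustrative).
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.0],
                                          [1.0, 0.5]])
C = np.cov(X, rowvar=False)

# eigh is suited to symmetric matrices like C; it returns eigenvalues in
# ascending order and orthonormal eigenvectors as columns.
eigenvalues, eigenvectors = np.linalg.eigh(C)

u = eigenvalues_top = eigenvectors[:, -1]   # eigenvector with the largest eigenvalue
lam = eigenvalues[-1]

# C u = lambda * u: the transformation only scales the eigenvector.
print(np.allclose(C @ u, lam * u))  # True
```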
Why is this important? Finding the unit vector u that maximizes the variance of the data projected onto it, uᵀCu, leads directly to the eigenvector equation. The direction u that maximizes this variance is the eigenvector corresponding to the largest eigenvalue of C, and the maximum variance achieved equals that largest eigenvalue, λmax.
So, the principal components of the data are simply the eigenvectors of its covariance matrix C, typically ordered from highest eigenvalue to lowest.
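The sketch below (again using synthetic 2D data as an assumption) checks that the variance of the data projected onto the top eigenvector equals the largest eigenvalue, and orders the eigenvectors by decreasing eigenvalue:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.0],
                                          [1.0, 0.5]])
X_centered = X - X.mean(axis=0)
C = np.cov(X_centered, rowvar=False)

eigenvalues, eigenvectors = np.linalg.eigh(C)

# Order eigenvectors by descending eigenvalue: these are the principal
# components, most important direction first.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
components = eigenvectors[:, order]

# Variance of the data projected onto the first principal component equals
# the largest eigenvalue (np.cov and var both use the n-1 normalization here).
scores = X_centered @ components[:, 0]
print(np.isclose(scores.var(ddof=1), eigenvalues[0]))  # True
```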
Imagine a scatter plot of 2D data points that form an elliptical cloud. The eigenvectors of the covariance matrix of this data will align with the axes of this ellipse.
Scatter plot showing 2D data points. PC1 (red line) indicates the direction of maximum variance, determined by the eigenvector with the largest eigenvalue. PC2 (orange dotted line), orthogonal to PC1, captures the next largest variance.
The eigenvalues tell us the amount of variance captured by each principal component (eigenvector). By ranking the eigenvectors according to their corresponding eigenvalues in descending order, we know which directions are most "important" for describing the data's spread.
To reduce data from p dimensions to a smaller number k (where k<p), PCA involves these steps (a sketch in NumPy follows the list):

1. Center the data by subtracting the mean of each feature.
2. Compute the p×p covariance matrix C of the centered data.
3. Compute the eigenvalues and eigenvectors of C.
4. Sort the eigenvectors by their eigenvalues in descending order and select the top k eigenvectors as the principal components.
5. Project the centered data onto these k eigenvectors to obtain the reduced, k-dimensional representation.
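A minimal NumPy sketch of these steps might look like the following; the helper name pca_reduce, the dataset X, and the choice k=2 are all assumptions for illustration, not a definitive implementation:

```python
import numpy as np

def pca_reduce(X, k):
    """Reduce X (n samples x p features) to k dimensions via eigendecomposition."""
    # 1. Center the data.
    X_centered = X - X.mean(axis=0)

    # 2. Covariance matrix of the features (p x p).
    C = np.cov(X_centered, rowvar=False)

    # 3. Eigenvalues and eigenvectors of the symmetric matrix C.
    eigenvalues, eigenvectors = np.linalg.eigh(C)

    # 4. Sort in descending order of eigenvalue and keep the top k eigenvectors.
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    components = eigenvectors[:, order][:, :k]

    # 5. Project the centered data onto the k principal components.
    return X_centered @ components, eigenvalues

# Example: 5-dimensional synthetic data reduced to k = 2.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 5))
X_reduced, eigenvalues = pca_reduce(X, k=2)
print(X_reduced.shape)  # (300, 2)
```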
The result is a k-dimensional dataset that retains the most significant variance present in the original p-dimensional data. The proportion of total variance retained by the first k components can be calculated by summing the top k eigenvalues and dividing by the sum of all eigenvalues.
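As a small sketch of that calculation (the eigenvalue array below is a made-up example, already sorted in descending order), the proportion of retained variance is just a ratio of sums:

```python
import numpy as np

# Hypothetical eigenvalues of a covariance matrix, sorted in descending order.
eigenvalues = np.array([4.2, 2.1, 0.9, 0.5, 0.3])

k = 2
explained = eigenvalues[:k].sum() / eigenvalues.sum()
print(f"Proportion of variance retained by the first {k} components: {explained:.2%}")
```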
Eigenvalues and eigenvectors are fundamental to Principal Component Analysis because:

- The eigenvectors of the covariance matrix C define the principal components, the orthogonal directions along which the data varies.
- The eigenvalues measure the amount of variance captured along each of those directions.
- Ranking the eigenvectors by their eigenvalues tells us which components to keep when reducing dimensionality.
This relationship allows PCA to systematically identify the most informative directions in the data, making it possible to reduce dimensionality effectively while preserving essential data structure. Understanding this connection is significant for anyone applying or interpreting PCA results in machine learning. You will learn how to perform these calculations using libraries like NumPy in the upcoming practical exercises.