Linear algebra is a cornerstone of many machine learning algorithms. From representing datasets and model parameters to performing transformations and solving optimization problems, operations on vectors and matrices are fundamental. NumPy, through its linalg
module, provides a comprehensive and highly optimized suite of functions for performing these essential linear algebra tasks. Building upon your knowledge of NumPy arrays, we'll now see how to use them for these computations.
Recall that a one-dimensional NumPy array can represent a vector, and a two-dimensional array represents a matrix.
import numpy as np
# Vector (1D array)
v = np.array([1, 2, 3])
print("Vector v:\n", v)
# Matrix (2D array)
M = np.array([[1, 2], [3, 4], [5, 6]])
print("\nMatrix M:\n", M)
print("\nShape of M:", M.shape) # (3, 2) -> 3 rows, 2 columns
One of the most frequent operations in machine learning is matrix multiplication. It's used in everything from applying weights in neural networks to transforming feature spaces. It's important to distinguish between element-wise multiplication and true matrix multiplication (dot product).
Element-wise multiplication: the * operator. Requires arrays to have compatible shapes according to broadcasting rules. Each element in the first array is multiplied by the corresponding element in the second array.
Matrix multiplication (dot product): the @ operator (preferred in Python 3.5+) or the np.dot() function. For two matrices A and B, the product AB is defined only if the number of columns in A equals the number of rows in B. If A is an m×n matrix and B is an n×p matrix, their product C = AB will be an m×p matrix.
A = np.array([[1, 2], [3, 4]]) # 2x2 matrix
B = np.array([[5, 6], [7, 8]]) # 2x2 matrix
v = np.array([9, 10]) # 1D vector (NumPy treats it as a row or column vector as the operation requires)
# Element-wise multiplication
print("Element-wise A * B:\n", A * B)
# Matrix multiplication
print("\nMatrix multiplication A @ B:\n", A @ B)
print("\nMatrix multiplication using np.dot(A, B):\n", np.dot(A, B))
# Matrix-vector multiplication
# NumPy automatically handles v as a column vector in this case
print("\nMatrix-vector multiplication A @ v:\n", A @ v) # Result is a 1D array
The rule for compatible shapes in matrix multiplication (m×n times n×p results in m×p) is significant.
Diagram illustrating matrix multiplication dimension compatibility.
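To make the rule concrete, here is a small illustrative check (the arrays below are made up for demonstration): multiplying a (3, 2) array by a (2, 4) array works because the inner dimensions match, while multiplying two (3, 2) arrays raises an error.
# Illustrative check of the (m, n) @ (n, p) -> (m, p) rule
X = np.ones((3, 2))   # 3x2
Y = np.ones((2, 4))   # 2x4
print((X @ Y).shape)  # (3, 4): inner dimensions (2 and 2) match
try:
    X @ np.ones((3, 2))  # inner dimensions (2 and 3) do not match
except ValueError as e:
    print("Incompatible shapes:", e)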
The transpose of a matrix swaps its rows and columns. If A is an m×n matrix, its transpose, denoted $A^T$, is an n×m matrix where $(A^T)_{ij} = A_{ji}$. In NumPy, you can get the transpose using the .T
attribute or the np.transpose()
function.
M = np.array([[1, 2, 3], [4, 5, 6]]) # 2x3 matrix
print("Original Matrix M:\n", M)
print("\nShape of M:", M.shape)
# Transpose using .T attribute
M_transpose = M.T
print("\nTranspose M.T:\n", M_transpose)
print("\nShape of M.T:", M_transpose.shape) # 3x2 matrix
# Transpose using np.transpose() function
M_transpose_func = np.transpose(M)
print("\nTranspose np.transpose(M):\n", M_transpose_func)
print("\nShape of np.transpose(M):", M_transpose_func.shape) # 3x2 matrix
Transposition is often used when manipulating equations or aligning vectors and matrices for multiplication according to shape rules.
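As a quick illustration (the matrix X below is hypothetical, standing in for a small data matrix with 3 samples and 2 features), transposing one factor makes otherwise incompatible shapes line up for multiplication:
# X has shape (3, 2): 3 samples, 2 features
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
# X @ X would fail: (3, 2) @ (3, 2) has mismatched inner dimensions (2 vs 3)
# Transposing the first factor aligns them: (2, 3) @ (3, 2) -> (2, 2)
print("X.T @ X:\n", X.T @ X)
print("Shape:", (X.T @ X).shape)  # (2, 2)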
The inverse of a square matrix A, denoted $A^{-1}$, is a matrix such that when multiplied by A, it results in the identity matrix I ($AA^{-1} = A^{-1}A = I$). A matrix must be square (have the same number of rows and columns) and non-singular (its determinant is non-zero) to have an inverse. The inverse is crucial for solving systems of linear equations.
The determinant is a scalar value that can be computed from the elements of a square matrix and provides important information about it, such as whether it's invertible.
NumPy's np.linalg
module provides functions for these:
np.linalg.inv(A): Computes the inverse of matrix A.
np.linalg.det(A): Computes the determinant of matrix A.
# Create an invertible square matrix
A = np.array([[1, 2], [3, 4]])
print("Matrix A:\n", A)
# Calculate the determinant
det_A = np.linalg.det(A)
print("\nDeterminant of A:", det_A) # Should be 1*4 - 2*3 = -2
# Calculate the inverse
inv_A = np.linalg.inv(A)
print("\nInverse of A:\n", inv_A)
# Verify A @ A_inv is close to the identity matrix
identity = np.eye(2) # 2x2 Identity matrix
print("\nA @ inv_A (should be close to identity):\n", A @ inv_A)
# Note: Due to floating-point precision, results might be very close but not exactly identity.
print("\nIs A @ inv_A close to identity?", np.allclose(A @ inv_A, identity))
If you try to compute the inverse of a singular matrix (determinant is 0), NumPy will raise a LinAlgError.
# Singular matrix (column 2 is 2 * column 1)
singular_M = np.array([[1, 2], [2, 4]])
print("\nSingular Matrix:\n", singular_M)
print("Determinant:", np.linalg.det(singular_M)) # Should be 0 or very close due to float precision
try:
    inv_singular = np.linalg.inv(singular_M)
    print("Inverse (should not print):\n", inv_singular)
except np.linalg.LinAlgError as e:
    print("\nError calculating inverse:", e)
In practice, especially in machine learning contexts involving potentially non-square or singular matrices (as in linear regression with redundant features), the pseudo-inverse (np.linalg.pinv) is often used as a generalization of the inverse.
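As a brief sketch (reusing the singular matrix from above), np.linalg.pinv returns a result even where np.linalg.inv fails, and it agrees with the true inverse whenever one exists:
# Pseudo-inverse of the singular matrix from above (inv() would raise an error here)
pinv_singular = np.linalg.pinv(singular_M)
print("Pseudo-inverse of singular_M:\n", pinv_singular)
# For an invertible matrix, pinv() and inv() agree up to floating-point error
A = np.array([[1, 2], [3, 4]])
print("pinv(A) close to inv(A)?", np.allclose(np.linalg.pinv(A), np.linalg.inv(A)))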
A common problem in various scientific fields, including machine learning, is solving a system of linear equations. Such a system can be represented in matrix form as:
$Ax = b$
Where A is a known square matrix of coefficients, x is the column vector of unknowns we want to find, and b is a known column vector.
If A is invertible, one way to find x is by multiplying both sides by the inverse of A:
$A^{-1}Ax = A^{-1}b \;\Rightarrow\; Ix = A^{-1}b \;\Rightarrow\; x = A^{-1}b$
While you can compute this using np.linalg.inv(A) @ b
, it's generally not recommended. Calculating the inverse is computationally more expensive and can be less numerically stable than using a dedicated solver. NumPy provides np.linalg.solve(A, b)
, which uses more efficient and stable algorithms (often based on LU decomposition) to directly find x.
Consider the system: $x_1 + 2x_2 = 1$ and $3x_1 + 4x_2 = -1$
In matrix form:
$$\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 1 \\ -1 \end{pmatrix}$$
Let's solve this using NumPy:
# Coefficient matrix A
A = np.array([[1, 2], [3, 4]])
# Constant vector b
b = np.array([1, -1])
print("Matrix A:\n", A)
print("\nVector b:", b)
# Solve using np.linalg.solve
x = np.linalg.solve(A, b)
print("\nSolution vector x using solve():", x) # Expected: [-3, 2]
# Verify the solution: A @ x should be close to b
print("\nVerification A @ x:", A @ x)
print("Is A @ x close to b?", np.allclose(A @ x, b))
# For comparison: solving using explicit inverse (less preferred)
inv_A = np.linalg.inv(A)
x_inv = inv_A @ b
print("\nSolution vector x using inverse:", x_inv)
For well-conditioned, square matrices, both methods give the same result, but np.linalg.solve
is the standard and preferred approach.
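As a rough, machine-dependent sketch of the cost difference, the snippet below times both approaches on a larger random system (the size of 500 is arbitrary, and exact timings will vary with your hardware and BLAS library):
# Timing sketch: solve() vs. explicit inverse on a random 500x500 system
import time

rng = np.random.default_rng(0)
n = 500
A_big = rng.standard_normal((n, n))
b_big = rng.standard_normal(n)

start = time.perf_counter()
x_solve = np.linalg.solve(A_big, b_big)
print("solve() time:", time.perf_counter() - start)

start = time.perf_counter()
x_inv = np.linalg.inv(A_big) @ b_big
print("inv() + multiply time:", time.perf_counter() - start)

print("Solutions close?", np.allclose(x_solve, x_inv))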
The np.linalg
module contains many other functions. Here are a couple with particular relevance to machine learning:
Eigenvalues and eigenvectors (np.linalg.eig(A)): For a square matrix A, an eigenvector v and corresponding eigenvalue λ satisfy the equation $Av = \lambda v$. Eigenvalues and eigenvectors are fundamental to understanding matrix transformations and are used extensively in algorithms like Principal Component Analysis (PCA) for dimensionality reduction, where eigenvectors indicate directions of maximum variance and eigenvalues indicate the magnitude of variance in those directions.
Norms (np.linalg.norm(x, ord=...)): A norm is a measure of the "size" or "length" of a vector or matrix. Different types of norms exist (specified by the ord parameter). Common vector norms include:
L2 norm (Euclidean): ord=2 or ord=None (the default). Calculates $\sqrt{\sum_i x_i^2}$. Used frequently for measuring distance or error.
L1 norm (Manhattan): ord=1. Calculates $\sum_i |x_i|$. Used in regularization (Lasso) to encourage sparsity.
Matrix norms like the Frobenius norm (ord='fro') are also available. Norms are central to regularization techniques in models like Ridge and Lasso regression, evaluating model errors, and distance calculations in algorithms like k-Nearest Neighbors.
# Example: Calculating Norms
v = np.array([3, -4])
print("\nVector v:", v)
# L2 Norm (default)
norm_l2 = np.linalg.norm(v)
print("L2 Norm (Euclidean):", norm_l2) # sqrt(3^2 + (-4)^2) = sqrt(9+16) = sqrt(25) = 5.0
# L1 Norm
norm_l1 = np.linalg.norm(v, ord=1)
print("L1 Norm (Manhattan):", norm_l1) # |3| + |-4| = 3 + 4 = 7.0
# Example: Eigenvalues and Eigenvectors
A = np.array([[4, 2], [1, 3]])
eigenvalues, eigenvectors = np.linalg.eig(A)
print("\nMatrix A:\n", A)
print("\nEigenvalues:", eigenvalues)
print("Eigenvectors:\n", eigenvectors)
# Verify for the first eigenvalue/vector pair: A @ v = lambda * v
lambda1 = eigenvalues[0]
v1 = eigenvectors[:, 0] # First column is the first eigenvector
print("\nA @ v1:", A @ v1)
print("lambda1 * v1:", lambda1 * v1)
print("Are they close?", np.allclose(A @ v1, lambda1 * v1))
Mastering these NumPy linear algebra operations is essential for implementing and understanding many machine learning algorithms. They provide the computational building blocks for manipulating data representations and model parameters efficiently. Remember that NumPy's implementations are highly optimized, providing significant speed advantages over manual Python loops for these calculations.