While matrix addition, subtraction, and scalar multiplication operate element by element, matrix multiplication works quite differently. It's a fundamental operation that combines two matrices based on a rule involving rows and columns, not just corresponding entries. This operation is central to many concepts in machine learning, such as transforming data points, chaining computational steps in neural networks, and expressing systems of linear equations.
Before you can multiply two matrices, say A and B, they must satisfy a specific condition regarding their dimensions. If matrix A has dimensions m×n (meaning m rows and n columns) and matrix B has dimensions n×p (n rows and p columns), then the matrix product AB is defined.
The critical part is that the number of columns in the first matrix (A) must equal the number of rows in the second matrix (B). In this case, both are n.
The resulting matrix, let's call it C=AB, will have dimensions m×p. It will have the same number of rows as the first matrix (A) and the same number of columns as the second matrix (B).
$$\underset{m \times n}{A} \times \underset{n \times p}{B} = \underset{m \times p}{C}$$
If the inner dimensions don't match (the number of columns of A differs from the number of rows of B), the matrices cannot be multiplied in that order.
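You can verify this compatibility rule in NumPy by comparing shapes before multiplying; attempting an incompatible product raises a ValueError. A minimal sketch (the helper can_multiply is our own name for illustration):

import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])  # shape (2, 3)

def can_multiply(X, Y):
    # The product X @ Y is defined only if columns of X == rows of Y
    return X.shape[1] == Y.shape[0]

print(can_multiply(A, A.T))  # True: (2, 3) @ (3, 2) works
print(can_multiply(A, A))    # False: inner dimensions 3 and 2 differ

try:
    A @ A  # undefined: (2, 3) @ (2, 3)
except ValueError as err:
    print(f"Cannot multiply: {err}")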
How do we find the values inside the resulting matrix C? Each element $C_{ij}$ (the element in the i-th row and j-th column of C) is calculated by taking the dot product of the i-th row of matrix A and the j-th column of matrix B.
Remember the dot product of two vectors $\mathbf{u} = [u_1, u_2, \dots, u_n]$ and $\mathbf{v} = [v_1, v_2, \dots, v_n]$ is $\mathbf{u} \cdot \mathbf{v} = u_1 v_1 + u_2 v_2 + \dots + u_n v_n = \sum_{k=1}^{n} u_k v_k$.
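For example, the dot product of [1, 2, 3] and [4, 5, 6] is (1)(4) + (2)(5) + (3)(6) = 32. A quick check in NumPy (a minimal sketch):

import numpy as np

u = np.array([1, 2, 3])
v = np.array([4, 5, 6])

# Multiply corresponding elements, then sum the products
print(np.dot(u, v))   # 32
print((u * v).sum())  # 32, the same computation spelled out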
For matrix multiplication $C = AB$, where $A$ is $m \times n$ and $B$ is $n \times p$:

$$C_{ij} = (\text{Row } i \text{ of } A) \cdot (\text{Column } j \text{ of } B)$$

Mathematically, if $A_{ik}$ is the element in the $i$-th row and $k$-th column of $A$, and $B_{kj}$ is the element in the $k$-th row and $j$-th column of $B$, then:

$$C_{ij} = \sum_{k=1}^{n} A_{ik} B_{kj}$$
This means you multiply corresponding elements from the row of A and the column of B and then sum up those products.
A visual representation of how Row i of matrix A and Column j of matrix B combine via the dot product to compute the element Cij in the resulting matrix C.
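To make the summation formula concrete, here is a deliberately unoptimized sketch that computes each $C_{ij}$ with explicit loops (the function name matmul_loops is ours for illustration; in practice you would use NumPy's built-in operator shown later):

import numpy as np

def matmul_loops(A, B):
    # Computes C = AB via C_ij = sum over k of A_ik * B_kj
    m, n = A.shape
    n_check, p = B.shape
    assert n == n_check, "inner dimensions must match"
    C = np.zeros((m, p))
    for i in range(m):           # for each row of A
        for j in range(p):       # for each column of B
            for k in range(n):   # dot product of that row and column
                C[i, j] += A[i, k] * B[k, j]
    return C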
Let's multiply a 2×3 matrix A by a 3×2 matrix B. The result C should be a 2×2 matrix.
$$A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \qquad B = \begin{bmatrix} 7 & 8 \\ 9 & 1 \\ 2 & 3 \end{bmatrix}$$

The resulting matrix is $C = AB$:

$$C = \begin{bmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{bmatrix}$$

Let's calculate each element:

$C_{11} = (\text{Row 1 of } A) \cdot (\text{Column 1 of } B) = (1)(7) + (2)(9) + (3)(2) = 7 + 18 + 6 = 31$

$C_{12} = (\text{Row 1 of } A) \cdot (\text{Column 2 of } B) = (1)(8) + (2)(1) + (3)(3) = 8 + 2 + 9 = 19$

$C_{21} = (\text{Row 2 of } A) \cdot (\text{Column 1 of } B) = (4)(7) + (5)(9) + (6)(2) = 28 + 45 + 12 = 85$

$C_{22} = (\text{Row 2 of } A) \cdot (\text{Column 2 of } B) = (4)(8) + (5)(1) + (6)(3) = 32 + 5 + 18 = 55$

So, the resulting matrix is:

$$C = AB = \begin{bmatrix} 31 & 19 \\ 85 & 55 \end{bmatrix}$$

NumPy makes matrix multiplication straightforward. Since Python 3.5, the standard way to perform matrix multiplication between two NumPy arrays (representing matrices) is the @ operator.
Let's perform the same calculation as above using NumPy:
import numpy as np
# Define matrices A and B
A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.array([[7, 8],
              [9, 1],
              [2, 3]])
# Check shapes
print(f"Shape of A: {A.shape}") # Output: Shape of A: (2, 3)
print(f"Shape of B: {B.shape}") # Output: Shape of B: (3, 2)
# Perform matrix multiplication using the @ operator
C = A @ B
print(f"\nMatrix A:\n{A}")
print(f"Matrix B:\n{B}")
print(f"Result C = A @ B:\n{C}")
# Output:
# Result C = A @ B:
# [[31 19]
# [85 55]]
print(f"Shape of C: {C.shape}") # Output: Shape of C: (2, 2)
The result matches our manual calculation. Notice how NumPy handles the row-by-column dot products internally.
You might also encounter np.dot(A, B) or A.dot(B). For 2D arrays (matrices), these perform standard matrix multiplication, just like the @ operator. However, the @ operator is generally preferred for matrix multiplication because it's unambiguous, whereas np.dot behaves differently for arrays with more than two dimensions. For clarity when working with matrices, stick with @.
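The difference shows up with stacked (3D) arrays. A small sketch with arbitrary shapes: @ treats the leading axis as a batch dimension, while np.dot combines every "batch" of the first array with every "batch" of the second:

import numpy as np

a = np.ones((2, 3, 4))
b = np.ones((2, 4, 5))

# @ performs 2 separate (3, 4) @ (4, 5) products, one per batch
print((a @ b).shape)       # (2, 3, 5)

# np.dot sums over the last axis of a and the second-to-last axis of b
print(np.dot(a, b).shape)  # (2, 3, 2, 5)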
Unlike multiplication with regular numbers (scalars), where a×b=b×a, matrix multiplication is generally not commutative. This means that, in most cases:
$$AB \neq BA$$
Sometimes BA might not even be defined when AB is. In our example above, A is 2×3 and B is 3×2, and the product AB is defined, resulting in a 2×2 matrix.
What about BA? Here, B is 3×2 and A is 2×3. The inner dimensions match (2 and 2), so BA is defined. The resulting matrix BA will be 3×3.
Since AB is 2×2 and BA is 3×3, they clearly cannot be equal. Let's compute BA with NumPy to see:
# Calculate BA (note the order)
C_BA = B @ A
print(f"\nResult BA = B @ A:\n{C_BA}")
# Output:
# Result BA = B @ A:
# [[ 39 54 69]
# [ 13 23 33]
# [ 14 19 24]]
print(f"Shape of BA: {C_BA.shape}") # Output: Shape of BA: (3, 3)
As expected, BA is a 3×3 matrix and is completely different from AB.
Even if A and B are square matrices of the same size, where both AB and BA are defined and have the same dimensions, the results will usually be different. The order in which you multiply matrices matters significantly. This has important implications in areas like computer graphics and machine learning, where sequences of matrix operations represent sequences of transformations or computational steps. Changing the order changes the outcome.
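As a quick illustration with square matrices (values chosen arbitrarily), multiplying by a permutation matrix from the right swaps columns, while multiplying from the left swaps rows, so the two orders give different results:

import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])

print(A @ B)
# [[2 1]
#  [4 3]]  <- columns of A swapped

print(B @ A)
# [[3 4]
#  [1 2]]  <- rows of A swapped

print(np.array_equal(A @ B, B @ A))  # False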