When performing element-wise operations like addition, subtraction, or multiplication between tensors, their shapes often need to align. However, manually reshaping or repeating tensors to match shapes can be cumbersome and inefficient, especially with large datasets. PyTorch addresses this through a mechanism called broadcasting.
Broadcasting provides a set of rules that allow PyTorch to automatically expand tensor dimensions when performing operations, provided their shapes meet certain compatibility criteria. This eliminates the need for explicit dimension expansion in many common cases, leading to cleaner code and better memory usage because the actual data isn't duplicated; only the computation behaves as if it were.
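You can observe this zero-copy behavior directly with expand(), which mirrors what broadcasting does internally. A minimal sketch (the tensor values are illustrative):

import torch

x = torch.tensor([1., 2., 3.])           # shape [3]
view = x.unsqueeze(0).expand(4, 3)       # shape [4, 3], but no data is copied
print(view.data_ptr() == x.data_ptr())   # True: both share the same storage
print(view.stride())                     # (0, 1): stride 0 repeats the row for free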
PyTorch determines whether two tensors are "broadcastable" by comparing their shapes element-wise, starting from the trailing (rightmost) dimension. Two tensors are compatible for broadcasting if, for each dimension pair (comparing from right to left), one of the following holds:

- The dimension sizes are equal.
- One of the dimension sizes is 1.
- One of the dimensions does not exist (the tensor has fewer dimensions).

If these conditions hold for all dimension pairs, the tensors are broadcastable. The resulting tensor's shape will have the maximum size along each dimension pair. If the conditions fail for any dimension pair (i.e., the sizes differ and neither is 1), a RuntimeError is raised.
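You can check compatibility ahead of time with torch.broadcast_shapes, which computes the broadcast result shape from shapes alone and raises the same RuntimeError for incompatible inputs:

import torch

# Compatible: trailing dims 3 == 3; the second shape's missing leading dim is treated as 1
print(torch.broadcast_shapes((2, 3), (3,)))  # torch.Size([2, 3])

# Incompatible: trailing dims are 3 and 2, and neither is 1
try:
    torch.broadcast_shapes((2, 3), (2,))
except RuntimeError as e:
    print(f"Error: {e}")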
Let's break down the process:

1. Align the shapes at their trailing (rightmost) dimensions.
2. If one tensor has fewer dimensions, treat its missing leading dimensions as having size 1.
3. For each dimension pair, the result size is the larger of the two; any dimension of size 1 behaves as if its values were repeated to match.
Let's illustrate with code examples.
Adding a scalar (a 0-dimensional tensor) to any tensor always works via broadcasting. The scalar is effectively expanded to match the tensor's shape.
import torch
# Tensor A: Shape [2, 3]
a = torch.tensor([[1, 2, 3], [4, 5, 6]])
# Scalar B: Shape [] (0 dimensions)
b = torch.tensor(10)
# Add scalar to tensor
c = a + b
print(f"Shape of a: {a.shape}")
# Shape of a: torch.Size([2, 3])
print(f"Shape of b: {b.shape}")
# Shape of b: torch.Size([])
print(f"Shape of c: {c.shape}")
# Shape of c: torch.Size([2, 3])
print(f"Result c:\n{c}")
# Result c:
# tensor([[11, 12, 13],
#         [14, 15, 16]])
Here, b (shape []) is broadcast to shape [2, 3] to match a.
Consider adding a row vector (shape [3]) to a matrix (shape [2, 3]).
# Tensor A: Shape [2, 3]
a = torch.tensor([[1, 2, 3],
                  [4, 5, 6]])
# Tensor B: Shape [3] (can be seen as [1, 3] for broadcasting)
b = torch.tensor([10, 20, 30])
# Add row vector to matrix
c = a + b
print(f"Shape of a: {a.shape}") # torch.Size([2, 3])
print(f"Shape of b: {b.shape}") # torch.Size([3])
print(f"Shape of c: {c.shape}") # torch.Size([2, 3])
print(f"Result c:\n{c}")
# Result c:
# tensor([[11, 22, 33],
#         [14, 25, 36]])
a has shape [2, 3]; b has shape [3]. Aligning right gives:

Tensor A: 2 x 3
Tensor B:     3

- Trailing dimension: 3 equals 3. Compatible. Result dimension size is 3.
- Next dimension (moving left): a has 2, b has no dimension here (implicitly size 1). Compatible. Result dimension size is 2.
- Resulting shape: [2, 3].

In effect, b is treated as shape [1, 3] and its single row is copied along the first dimension to match a's shape [2, 3].
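To make this expansion explicit, you can reproduce it manually with unsqueeze and expand; broadcasting gives the same result without you having to write the expansion:

# Manually replicate what broadcasting does to b
b_expanded = b.unsqueeze(0).expand(2, 3)   # [3] -> [1, 3] -> view of shape [2, 3]
print(torch.equal(a + b, a + b_expanded))  # True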
Now, let's add a column vector (shape [2, 1]) to the same matrix (shape [2, 3]).
# Tensor A: Shape [2, 3]
a = torch.tensor([[1, 2, 3],
                  [4, 5, 6]])
# Tensor B: Shape [2, 1]
b = torch.tensor([[10], [20]])
# Add column vector to matrix
c = a + b
print(f"Shape of a: {a.shape}") # torch.Size([2, 3])
print(f"Shape of b: {b.shape}") # torch.Size([2, 1])
print(f"Shape of c: {c.shape}") # torch.Size([2, 3])
print(f"Result c:\n{c}")
# Result c:
# tensor([[11, 12, 13],
#         [24, 25, 26]])
Aligning right gives:

Tensor A: 2 x 3
Tensor B: 2 x 1

- Trailing dimension: a has 3, b has 1. Compatible (one is 1). Result dimension size is 3.
- Next dimension (moving left): a has 2, b has 2. Compatible (equal). Result dimension size is 2.
- Resulting shape: [2, 3].

Here, b's dimension of size 1 (the column dimension) is expanded by copying values across columns to match a's shape [2, 3].
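A common use of this column-vector pattern is row-wise centering; a minimal sketch reusing the same a:

# Each row's mean has shape [2, 1] and broadcasts across the 3 columns
row_means = a.float().mean(dim=1, keepdim=True)  # tensor([[2.], [5.]])
centered = a - row_means                         # shape [2, 3]
print(centered)
# tensor([[-1.,  0.,  1.],
#         [-1.,  0.,  1.]])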
Let's visualize broadcasting A (shape [3, 1]) and B (shape [4]).
Illustration of broadcasting addition for Tensor A (shape [3, 1]) and Tensor B (shape [4]). Tensor A's second dimension (size 1) expands to 4. Tensor B gains a leading dimension of size 1 (becoming shape [1, 4]), which then expands to size 3. Both effectively become shape [3, 4] for the element-wise addition.
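In code, with values chosen here purely for illustration:

# Tensor A: Shape [3, 1]
A = torch.tensor([[0], [10], [20]])
# Tensor B: Shape [4]
B = torch.tensor([1, 2, 3, 4])

# A's size-1 column expands to 4; B is treated as [1, 4] and expands to 3 rows
C = A + B
print(C.shape)  # torch.Size([3, 4])
print(C)
# tensor([[ 1,  2,  3,  4],
#         [11, 12, 13, 14],
#         [21, 22, 23, 24]])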
Broadcasting fails if the non-matching dimensions are not 1.
# Tensor A: Shape [2, 3]
a = torch.tensor([[1, 2, 3], [4, 5, 6]])
# Tensor B: Shape [2]
b = torch.tensor([10, 20])
try:
    c = a + b
except RuntimeError as e:
    print(f"Error: {e}")
# Error: The size of tensor a (3) must match the size of tensor b (2) at non-singleton dimension 1
Aligning right gives:

Tensor A: 2 x 3
Tensor B:     2

- Trailing dimension: a has 3, b has 2. Neither is 1. Incompatible. The operation fails.

Broadcasting is frequently used in neural networks:
- Adding a bias vector (shape [output_features]) to the output of a linear layer (shape [batch_size, output_features]), as sketched below.
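A minimal sketch of that bias pattern; the tensor names and sizes here are illustrative, not from a specific model:

# Simulated linear-layer output: shape [batch_size, output_features] = [4, 3]
output = torch.randn(4, 3)
# Bias vector: shape [output_features] = [3]
bias = torch.randn(3)

# bias is treated as shape [1, 3] and broadcast across the batch dimension
result = output + bias
print(result.shape)  # torch.Size([4, 3])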
Understanding broadcasting is important for writing concise and efficient PyTorch code. It allows you to perform operations on tensors of different shapes naturally, provided they adhere to the compatibility rules, simplifying many common data manipulation and modeling tasks.