When performing element-wise operations like addition, subtraction, or multiplication between tensors, their shapes often need to align. However, manually reshaping or repeating tensors to match shapes can be cumbersome and inefficient, especially with large datasets. PyTorch addresses this through a mechanism called broadcasting.
Broadcasting provides a set of rules that allow PyTorch to automatically expand tensor dimensions when performing operations, provided their shapes meet certain compatibility criteria. This eliminates the need for explicit dimension expansion in many common cases, leading to cleaner code and better memory usage because the actual data isn't duplicated; only the computation behaves as if it were.
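You can observe this zero-copy behavior directly with expand(), which mirrors what broadcasting does internally. A minimal sketch (the tensor values are illustrative):

import torch

x = torch.tensor([1., 2., 3.])           # shape [3]
view = x.unsqueeze(0).expand(4, 3)       # shape [4, 3], but no data is copied
print(view.data_ptr() == x.data_ptr())   # True: both share the same storage
print(view.stride())                     # (0, 1): stride 0 repeats the row for free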
PyTorch determines whether two tensors are "broadcastable" by comparing their shapes element-wise, starting from the trailing (rightmost) dimension. Two tensors are compatible for broadcasting if, for each dimension pair (comparing from right to left), one of the following holds:

- The dimension sizes are equal.
- One of the dimension sizes is 1.
- One of the dimensions does not exist (the tensor has fewer dimensions).

If these conditions hold for all dimension pairs, the tensors are broadcastable. The resulting tensor's shape will have the maximum size along each dimension pair. If the conditions fail for any dimension pair (i.e., the sizes differ and neither is 1), a RuntimeError is raised.
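You can check compatibility ahead of time with torch.broadcast_shapes, which computes the broadcast result shape from shapes alone and raises the same RuntimeError for incompatible inputs:

import torch

# Compatible: trailing dims 3 == 3; the second shape's missing leading dim is treated as 1
print(torch.broadcast_shapes((2, 3), (3,)))  # torch.Size([2, 3])

# Incompatible: trailing dims are 3 and 2, and neither is 1
try:
    torch.broadcast_shapes((2, 3), (2,))
except RuntimeError as e:
    print(f"Error: {e}")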
Let's break down the process:

1. Align the shapes at their trailing (rightmost) dimensions.
2. If one tensor has fewer dimensions, treat its missing leading dimensions as having size 1.
3. For each dimension pair, the result size is the larger of the two; any dimension of size 1 behaves as if its values were repeated to match.
Let's illustrate with code examples.
Adding a scalar (a 0-dimensional tensor) to any tensor always works via broadcasting. The scalar is effectively expanded to match the tensor's shape.
import torch
# Tensor A: Shape [2, 3]
a = torch.tensor([[1, 2, 3], [4, 5, 6]])
# Scalar B: Shape [] (0 dimensions)
b = torch.tensor(10)
# Add scalar to tensor
c = a + b
print(f"Shape of a: {a.shape}")
# Shape of a: torch.Size([2, 3])
print(f"Shape of b: {b.shape}")
# Shape of b: torch.Size([])
print(f"Shape of c: {c.shape}")
# Shape of c: torch.Size([2, 3])
print(f"Result c:\n{c}")
# Result c:
# tensor([[11, 12, 13],
#         [14, 15, 16]])
Here, b (shape []) is broadcast to shape [2, 3] to match a.
Consider adding a row vector (shape [3]) to a matrix (shape [2, 3]).
# Tensor A: Shape [2, 3]
a = torch.tensor([[1, 2, 3],
                  [4, 5, 6]])
# Tensor B: Shape [3] (can be seen as [1, 3] for broadcasting)
b = torch.tensor([10, 20, 30])
# Add row vector to matrix
c = a + b
print(f"Shape of a: {a.shape}") # torch.Size([2, 3])
print(f"Shape of b: {b.shape}") # torch.Size([3])
print(f"Shape of c: {c.shape}") # torch.Size([2, 3])
print(f"Result c:\n{c}")
# Result c:
# tensor([[11, 22, 33],
#         [14, 25, 36]])
a has shape [2, 3]; b has shape [3]. Aligning right gives:

Tensor A: 2 x 3
Tensor B:     3

- Trailing dimension: 3 equals 3. Compatible. Result dimension size is 3.
- Next dimension (moving left): a has 2, b has no dimension here (implicitly size 1). Compatible. Result dimension size is 2.
- Resulting shape: [2, 3].

In effect, b is treated as shape [1, 3] and its single row is copied along the first dimension to match a's shape [2, 3].
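To make this expansion explicit, you can reproduce it manually with unsqueeze and expand; broadcasting gives the same result without you having to write the expansion:

# Manually replicate what broadcasting does to b
b_expanded = b.unsqueeze(0).expand(2, 3)   # [3] -> [1, 3] -> view of shape [2, 3]
print(torch.equal(a + b, a + b_expanded))  # True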
Now, let's add a column vector (shape [2, 1]) to the same matrix (shape [2, 3]).
# Tensor A: Shape [2, 3]
a = torch.tensor([[1, 2, 3],
                  [4, 5, 6]])
# Tensor B: Shape [2, 1]
b = torch.tensor([[10], [20]])
# Add column vector to matrix
c = a + b
print(f"Shape of a: {a.shape}") # torch.Size([2, 3])
print(f"Shape of b: {b.shape}") # torch.Size([2, 1])
print(f"Shape of c: {c.shape}") # torch.Size([2, 3])
print(f"Result c:\n{c}")
# Result c:
# tensor([[11, 12, 13],
#         [24, 25, 26]])
Aligning right gives:

Tensor A: 2 x 3
Tensor B: 2 x 1

- Trailing dimension: a has 3, b has 1. Compatible (one is 1). Result dimension size is 3.
- Next dimension (moving left): a has 2, b has 2. Compatible (equal). Result dimension size is 2.
- Resulting shape: [2, 3].

Here, b's dimension of size 1 (the column dimension) is expanded by copying values across columns to match a's shape [2, 3].
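A common use of this column-vector pattern is row-wise centering; a minimal sketch reusing the same a:

# Each row's mean has shape [2, 1] and broadcasts across the 3 columns
row_means = a.float().mean(dim=1, keepdim=True)  # tensor([[2.], [5.]])
centered = a - row_means                         # shape [2, 3]
print(centered)
# tensor([[-1.,  0.,  1.],
#         [-1.,  0.,  1.]])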
Let's visualize broadcasting A (shape [3, 1]) and B (shape [4]).
Illustration of broadcasting addition for Tensor A (shape [3, 1]) and Tensor B (shape [4]). Tensor A's second dimension (size 1) expands to 4. Tensor B gains a leading dimension of size 1 (becoming shape [1, 4]), which then expands to size 3. Both effectively become shape [3, 4] for the element-wise addition.
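In code, with values chosen here purely for illustration:

# Tensor A: Shape [3, 1]
A = torch.tensor([[0], [10], [20]])
# Tensor B: Shape [4]
B = torch.tensor([1, 2, 3, 4])

# A's size-1 column expands to 4; B is treated as [1, 4] and expands to 3 rows
C = A + B
print(C.shape)  # torch.Size([3, 4])
print(C)
# tensor([[ 1,  2,  3,  4],
#         [11, 12, 13, 14],
#         [21, 22, 23, 24]])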
Broadcasting fails if the non-matching dimensions are not 1.
# Tensor A: Shape [2, 3]
a = torch.tensor([[1, 2, 3], [4, 5, 6]])
# Tensor B: Shape [2]
b = torch.tensor([10, 20])
try:
    c = a + b
except RuntimeError as e:
    print(f"Error: {e}")
# Error: The size of tensor a (3) must match the size of tensor b (2) at non-singleton dimension 1
Aligning right gives:

Tensor A: 2 x 3
Tensor B:     2

- Trailing dimension: a has 3, b has 2. Neither is 1. Incompatible. The operation fails.

Broadcasting is frequently used in neural networks:
- Adding a bias vector (shape [output_features]) to the output of a linear layer (shape [batch_size, output_features]), as sketched below.
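A minimal sketch of that bias pattern; the tensor names and sizes here are illustrative, not from a specific model:

# Simulated linear-layer output: shape [batch_size, output_features] = [4, 3]
output = torch.randn(4, 3)
# Bias vector: shape [output_features] = [3]
bias = torch.randn(3)

# bias is treated as shape [1, 3] and broadcast across the batch dimension
result = output + bias
print(result.shape)  # torch.Size([4, 3])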
Understanding broadcasting is important for writing concise and efficient PyTorch code. It allows you to perform operations on tensors of different shapes naturally, provided they adhere to the compatibility rules, simplifying many common data manipulation and modeling tasks.