At the heart of deep learning algorithms are numerical computations, primarily involving arrays (or tensors) and operations from linear algebra. Julia excels in this domain, offering a syntax that is both high-level and expressive, while delivering performance comparable to low-level languages. This section will reinforce your understanding of Julia's capabilities for array manipulation and linear algebra, which are essential for building and training neural networks.
In Julia, arrays are multi-dimensional containers that can hold a collection of items of the same type. For deep learning, these arrays will typically store numbers, such as input features, model weights, or gradients.
Julia provides several ways to create arrays. You're likely familiar with basic array literals:
# A Vector (1D array) of Float64
vector_a = [1.0, 2.5, 3.0]
# A Matrix (2D array) of Int64
matrix_b = [1 2 3; 4 5 6]
# A 3D array
tensor_c = rand(2, 3, 4) # Creates a 2x3x4 array with random Float64 values between 0 and 1
Commonly used functions for array initialization include:
zeros(T, dims...) or zeros(dims...): Creates an array filled with zeros. T specifies the element type (defaults to Float64).
zeros_matrix = zeros(Int8, 2, 3) # A 2x3 matrix of 8-bit integers, all zero
# Output:
# 2×3 Matrix{Int8}:
# 0 0 0
# 0 0 0
ones(T, dims...) or ones(dims...): Creates an array filled with ones.
ones_vector = ones(3) # A 3-element vector of Float64, all one
# Output:
# 3-element Vector{Float64}:
# 1.0
# 1.0
# 1.0
fill(value, dims...): Creates an array filled with a specific value.
filled_array = fill(7.7, (2, 2)) # A 2x2 matrix, all elements are 7.7
# Output:
# 2×2 Matrix{Float64}:
# 7.7 7.7
# 7.7 7.7
rand(T, dims...) or rand(dims...): Creates an array with random numbers (uniformly distributed between 0 and 1 by default).
randn(T, dims...) or randn(dims...): Creates an array with random numbers drawn from a standard normal distribution.
You can inspect array characteristics using functions like:
size(A): Returns a tuple containing the dimensions of array A.
length(A): Returns the total number of elements in A.
ndims(A): Returns the number of dimensions of A.
my_matrix = rand(4, 5)
println("Size: ", size(my_matrix)) # Output: Size: (4, 5)
println("Length: ", length(my_matrix)) # Output: Length: 20
println("Dimensions: ", ndims(my_matrix)) # Output: Dimensions: 2
Accessing and modifying elements or sub-arrays is done through indexing. Julia uses 1-based indexing.
data_matrix = [10 20 30; 40 50 60; 70 80 90]
# 3×3 Matrix{Int64}:
# 10 20 30
# 40 50 60
# 70 80 90
first_element = data_matrix[1, 1] # 10
second_row_third_col = data_matrix[2, 3] # 60
# Slicing
first_row = data_matrix[1, :] # [10, 20, 30] (returns a Vector)
second_column = data_matrix[:, 2] # [20, 50, 80] (returns a Vector)
sub_matrix = data_matrix[1:2, 2:3] # [20 30; 50 60] (returns a 2x2 Matrix)
# Using `end` to refer to the last index
last_element_first_row = data_matrix[1, end] # 30
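The same indexing syntax is used for modification; a brief sketch continuing with data_matrix from above (the dot in .= broadcasts the scalar across the slice):
# Assign a single element
data_matrix[1, 1] = 15
# Assign a value to an entire slice (in-place broadcast assignment)
data_matrix[3, :] .= 0
# data_matrix is now:
# 15 20 30
# 40 50 60
#  0  0  0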
A significant feature in Julia for numerical work is broadcasting. It allows you to apply functions element-wise to arrays as if they were scalars, or to combine arrays of different shapes in a compatible way. Broadcasting is invoked by placing a dot (.) before an operator or after a function name.
A = [1 2; 3 4]
B = [10 20; 30 40]
# Element-wise addition
C = A .+ B
# Output:
# 2×2 Matrix{Int64}:
# 11 22
# 33 44
# Element-wise multiplication (Hadamard product)
D = A .* B
# Output:
# 2×2 Matrix{Int64}:
# 10 40
# 90 160
# Scalar addition broadcasted to all elements
E = A .+ 5
# Output:
# 2×2 Matrix{Int64}:
# 6 7
# 8 9
# Applying a function element-wise
F = sin.(A) # Calculates sine of each element in A
Broadcasting is not just syntactic sugar; it's implemented efficiently, often fusing operations to reduce temporary allocations and improve performance. This is particularly useful in deep learning for operations like adding a bias vector to a matrix of activations.
activations = rand(3, 4) # A 3x4 matrix (e.g., 3 neurons, 4 samples)
bias_vector = [0.1, 0.2, 0.3] # A 3-element vector (bias for each neuron)
# Add bias_vector to each column of activations
biased_activations = activations .+ bias_vector
# `bias_vector` is treated as a 3x1 column vector and added to each column of `activations`
println(size(biased_activations)) # Output: (3, 4)
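The fusion mentioned above can be made explicit with the @. macro, which adds the dot to every call in an expression, and with in-place assignment (.=), which writes results into an existing array instead of allocating a new one. A minimal sketch reusing activations and bias_vector from above, with tanh chosen purely for illustration:
output = similar(activations)                 # preallocate a 3x4 destination array
@. output = tanh(activations + bias_vector)   # one fused loop, no intermediate arrays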
Linear algebra is the mathematical foundation upon which most deep learning algorithms are built. Operations like matrix multiplication, vector dot products, and transpositions are ubiquitous. Julia's LinearAlgebra
standard library provides a comprehensive suite of tools for these tasks, often leveraging highly optimized backend libraries like BLAS (Basic Linear Algebra Subprograms) and LAPACK (Linear Algebra Package).
To use these functions, you typically start with:
using LinearAlgebra
Dot Product: The dot product of two vectors $u$ and $v$ is $u^T v = \sum_i u_i v_i$.
u = [1.0, 2.0, 3.0]
v = [4.0, 5.0, 6.0]
dot_product_uv = dot(u, v) # 1.0*4.0 + 2.0*5.0 + 3.0*6.0 = 4.0 + 10.0 + 18.0 = 32.0
# Alternatively, use the unicode symbol ⋅ (typed as \cdot<TAB>)
dot_product_uv_alt = u ⋅ v # 32.0
Matrix Multiplication: If $A$ is an $m \times n$ matrix and $B$ is an $n \times p$ matrix, their product $C = AB$ is an $m \times p$ matrix.
M1 = [1 2; 3 4] # 2x2 matrix
M2 = [5 6 7; 8 9 10] # 2x3 matrix
P = M1 * M2 # Results in a 2x3 matrix
# Output:
# 2×3 Matrix{Int64}:
# 21 24 27
# 47 54 61
In deep learning, matrix multiplication is fundamental. For instance, in a dense layer, inputs are multiplied by a weight matrix.
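For example, multiplying a weight matrix by a batch of inputs produces the pre-activations for all samples at once (hypothetical sizes chosen only for illustration):
W = randn(2, 3)   # weights for a layer with 3 inputs and 2 outputs
X = randn(3, 4)   # 4 input samples, one per column
Z = W * X         # pre-activations: a 2x4 matrix, one column per sample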
Transpose: The transpose $A^T$ of a matrix $A$ switches its rows and columns.
A = [1 2 3; 4 5 6]
A_transpose = transpose(A) # or A'
# Output (A_transpose):
# 3×2 transpose(::Matrix{Int64}) with eltype Int64:
# 1 4
# 2 5
# 3 6
Note that A' (the adjoint) and transpose(A) create lazy Adjoint and Transpose wrappers rather than copying the data; for real-valued matrices the two are equivalent. To get a new dense matrix, you can use collect(A') or copy(transpose(A)).
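Because these wrappers reference the parent array rather than copying it, changes to the original are visible through the wrapper; a quick check, continuing with A from above:
At = transpose(A)      # lazy wrapper over A, no data copied
A[1, 2] = 99
At[2, 1]               # 99, since At views the same memory as A
A_dense = collect(A')  # an independent dense copy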
Identity Matrix: An identity matrix I is a square matrix with ones on the main diagonal and zeros elsewhere.
I3 = I(3) # Creates a 3x3 identity matrix (a Diagonal{Bool}; I by itself is a lazy UniformScaling object)
# To get a dense Matrix:
dense_I3 = Matrix(I3)
# Output:
# 3×3 Matrix{Bool} (promotes to other types in operations):
# 1 0 0
# 0 1 0
# 0 0 1
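In many expressions a dense identity is unnecessary, since the UniformScaling object I adapts its size and element type to whatever it is combined with; the Matrix constructor also builds a dense identity of a chosen type directly:
A_sq = rand(3, 3)
B = A_sq + 2I                      # adds 2 to each diagonal entry without materializing an identity matrix
I_f64 = Matrix{Float64}(I, 3, 3)   # dense 3x3 Float64 identity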
Matrix Inverse: For a square matrix $A$, its inverse $A^{-1}$ satisfies $A A^{-1} = A^{-1} A = I$.
S = [3.0 1.0; 1.0 2.0]
S_inv = inv(S)
# Output:
# 2×2 Matrix{Float64}:
# 0.4 -0.2
# -0.2 0.6
# Verify: S * S_inv should be close to identity
display(S * S_inv)
# Output:
# 2×2 Matrix{Float64}:
# 1.0 5.55112e-17
# 5.55112e-17 1.0
While inv() is available, for solving linear systems like $Ax = b$ it is generally more numerically stable and efficient to use the backslash operator: x = A \ b.
Solving Linear Systems: To solve $Ax = b$ for $x$:
A_sys = [2.0 1.0; 1.0 3.0]
b_sys = [1.0, 2.0]
x_sol = A_sys \ b_sys
# Output:
# 2-element Vector{Float64}:
# 0.2
# 0.6
# Verify: A_sys * x_sol should be close to b_sys
println(A_sys * x_sol)
# Output:
# [1.0, 2.0]
These array and linear algebra operations are not just abstract mathematical tools; they are the workhorses of deep learning. Consider a single neuron's output or the output of a dense layer in a neural network, often expressed as $y = f(Wx + b)$, where $x$ is the input (a vector, or a matrix whose columns are individual samples), $W$ is the weight matrix, $b$ is the bias vector, and $f$ is an element-wise activation function.
The following diagram illustrates the flow for a typical layer computation:
This diagram shows how input features (X) are transformed by weights (W) and biases (b) through matrix multiplication and addition, followed by an activation function, to produce the layer's output (Y).
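Written out in Julia, that flow is just the operations covered above. A minimal sketch with arbitrary sizes, using a simple ReLU-style function as a stand-in for the activation f:
X = rand(3, 5)           # 3 input features, 5 samples (one per column)
W = randn(2, 3)          # weights: 2 output neurons, 3 inputs
b = zeros(2)             # one bias per output neuron
relu(z) = max(z, 0.0)    # element-wise activation (applied via broadcasting)
Y = relu.(W * X .+ b)    # matrix multiply, broadcasted bias, element-wise activation
size(Y)                  # (2, 5)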
Data itself, whether it's a simple set of features, an image, or a sequence of text, is represented as arrays (often called tensors in the context of deep learning). A grayscale image might be a 2D array (height x width), a color image a 3D array (height x width x channels), and a mini-batch of color images a 4D array (height x width x channels x batch size).
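As a concrete illustration (the sizes here are arbitrary):
img_gray   = rand(Float32, 28, 28)          # one grayscale image: height x width
img_color  = rand(Float32, 28, 28, 3)       # one color image: height x width x channels
mini_batch = rand(Float32, 28, 28, 3, 32)   # a mini-batch of 32 color images
size(mini_batch)                            # (28, 28, 3, 32)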
Mastering array manipulation and linear algebra in Julia is therefore a direct prerequisite for effectively implementing, understanding, and debugging deep learning models. As you move into working with Flux.jl, you'll see these operations appear constantly, though sometimes abstracted within higher-level layer definitions.
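For instance, a Flux Dense layer bundles exactly the W, b, and activation pattern shown above. A minimal sketch, assuming Flux.jl is installed and a recent version that supports the in => out constructor:
using Flux
layer = Dense(3 => 2, relu)   # stores a 2x3 weight matrix and a 2-element bias
x = rand(Float32, 3, 5)       # 5 samples with 3 features each
y = layer(x)                  # roughly relu.(layer.weight * x .+ layer.bias)
size(y)                       # (2, 5)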