Recall from classical machine learning, particularly algorithms like Support Vector Machines (SVMs), the power of the kernel trick. It allows us to implicitly operate in a high-dimensional feature space without ever needing to compute the coordinates of our data in that space explicitly. Instead, we only need a function, the kernel k(x,x′), that calculates the inner product between the mapped data points ϕ(x) and ϕ(x′) in that feature space.
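As a quick refresher, the sketch below contrasts an explicit degree-2 polynomial feature map with the equivalent polynomial kernel. Both yield the same inner product, but the kernel never constructs ϕ(x) explicitly (a minimal NumPy illustration, not tied to any particular SVM library):

```python
import numpy as np

# Explicit degree-2 feature map for 2D inputs: phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2).
def phi(x):
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

# The polynomial kernel computes the same inner product without ever building phi(x).
def poly_kernel(x, xp):
    return np.dot(x, xp) ** 2

x, xp = np.array([1.0, 2.0]), np.array([0.5, -1.0])
print(np.dot(phi(x), phi(xp)))  # inner product in feature space: 2.25
print(poly_kernel(x, xp))       # identical value, computed in input space
```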
Quantum machine learning leverages a similar concept, building upon the quantum feature maps introduced previously. A quantum feature map Uϕ(x) encodes a classical data point x into a quantum state ∣ϕ(x)⟩=Uϕ(x)∣0...0⟩ residing in a Hilbert space H. This Hilbert space can be exponentially large in the number of qubits, serving as our potentially very high-dimensional feature space.
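To make this concrete, here is a minimal statevector sketch in NumPy. The angle-encoding map used here (a single RY(xᵢ) rotation per qubit) is an illustrative assumption, not a prescribed feature map; realistic feature maps typically add entangling gates, but the encoding picture ∣ϕ(x)⟩=Uϕ(x)∣0...0⟩ is the same:

```python
import numpy as np

def feature_state(x):
    """Toy angle-encoding feature map: qubit i is prepared as RY(x_i)|0>.

    A stand-in for U_phi(x); real feature maps usually include
    entangling gates, but the statevector picture is identical.
    """
    state = np.array([1.0 + 0j])  # start in |0...0>
    for xi in x:
        qubit = np.array([np.cos(xi / 2), np.sin(xi / 2)])  # RY(x_i)|0>
        state = np.kron(state, qubit)  # tensor product across qubits
    return state

psi = feature_state([0.3, 1.2, -0.7])
print(psi.shape)               # (8,): 2^3 amplitudes for 3 qubits
print(np.vdot(psi, psi).real)  # 1.0: a normalized state in the Hilbert space
```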
The quantum kernel k(x,x′) is defined as a function of the inner product between two quantum feature states, ∣ϕ(x)⟩ and ∣ϕ(x′)⟩. A common and practically useful definition is:
k(x,x′)=∣⟨ϕ(x)∣ϕ(x′)⟩∣²

Why the squared magnitude? As we'll see, this specific form relates directly to a probability that can be estimated by measuring a quantum circuit, making it suitable for near-term quantum hardware. Other functions of the inner product are possible, but this form is prevalent.
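As a concrete illustration, the sketch below evaluates this definition directly from simulated statevectors, reusing the toy angle-encoding map from above. Because the states are normalized, the value always lands in [0, 1], consistent with its interpretation as a probability:

```python
import numpy as np

def feature_state(x):
    # Same toy angle-encoding map as above: RY(x_i)|0> on each qubit.
    state = np.array([1.0 + 0j])
    for xi in x:
        state = np.kron(state, np.array([np.cos(xi / 2), np.sin(xi / 2)]))
    return state

def quantum_kernel(x, xp):
    # k(x, x') = |<phi(x)|phi(x')>|^2 -- a squared overlap, hence in [0, 1].
    return np.abs(np.vdot(feature_state(x), feature_state(xp))) ** 2

print(quantum_kernel([0.3, 1.2], [0.3, 1.2]))   # 1.0 for identical inputs
print(quantum_kernel([0.3, 1.2], [2.0, -0.5]))  # a value in [0, 1]
```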
Substituting the definition of the feature states using the unitary Uϕ:
k(x,x′)=∣⟨0...0∣Uϕ(x)†Uϕ(x′)∣0...0⟩∣²

This equation reveals the essence of the quantum kernel trick:
The core idea is to prepare the states ∣ϕ(x)⟩ and ∣ϕ(x′)⟩ (or states related to them) within a quantum circuit and then perform operations and measurements that reveal their overlap ⟨ϕ(x)∣ϕ(x′)⟩.
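The following sketch evaluates ∣⟨0...0∣Uϕ(x)†Uϕ(x′)∣0...0⟩∣² directly, again using the toy angle-encoding map. On hardware, one would instead run the "compute–uncompute" circuit Uϕ(x)†Uϕ(x′) on ∣0...0⟩ and estimate the probability of the all-zeros measurement outcome from repeated shots; the matrix arithmetic here just mimics that estimate exactly:

```python
import numpy as np

def ry(theta):
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def u_phi(x):
    # Full feature-map unitary as a matrix (a product of single-qubit RYs here).
    U = np.array([[1.0]])
    for xi in x:
        U = np.kron(U, ry(xi))
    return U

def kernel_compute_uncompute(x, xp):
    n = len(x)
    zero = np.zeros(2 ** n); zero[0] = 1.0  # |0...0>
    V = u_phi(x).conj().T @ u_phi(xp)       # V = U_phi(x)^dagger U_phi(x')
    amp = zero @ V @ zero                   # <0...0| V |0...0>
    return np.abs(amp) ** 2                 # probability of measuring all zeros

print(kernel_compute_uncompute([0.3, 1.2], [2.0, -0.5]))
# Matches |<phi(x)|phi(x')>|^2 computed from the statevectors directly.
```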
Consider the unitary transformation V=Uϕ(x)†Uϕ(x′). The inner product we need is the expectation value of this unitary in the initial state ∣0...0⟩:
⟨ϕ(x)∣ϕ(x′)⟩=⟨0...0∣V∣0...0⟩

Circuits can be designed to estimate quantities like ∣⟨0...0∣V∣0...0⟩∣². For example, the "swap test" circuit or related interference-based circuits measure the fidelity between two states, which corresponds to the squared magnitude of their inner product. We apply Uϕ(x′) to one register initialized to ∣0...0⟩, apply Uϕ(x) to another register initialized to ∣0...0⟩, and then use auxiliary qubits and controlled operations to interfere these states. Measuring the auxiliary qubit yields information about ∣⟨ϕ(x)∣ϕ(x′)⟩∣².
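For intuition, here is a small statevector simulation of the swap test for two single-qubit registers (the name `swap_test_p0` and the qubit ordering are illustrative choices). The ancilla reads 0 with probability P(0) = (1 + ∣⟨a∣b⟩∣²)/2, so the fidelity is recoverable from measurement statistics:

```python
import numpy as np

# Swap test on two single-qubit states |a>, |b> with one ancilla.
# Qubit ordering: ancilla (most significant) x register A x register B.
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
SWAP = np.array([[1, 0, 0, 0],
                 [0, 0, 1, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1]])

def swap_test_p0(a, b):
    state = np.kron(np.array([1.0, 0.0]), np.kron(a, b))  # |0>|a>|b>
    state = np.kron(H, np.eye(4)) @ state                 # H on ancilla
    cswap = np.block([[np.eye(4), np.zeros((4, 4))],      # identity if ancilla=0,
                      [np.zeros((4, 4)), SWAP]])          # SWAP registers if ancilla=1
    state = cswap @ state
    state = np.kron(H, np.eye(4)) @ state                 # H on ancilla again
    return np.sum(np.abs(state[:4]) ** 2)                 # P(ancilla = 0)

a = np.array([np.cos(0.4), np.sin(0.4)])  # two example single-qubit states
b = np.array([np.cos(1.1), np.sin(1.1)])
fidelity = 2 * swap_test_p0(a, b) - 1     # invert P(0) = (1 + |<a|b>|^2) / 2
print(fidelity, np.abs(np.vdot(a, b)) ** 2)  # the two values agree
```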
Flow of the quantum kernel trick. Classical data points x and x′ are implicitly mapped to quantum states ∣ϕ(x)⟩ and ∣ϕ(x′)⟩ via feature map unitaries Uϕ. A quantum circuit directly estimates their overlap, yielding the kernel value k(x,x′), bypassing explicit representation in the high-dimensional Hilbert space H.
The beauty of this formalism lies in its compatibility with classical kernel machines. Once we can compute the quantum kernel matrix K, where Kij=k(xi,xj) for a dataset {x1,...,xN}, we can feed this matrix directly into classical kernel methods such as Support Vector Machines (SVMs), kernel ridge regression, or Gaussian processes.
The classical algorithm performs its optimization or analysis solely based on the kernel matrix K, effectively operating in the quantum feature space H without needing direct access to it. The hypothesis is that quantum feature maps Uϕ(x) might create correlations or structures in H that are hard to achieve with classical feature maps, potentially leading to better performance on certain datasets.
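For example, scikit-learn's SVC accepts a precomputed kernel: we supply the N×N quantum kernel matrix for training and an M×N matrix of test-against-training kernel values for prediction. The sketch below uses the toy feature map from earlier and synthetic data purely for illustration:

```python
import numpy as np
from sklearn.svm import SVC

def feature_state(x):
    # Toy angle-encoding feature map from the earlier sketches.
    state = np.array([1.0 + 0j])
    for xi in x:
        state = np.kron(state, np.array([np.cos(xi / 2), np.sin(xi / 2)]))
    return state

def kernel_matrix(A, B):
    # K[i, j] = |<phi(a_i)|phi(b_j)>|^2 for all pairs of rows.
    return np.array([[np.abs(np.vdot(feature_state(a), feature_state(b))) ** 2
                      for b in B] for a in A])

rng = np.random.default_rng(0)
X_train = rng.uniform(0, np.pi, (20, 2))   # synthetic data for illustration
y_train = np.arange(20) % 2                # alternating binary labels
X_test = rng.uniform(0, np.pi, (5, 2))

svm = SVC(kernel="precomputed")
svm.fit(kernel_matrix(X_train, X_train), y_train)   # N x N training kernel
print(svm.predict(kernel_matrix(X_test, X_train)))  # M x N test-vs-train kernel
```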
For k(x,x′) to be a valid kernel for many classical algorithms (like SVM), the resulting kernel matrix K must be positive semi-definite (PSD). This means for any vector c, the quadratic form cᵀKc≥0.
Fortunately, the commonly used quantum kernel definition k(x,x′)=∣⟨ϕ(x)∣ϕ(x′)⟩∣² generally leads to a PSD kernel matrix. Let G be the Gram matrix with entries Gij=⟨ϕ(xi)∣ϕ(xj)⟩. A Gram matrix is always PSD. The quantum kernel matrix K we defined has entries Kij=∣Gij∣², so K is the Hadamard product (element-wise product) of G and its complex conjugate G∗. By the Schur product theorem, the Hadamard product of two PSD matrices is itself PSD; since both G and G∗ are PSD, K is PSD as well.
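A quick numerical check of this argument, using random normalized vectors in place of feature states (an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(1)

# Random normalized "feature states" in an 8-dimensional Hilbert space.
states = rng.normal(size=(6, 8)) + 1j * rng.normal(size=(6, 8))
states /= np.linalg.norm(states, axis=1, keepdims=True)

G = states.conj() @ states.T  # Gram matrix G_ij = <phi_i|phi_j>, always PSD
K = np.abs(G) ** 2            # K = G (Hadamard product) G*, element-wise

# Schur product theorem in action: all eigenvalues of K are >= 0
# (up to floating-point noise).
print(np.linalg.eigvalsh(K).min() >= -1e-12)  # True
```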
This ensures that we can readily use these quantum kernels within standard kernel method frameworks. The next section will detail the practical methods for actually calculating these kernel matrix entries using quantum circuits and simulators.