Graph Laplacians#
This tutorial builds a cycle graph Laplacian using COO assembly, converts it
to CSR, and applies it to a signal on GPU. The example is based on
examples/graph_laplacian.py.
The graph Laplacian of an undirected graph is the matrix L = D - A, where
D is the diagonal degree matrix and A is the adjacency matrix. For a
cycle graph on n nodes (each node connected to its two immediate
neighbours), the Laplacian has:
Diagonal entries: 2 (each node has degree 2).
Off-diagonal entries: -1 at positions
(i, i+1)and(i, i-1), with wrap-around at the boundary.
The result is an n x n matrix with exactly 3n stored values.
Assembling the COO triplets#
Rather than indexing a dense array, assemble the Laplacian directly as COO triplets. Three bands are concatenated in one shot using NumPy.
import mlx.core as mx
import numpy as np
import mlx_sparse as ms
def cycle_graph_laplacian(n: int) -> ms.CSRArray:
nodes = np.arange(n, dtype=np.int32)
# Row indices: same node repeated three times for each band
row = np.concatenate([nodes, nodes, nodes])
# Column indices: self, next node (mod n), previous node (mod n)
col = np.concatenate([
nodes,
(nodes + 1) % n,
(nodes - 1) % n,
]).astype(np.int32)
# Values: +2 on diagonal, -1 on both off-diagonals
data = np.concatenate([
np.full(n, 2.0, dtype=np.float32),
np.full(n, -1.0, dtype=np.float32),
np.full(n, -1.0, dtype=np.float32),
])
return ms.coo_array(
(mx.array(data), (mx.array(row), mx.array(col))),
shape=(n, n),
).tocsr(canonical=True)
The call to tocsr(canonical=True) sorts column indices within each row and
sums any duplicate entries. For this construction there are no duplicates, so
the nnz count remains 3n after canonicalization.
Examining the structure#
ms.use_cpu()
L = cycle_graph_laplacian(6)
print(L)
# CSRArray(shape=(6, 6), nnz=18, dtype=float32, index_dtype=int32,
# sorted_indices=True, has_canonical_format=True)
mx.eval(L.data, L.indices, L.indptr)
print(np.array(L.todense()))
For n=6 the dense matrix should be:
[[ 2, -1, 0, 0, 0, -1],
[-1, 2, -1, 0, 0, 0],
[ 0, -1, 2, -1, 0, 0],
[ 0, 0, -1, 2, -1, 0],
[ 0, 0, 0, -1, 2, -1],
[-1, 0, 0, 0, -1, 2]]
The matrix is symmetric (L == L.T) and each row sums to zero, which is the
characteristic property of a graph Laplacian.
Applying the Laplacian to a signal#
A graph signal is a function defined at the nodes. The Laplacian applied to a smooth signal produces a measure of local variation: for a perfectly smooth (constant) signal, the result is zero everywhere. For a low-frequency signal like a single period of a sine wave, the response is proportional to the signal’s Fourier frequency.
ms.use_gpu()
n = 32
L = cycle_graph_laplacian(n)
# One full period of a sine wave on the cycle graph
signal = mx.sin(2 * mx.pi * mx.arange(n, dtype=mx.float32) / n)
response = L @ signal
mx.eval(response)
print(np.array(response)[:8])
For a discrete-time sine with frequency f on a cycle graph of n nodes,
the Laplacian eigenvalue is 2 - 2*cos(2*pi*f/n). For f=1 (one period),
this is approximately 0.077 for n=32, so every entry of response
should be close to 0.077 * signal.
Verifying the eigenvalue#
expected_eigenvalue = 2.0 - 2.0 * np.cos(2.0 * np.pi / n)
np.testing.assert_allclose(
np.array(response),
expected_eigenvalue * np.array(signal),
rtol=1e-5,
atol=1e-5,
)
print(f"eigenvalue: {expected_eigenvalue:.6f}")
Running the example script#
The repository ships a standalone script that runs this computation:
python examples/graph_laplacian.py --nodes 32 --device gpu
python examples/graph_laplacian.py --nodes 1024 --device gpu
Performance notes#
The cycle Laplacian has exactly 3 non-zeros per row (uniform row lengths). On Metal, the scalar-per-row kernel handles this pattern efficiently because:
Each thread processes a short fixed-length loop of 3 iterations.
The indptr access pattern is sequential.
The column gather into the dense signal vector is non-sequential but has a stride of roughly 1, 2, and
n-1, which maps to cache lines well for smallern.
For large n (above roughly 65,536 nodes), the cache for the signal vector
may start to spill, and the effective bandwidth will fall. Benchmark with
Performance guide guidance.