Finite difference Laplacian#

This tutorial demonstrates how to assemble the 2D discrete Laplacian on an n x n grid and use it for a sparse matrix-vector product. The example is based on examples/finite_difference_2d.py in the repository.

The 2D Laplacian appears in heat conduction, Poisson solvers, graph diffusion, and image processing. Its stencil connects each interior grid point to its four neighbors with coefficient −1 and a self-coupling of +4 (for a 5-point stencil on a uniform grid). The resulting matrix is an n²xn² sparse symmetric positive-definite matrix with at most 5 non-zeros per row.

Understanding the structure#

For an n x n grid with index function idx(i, j) = i * n + j, the row corresponding to grid point (i, j) has entries:

(idx(i,   j),   idx(i,   j))  <- self: coefficient +4
(idx(i-1, j),   idx(i,   j))  <- north neighbor: coefficient -1
(idx(i+1, j),   idx(i,   j))  <- south neighbor: coefficient -1
(idx(i,   j-1), idx(i,   j))  <- west neighbor: coefficient  -1
(idx(i,   j+1), idx(i,   j))  <- east neighbor: coefficient  -1

Boundary neighbors that fall outside the grid are omitted. Corner points have 2 neighbors, edge points have 3, and interior points have 4. This produces the COO entry list directly.

Assembling with COO#

The COO format is natural for stencil assembly: loop over grid points and push entries into lists.

import mlx.core as mx
import numpy as np
import mlx_sparse as ms


def laplacian_2d(n: int) -> ms.CSRArray:
    rows: list[int] = []
    cols: list[int] = []
    vals: list[float] = []

    def idx(i: int, j: int) -> int:
        return i * n + j

    for i in range(n):
        for j in range(n):
            center = idx(i, j)
            # Self connection.
            rows.append(center)
            cols.append(center)
            vals.append(4.0)
            # Neighbors (only if in bounds).
            for ni, nj in ((i-1, j), (i+1, j), (i, j-1), (i, j+1)):
                if 0 <= ni < n and 0 <= nj < n:
                    rows.append(center)
                    cols.append(idx(ni, nj))
                    vals.append(-1.0)

    data = mx.array(np.asarray(vals, dtype=np.float32))
    row  = mx.array(np.asarray(rows, dtype=np.int32))
    col  = mx.array(np.asarray(cols, dtype=np.int32))

    return ms.coo_array((data, (row, col)), shape=(n*n, n*n)).tocsr(canonical=True)

The stencil produces no duplicate entries (each (i, j) pair appears at most once in the loop), so canonical=True here only sorts the column indices. The nnz count does not change.

Examining the matrix#

ms.use_cpu()   # construction always runs on CPU
L = laplacian_2d(4)   # 16x16 matrix for a 4x4 grid

print(f"shape:  {L.shape}")    # (16, 16)
print(f"nnz:    {L.nnz}")      # 52 = 16*4 minus boundary connections
print(f"density: {L.nnz / (16**2):.3f}")   # ~0.20

mx.eval(L.data)
print("unique values:", set(float(v) for v in np.array(L.data)))
# {4.0, -1.0}

Running on GPU#

Select GPU before construction to keep fixed-shape conversion and product operations on the Metal path.

ms.use_gpu()
L = laplacian_2d(16)              # 256x256 sparse Laplacian

x = mx.array(np.ones(256, dtype=np.float32))
y = L @ x
mx.eval(y)
print(np.array(y)[:8])
# Expected: interior points yield 0 (sum over 4 neighbors each -1, self +4,
# but with all ones as input: 4 - num_neighbors = 0 for interior).

Verifying the result#

The Laplacian applied to a constant vector produces a boundary-indicator vector: interior points give 0 (their 4 neighbors each contribute −1 while the diagonal is +4), while boundary points give a positive value proportional to how many neighbors they are missing.

y_np = np.array(y)
n = 16
for i in range(n):
    for j in range(n):
        k = i * n + j
        num_neighbors = (
            (i > 0) + (i < n-1) + (j > 0) + (j < n-1)
        )
        expected = 4 - num_neighbors   # 0 for interior, 1/2 for boundary
        assert abs(y_np[k] - expected) < 1e-5, f"failed at ({i},{j})"
print("all entries correct!")

Running the example script#

The repository ships the same example as a standalone script:

python examples/finite_difference_2d.py --device gpu
python examples/finite_difference_2d.py --device cpu

The script assembles the 16x16 grid Laplacian and computes L @ ones on the selected MLX device.

Performance notes#

For the 2D Laplacian, each row has exactly 3, 4, or 5 non-zeros, so row lengths are very uniform. This is the best case for the scalar row Metal kernel: load imbalance is minimal and GPU utilization is strong. For a 128x128 grid (16,384x16,384 matrix with ~82K non-zeros), the Metal kernel should comfortably outperform both the CPU kernel and dense MLX matmul.

See Performance guide for guidance on benchmarking this workload.