Accelerate direct solvers: square systems#

This notebook shows the optional Apple Accelerate direct-solver path for square sparse systems. Accelerate support is available in published macOS wheels. If you are using an editable/source build, see the installation guide for the required CMAKE_ARGS feature gate.

Accelerate-backed solves use opaque Apple factorization objects. This is why the API lives behind linalg.factorized and linalg.spsolve rather than sparse_lu or sparse_cholesky: the explicit-factor APIs promise mlx-sparse CSRArray factors, while Accelerate does not expose factors in that form.

import mlx.core as mx
import numpy as np
import scipy.sparse

import mlx_sparse as ms
from mlx_sparse import linalg

INSTALLATION = "https://mlx-sparse.readthedocs.io/en/latest/installation.html"

print("Accelerate status:", ms.capabilities.status("accelerate"))
print("Accelerate available:", ms.capabilities.ACCELERATE)
print("Reason:", ms.capabilities.reason("accelerate"))

if not ms.capabilities.ACCELERATE:
    raise RuntimeError(
        "This notebook requires an Accelerate-enabled build. "
        f"For editable/source installs, follow {INSTALLATION}."
    )
Accelerate status: available
Accelerate available: True
Reason: Accelerate support is compiled and running on Darwin.

A sparse SPD test system#

The matrix below is a tridiagonal SPD operator. It is small enough to inspect but still exercises the same public API as larger systems. We keep CSR, CSC, and COO views because Accelerate accepts all three public sparse formats after validation and canonical CSC normalization.

n = 8
main = np.full(n, 4.0, dtype=np.float32)
off = np.full(n - 1, -1.0, dtype=np.float32)
A_sp = scipy.sparse.diags([off, main, off], [-1, 0, 1], format="csr")

A_csr = ms.from_scipy(A_sp, format="csr")
A_csc = ms.from_scipy(A_sp, format="csc")
A_coo = ms.from_scipy(A_sp, format="coo")

b_np = np.linspace(1.0, 2.4, n, dtype=np.float32)
b = mx.array(b_np)

B_np = np.stack(
    [b_np, b_np[::-1], np.ones(n, dtype=np.float32)],
    axis=1,
)
B = mx.array(B_np)

print(f"shape={A_csr.shape}, nnz={A_csr.nnz}")
print("formats:", type(A_csr).__name__, type(A_csc).__name__, type(A_coo).__name__)
shape=(8, 8), nnz=22
formats: CSRArray CSCArray COOArray

Reusable Cholesky solve#

For SPD systems, method="cholesky" is the most specific direct solver. It factorizes once and can then solve either a vector right-hand side or a matrix of right-hand sides. That reuse is the main reason to prefer factorized over repeated calls to spsolve.

chol = linalg.factorized(A_csr, method="cholesky")
x_chol = chol.solve(b)
X_chol = chol.solve(B)
mx.eval(x_chol, X_chol)

x_ref = np.linalg.solve(A_sp.toarray(), b_np)
X_ref = np.linalg.solve(A_sp.toarray(), B_np)
rel_x = np.linalg.norm(np.array(x_chol) - x_ref) / np.linalg.norm(x_ref)
rel_X = np.linalg.norm(np.array(X_chol) - X_ref) / np.linalg.norm(X_ref)
rel_res = np.linalg.norm(A_sp @ np.array(x_chol) - b_np) / np.linalg.norm(b_np)

print(
    f"metadata: backend={chol.backend}, method={chol.method}, "
    f"rhs_size={chol.rhs_size}, solution_size={chol.solution_size}"
)
print(f"single rhs rel_error={rel_x:.2e}, rel_residual={rel_res:.2e}")
print(f"matrix rhs rel_error={rel_X:.2e}, solution shape={X_chol.shape}")
print("first four x:", np.round(np.array(x_chol)[:4], 6))
metadata: backend=accelerate, method=cholesky, rhs_size=8, solution_size=8
single rhs rel_error=7.93e-08, rel_residual=1.71e-07
matrix rhs rel_error=9.09e-08, solution shape=(8, 3)
first four x: [0.392788 0.571153 0.691824 0.796143]

LU and spsolve#

method="lu" handles general nonsingular square systems when Accelerate LU is available in the Apple SDK and runtime. spsolve(A, b) is the one-shot convenience wrapper, it is useful when you do not need to reuse the factorization.

lu = linalg.factorized(A_csc, method="lu")
x_lu = lu(b)
x_direct = linalg.spsolve(A_csr, b)
mx.eval(x_lu, x_direct)

rel_lu = np.linalg.norm(np.array(x_lu) - x_ref) / np.linalg.norm(x_ref)
rel_direct = np.linalg.norm(np.array(x_direct) - x_ref) / np.linalg.norm(x_ref)

print(
    f"metadata: backend={lu.backend}, method={lu.method}, "
    f"rhs_size={lu.rhs_size}, solution_size={lu.solution_size}"
)
print(f"lu rel_error={rel_lu:.2e}")
print(f"spsolve rel_error={rel_direct:.2e}")
print("spsolve first four:", np.round(np.array(x_direct)[:4], 6))
metadata: backend=accelerate, method=lu, rhs_size=8, solution_size=8
lu rel_error=6.48e-08
spsolve rel_error=6.48e-08
spsolve first four: [0.392788 0.571153 0.691824 0.796143]

Format coverage#

For square systems, method="auto" chooses LU. The public input may be CSR, CSC, or COO. Internally, the Accelerate adapter validates dtype, shape, index bounds, canonicalization, and overflow limits before constructing the Apple CSC object.

for name, matrix in [("CSR", A_csr), ("CSC", A_csc), ("COO", A_coo)]:
    solver = linalg.factorized(matrix, method="auto")
    y = solver.solve(b)
    mx.eval(y)
    rel = np.linalg.norm(np.array(y) - x_ref) / np.linalg.norm(x_ref)
    print(
        f"{name}: backend={solver.backend:10s} "
        f"method={solver.method:9s} rel_error={rel:.2e}"
    )
CSR: backend=accelerate method=lu        rel_error=6.48e-08
CSC: backend=accelerate method=lu        rel_error=6.48e-08
COO: backend=accelerate method=lu        rel_error=6.48e-08

Practical rule#

Use factorized when the same matrix is solved repeatedly. Use spsolve when you need one direct solve and do not want to keep a solver object. Use the explicit native sparse_lu or sparse_cholesky APIs only when you need inspectable sparse factors.