Accelerate direct solvers: square systems#
This notebook shows the optional Apple Accelerate direct-solver path for
square sparse systems. Accelerate support is available in published macOS
wheels. If you are using an editable/source build, see the installation guide
for the required CMAKE_ARGS feature gate.
Accelerate-backed solves use opaque Apple factorization objects. This is why
the API lives behind linalg.factorized and linalg.spsolve rather than
sparse_lu or sparse_cholesky: the explicit-factor APIs promise
mlx-sparse CSRArray factors, while Accelerate does not expose factors in
that form.
import mlx.core as mx
import numpy as np
import scipy.sparse
import mlx_sparse as ms
from mlx_sparse import linalg
INSTALLATION = "https://mlx-sparse.readthedocs.io/en/latest/installation.html"
print("Accelerate status:", ms.capabilities.status("accelerate"))
print("Accelerate available:", ms.capabilities.ACCELERATE)
print("Reason:", ms.capabilities.reason("accelerate"))
if not ms.capabilities.ACCELERATE:
raise RuntimeError(
"This notebook requires an Accelerate-enabled build. "
f"For editable/source installs, follow {INSTALLATION}."
)
Accelerate status: available
Accelerate available: True
Reason: Accelerate support is compiled and running on Darwin.
A sparse SPD test system#
The matrix below is a tridiagonal SPD operator. It is small enough to inspect but still exercises the same public API as larger systems. We keep CSR, CSC, and COO views because Accelerate accepts all three public sparse formats after validation and canonical CSC normalization.
n = 8
main = np.full(n, 4.0, dtype=np.float32)
off = np.full(n - 1, -1.0, dtype=np.float32)
A_sp = scipy.sparse.diags([off, main, off], [-1, 0, 1], format="csr")
A_csr = ms.from_scipy(A_sp, format="csr")
A_csc = ms.from_scipy(A_sp, format="csc")
A_coo = ms.from_scipy(A_sp, format="coo")
b_np = np.linspace(1.0, 2.4, n, dtype=np.float32)
b = mx.array(b_np)
B_np = np.stack(
[b_np, b_np[::-1], np.ones(n, dtype=np.float32)],
axis=1,
)
B = mx.array(B_np)
print(f"shape={A_csr.shape}, nnz={A_csr.nnz}")
print("formats:", type(A_csr).__name__, type(A_csc).__name__, type(A_coo).__name__)
shape=(8, 8), nnz=22
formats: CSRArray CSCArray COOArray
Reusable Cholesky solve#
For SPD systems, method="cholesky" is the most specific direct solver. It
factorizes once and can then solve either a vector right-hand side or a matrix
of right-hand sides. That reuse is the main reason to prefer factorized
over repeated calls to spsolve.
chol = linalg.factorized(A_csr, method="cholesky")
x_chol = chol.solve(b)
X_chol = chol.solve(B)
mx.eval(x_chol, X_chol)
x_ref = np.linalg.solve(A_sp.toarray(), b_np)
X_ref = np.linalg.solve(A_sp.toarray(), B_np)
rel_x = np.linalg.norm(np.array(x_chol) - x_ref) / np.linalg.norm(x_ref)
rel_X = np.linalg.norm(np.array(X_chol) - X_ref) / np.linalg.norm(X_ref)
rel_res = np.linalg.norm(A_sp @ np.array(x_chol) - b_np) / np.linalg.norm(b_np)
print(
f"metadata: backend={chol.backend}, method={chol.method}, "
f"rhs_size={chol.rhs_size}, solution_size={chol.solution_size}"
)
print(f"single rhs rel_error={rel_x:.2e}, rel_residual={rel_res:.2e}")
print(f"matrix rhs rel_error={rel_X:.2e}, solution shape={X_chol.shape}")
print("first four x:", np.round(np.array(x_chol)[:4], 6))
metadata: backend=accelerate, method=cholesky, rhs_size=8, solution_size=8
single rhs rel_error=7.93e-08, rel_residual=1.71e-07
matrix rhs rel_error=9.09e-08, solution shape=(8, 3)
first four x: [0.392788 0.571153 0.691824 0.796143]
LU and spsolve#
method="lu" handles general nonsingular square systems when Accelerate LU
is available in the Apple SDK and runtime. spsolve(A, b) is the one-shot
convenience wrapper, it is useful when you do not need to reuse the
factorization.
lu = linalg.factorized(A_csc, method="lu")
x_lu = lu(b)
x_direct = linalg.spsolve(A_csr, b)
mx.eval(x_lu, x_direct)
rel_lu = np.linalg.norm(np.array(x_lu) - x_ref) / np.linalg.norm(x_ref)
rel_direct = np.linalg.norm(np.array(x_direct) - x_ref) / np.linalg.norm(x_ref)
print(
f"metadata: backend={lu.backend}, method={lu.method}, "
f"rhs_size={lu.rhs_size}, solution_size={lu.solution_size}"
)
print(f"lu rel_error={rel_lu:.2e}")
print(f"spsolve rel_error={rel_direct:.2e}")
print("spsolve first four:", np.round(np.array(x_direct)[:4], 6))
metadata: backend=accelerate, method=lu, rhs_size=8, solution_size=8
lu rel_error=6.48e-08
spsolve rel_error=6.48e-08
spsolve first four: [0.392788 0.571153 0.691824 0.796143]
Format coverage#
For square systems, method="auto" chooses LU. The public input may be CSR,
CSC, or COO. Internally, the Accelerate adapter validates dtype, shape, index
bounds, canonicalization, and overflow limits before constructing the Apple
CSC object.
for name, matrix in [("CSR", A_csr), ("CSC", A_csc), ("COO", A_coo)]:
solver = linalg.factorized(matrix, method="auto")
y = solver.solve(b)
mx.eval(y)
rel = np.linalg.norm(np.array(y) - x_ref) / np.linalg.norm(x_ref)
print(
f"{name}: backend={solver.backend:10s} "
f"method={solver.method:9s} rel_error={rel:.2e}"
)
CSR: backend=accelerate method=lu rel_error=6.48e-08
CSC: backend=accelerate method=lu rel_error=6.48e-08
COO: backend=accelerate method=lu rel_error=6.48e-08
Practical rule#
Use factorized when the same matrix is solved repeatedly. Use
spsolve when you need one direct solve and do not want to keep a solver
object. Use the explicit native sparse_lu or sparse_cholesky APIs only
when you need inspectable sparse factors.