Solvers#
This page summarizes the public mlx_sparse.linalg solver surface and where
each solver runs. It is meant to answer two questions quickly:
Which sparse solver should I call?
Does that path run on CPU, Metal GPU, Apple Accelerate, or a mix?
Support labels#
| Label | Meaning |
|---|---|
| Full CPU + GPU | Native CPU and Metal GPU implementations are available for the solver’s main numerical work. Calling |
| Partial | The solver is available from CPU and GPU contexts, but one important phase still runs on CPU. Common examples are CPU sparse factorization followed by GPU triangular solves, or GPU Krylov projection followed by a small CPU dense solve/eigendecomposition. |
| CPU only | The solver runs on CPU. Apple Accelerate sparse direct solvers are in this category even when MLX’s default device is GPU. |
| GPU only | A GPU implementation exists without a CPU implementation. No public |
All native sparse linalg entrypoints accept CSRArray, COOArray, and
CSCArray inputs unless the function documentation says otherwise. Native
solver kernels normalize sparse inputs to canonical CSR at solver entry.
Accelerate-backed direct solves normalize supported real inputs to canonical
CSC because Apple’s sparse solver API is CSC-oriented.
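As a rough illustration of what normalizing to canonical CSR involves, the sketch below converts COO triples into CSR arrays in plain Python. It assumes "canonical" means ascending column indices within each row and duplicate entries summed; that reading, the function name `coo_to_canonical_csr`, and the list-based storage are illustrative assumptions, not the library's implementation.

```python
def coo_to_canonical_csr(n_rows, rows, cols, vals):
    """Convert COO triples to CSR lists (hypothetical helper, not an mlx-sparse API).

    Assumes "canonical" means sorted column indices per row and no duplicates.
    """
    # Sum duplicate (row, col) entries into a single stored value.
    merged = {}
    for r, c, v in zip(rows, cols, vals):
        if not 0 <= r < n_rows:
            raise ValueError(f"row index {r} out of range")
        merged[(r, c)] = merged.get((r, c), 0.0) + v

    indptr = [0] * (n_rows + 1)
    indices, data = [], []
    # Sorting the (row, col) keys gives row-major order with ascending columns.
    for r, c in sorted(merged):
        indptr[r + 1] += 1
        indices.append(c)
        data.append(merged[(r, c)])
    # Prefix-sum the per-row counts into row pointers.
    for r in range(n_rows):
        indptr[r + 1] += indptr[r]
    return indptr, indices, data


indptr, indices, data = coo_to_canonical_csr(
    3, rows=[2, 0, 0, 1, 0], cols=[2, 1, 0, 1, 1], vals=[5.0, 1.0, 4.0, 3.0, 2.0]
)
```

Here the two entries at position (0, 1) are merged, and row 0 ends up with its columns in ascending order.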
Diagnostics and callbacks#
cg, gmres, and minres still return (x, info) by default.
info follows the sparse linalg convention used throughout this release:
0 means converged, positive values mean the iteration budget was exhausted,
and negative values mean numerical breakdown or an invalid iterative path.
Pass return_info=True to replace the integer with a structured
SolverInfo object. It records the integer status, final true residual norm,
iteration count, convergence reason, breakdown reason when applicable, solver
name, tolerance settings, restart size for GMRES, and the preconditioner kind
when one is used. Preconditioned residual norms are reported only by native
paths that expose them; otherwise the field is None.
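The integer status convention described above can be turned into a readable message with a small helper. This is a sketch only: `describe_info` is a hypothetical name, and the strings simply restate the three cases listed on this page.

```python
def describe_info(info: int) -> str:
    """Map the integer status from an (x, info) pair to a short description.

    Hypothetical helper; it only encodes the sign convention stated on this page.
    """
    if info == 0:
        return "converged"
    if info > 0:
        return "iteration budget exhausted"
    return "numerical breakdown or invalid iterative path"


for status in (0, 25, -1):
    print(status, "->", describe_info(status))
```

A helper like this is useful for logging when the default integer return is kept instead of the structured object.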
Python callbacks are opt-in exit callbacks for the native sparse solvers. They
are called once after the native CPU/Metal loop finishes, so the default solve
path does not synchronize with Python inside each Krylov iteration. cg and
minres callbacks receive the final solution. gmres(callback_type="x")
receives the final solution, while "pr_norm" and "legacy" receive the
final reported residual norm. These GMRES payload names intentionally mirror
SciPy’s callback types, but mlx-sparse does not call Python at every restart or
inner iteration in native paths, and "legacy" does not change maxiter
accounting. Use return_info=True for solver diagnostics that do not need a
callback.
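The exit-callback behaviour can be pictured with a toy conjugate-gradient loop in plain Python. This is a minimal sketch under stated assumptions, not the native solver: the matrix is a dense list of lists, `cg_with_exit_callback` is a hypothetical name, and the only points it illustrates are that the callback fires once after the loop and that the function returns an `(x, info)` pair following the sign convention above.

```python
import math


def matvec(A, x):
    """Dense matrix-vector product; stands in for a sparse kernel."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def cg_with_exit_callback(A, b, tol=1e-10, maxiter=100, callback=None):
    """Toy CG for SPD systems that invokes `callback` once, after the loop."""
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x for the zero initial guess
    p = list(r)
    rs = sum(ri * ri for ri in r)
    b_norm = math.sqrt(sum(bi * bi for bi in b)) or 1.0
    info, it = 0, 0
    while math.sqrt(rs) > tol * b_norm:
        if it == maxiter:
            info = max(it, 1)  # positive: iteration budget exhausted
            break
        Ap = matvec(A, p)
        pAp = sum(pi * api for pi, api in zip(p, Ap))
        if pAp <= 0.0:
            info = -1  # negative: breakdown, A is not SPD along p
            break
        alpha = rs / pAp
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = sum(ri * ri for ri in r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
        it += 1
    # The callback is never called inside the loop, only here at exit.
    if callback is not None:
        callback(x)
    return x, info


seen = []
x, info = cg_with_exit_callback(
    [[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0], callback=seen.append
)
```

After the call, `seen` holds exactly one entry, the final solution, regardless of how many iterations ran.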
Solver support matrix#
| API | Use case | Native CPU/GPU coverage | Accelerate coverage | Notes |
|---|---|---|---|---|
| | Iterative solve for square symmetric positive-definite systems. | Partial | No | The unpreconditioned, Jacobi-preconditioned, and Chebyshev-preconditioned CG iterations run in native CPU/Metal kernels. IC(0)-preconditioned CG is a native CPU-hosted loop because the preconditioner apply is a pair of triangular solves. Fully matrix-free |
| | Iterative solve for square general systems. | Partial | No | Unpreconditioned, diagonal/Jacobi-preconditioned, ILU(0)-preconditioned, and exact-factor preconditioned entries are native. Arnoldi work can run on GPU when selected; restart bookkeeping, true-residual checks, and the small least-squares solve run on CPU. Custom callable preconditioners and fully matrix-free |
| | Iterative solve for square symmetric indefinite systems. | Full CPU + GPU | No | Unpreconditioned and diagonal/Jacobi-preconditioned shifted MINRES use native Paige-Saunders recurrence kernels. Diagonal/Jacobi preconditioners must be SPD. |
| | One-shot direct solve for square systems. | Partial | CPU only, optional | Native fallback uses CPU LU factorization and can use GPU triangular solves. Accelerate-enabled Apple builds may use opaque Accelerate LU for supported real systems. |
| | Direct solve for explicit sparse triangular systems. | Full CPU + GPU | No | Dispatches to the native CSR triangular-solve primitive for rank-1 or rank-2 RHS arrays. Public triangular analysis remains deferred until repeated-apply benchmarks show a consistent advantage. |
| | Reusable direct solve object for square general systems. | Partial | CPU only, optional | Native LU is available everywhere. Accelerate LU is available only when the build and Apple runtime expose |
| | Reusable direct solve object for square SPD systems. | Partial | CPU only, optional | Native explicit Cholesky is available everywhere. Accelerate Cholesky uses an opaque factorization object and does not expose sparse factors. |
| | Reusable direct solve object for square symmetric indefinite systems. | No native path | CPU only, optional | Available only in Accelerate-enabled Apple builds for supported real inputs. |
| | Reusable least-squares solve object for rectangular systems. | No native path | CPU only, optional | Available only in Accelerate-enabled Apple builds for supported real inputs. |
| | Reusable normal-equation solve object for rectangular systems. | No native path | CPU only, optional | Available only in Accelerate-enabled Apple builds for supported real inputs. |
| | Explicit sparse Cholesky factorization for SPD systems. | CPU only for factorization | No | Returns explicit mlx-sparse CSR factors. The returned |
| | Explicit sparse LU factorization for square general systems. | CPU only for factorization | No | Returns explicit mlx-sparse CSR factors and pivots. The returned |
| | Reuse explicit native factors for one or more right-hand sides. | Full CPU + GPU for solve phase | No | This row covers the solve phase only; the factors were produced by the CPU factorization APIs above. CPU matrix-RHS solves use one native triangular-solve sequence instead of a Python loop over RHS columns. |
| | A few eigenpairs of a square symmetric/Hermitian sparse matrix. | Partial | No | Lanczos projection can run on GPU; the small projected eigensolve runs on CPU. |
| | A few eigenpairs of a square general sparse matrix. | Partial | No | Arnoldi projection can run on GPU; the small Hessenberg eigensolve runs on CPU. |
| | A few singular values/vectors of a sparse matrix. | Partial | No | The native normal-operator Lanczos step can run on GPU; the small eigensolve and singular-vector assembly run on CPU. |
| | Low-level Lanczos projection helper. | Partial | No | The projection can run on GPU and accepts |
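Several rows in the matrix rely on a CSR triangular solve, both for explicit triangular systems and for the solve phase after factorization. The sketch below shows forward substitution on a lower-triangular CSR matrix with a single right-hand side. It is a plain-Python illustration of the technique; `csr_lower_solve` is a hypothetical name, and the native primitive's handling of rank-2 right-hand sides and GPU dispatch is not modelled here.

```python
def csr_lower_solve(indptr, indices, data, b):
    """Solve L x = b by forward substitution, with L lower triangular in CSR.

    Hypothetical helper for illustration; handles one right-hand side only.
    """
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        acc = b[i]
        diag = None
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            if j < i:
                acc -= data[k] * x[j]  # subtract already-solved unknowns
            elif j == i:
                diag = data[k]
            else:
                raise ValueError(f"row {i} has an entry above the diagonal")
        if diag is None or diag == 0.0:
            raise ZeroDivisionError(f"zero or missing diagonal in row {i}")
        x[i] = acc / diag
    return x


# L = [[2, 0, 0], [1, 3, 0], [0, 4, 5]] stored in CSR form.
indptr = [0, 1, 3, 5]
indices = [0, 0, 1, 1, 2]
data = [2.0, 1.0, 3.0, 4.0, 5.0]
x = csr_lower_solve(indptr, indices, data, [2.0, 7.0, 23.0])
```

Each row depends only on rows above it, which is why the loop is sequential over rows; an upper-triangular solve would run the same idea from the last row backwards.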
Accelerate direct solves#
Accelerate support is opt-in at build time with
MLX_SPARSE_ENABLE_ACCELERATE=ON. Portable wheels that are not built with
this option report ms.capabilities.ACCELERATE == False and use the native
paths above.
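A guarded check of the capability flag might look like the sketch below. It assumes `ms` is the alias for `import mlx_sparse as ms`, which this page implies but does not state; the helper name `accelerate_available` is hypothetical, and the function falls back to `False` when the package or the flag is missing.

```python
def accelerate_available() -> bool:
    """Report whether the installed build advertises Accelerate support.

    Hypothetical helper; only the ms.capabilities.ACCELERATE flag comes from
    this page, and the import alias is an assumption.
    """
    try:
        import mlx_sparse as ms  # assumed import alias
    except ImportError:
        return False  # package not installed in this environment
    caps = getattr(ms, "capabilities", None)
    return bool(getattr(caps, "ACCELERATE", False))


print("Accelerate direct solves available:", accelerate_available())
```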
| Method | Matrix requirement | Availability |
|---|---|---|
| | Square, real, symmetric positive-definite. | Accelerate-enabled Apple build. |
| | Square, real, symmetric indefinite. | Accelerate-enabled Apple build. |
| | Square, real, general nonsingular. | Accelerate-enabled Apple build with LU available in the Apple SDK and runtime. |
| | Real rectangular least-squares system. | Accelerate-enabled Apple build. |
| | Real rectangular normal-equation solve. | Accelerate-enabled Apple build. |
Accelerate direct solves currently operate on float32 factorization
objects. float16 and bfloat16 inputs are promoted to float32 for
the solve. Complex sparse inputs are rejected. int32 and int64 sparse
indices are accepted after validation against the matrix shape and the limits
of the Accelerate sparse API.
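The dtype rules in the paragraph above can be summarized as a small lookup. This is a sketch only: `accelerate_working_dtype` is a hypothetical name, dtypes are represented as plain strings rather than MLX dtype objects, and any dtype this page does not mention is reported as uncovered instead of guessed.

```python
_SOLVED_IN_FLOAT32 = {"float16", "bfloat16", "float32"}


def accelerate_working_dtype(dtype_name: str) -> str:
    """Return the dtype an Accelerate direct solve would work in.

    Hypothetical helper that only encodes the rules stated on this page.
    """
    if dtype_name.startswith("complex"):
        raise TypeError("complex sparse inputs are rejected")
    if dtype_name in _SOLVED_IN_FLOAT32:
        return "float32"  # float16 and bfloat16 are promoted for the solve
    raise ValueError(f"{dtype_name!r} is not covered by the rules on this page")


print(accelerate_working_dtype("bfloat16"))
```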
Choosing a solver#
Use cg for large SPD systems when an iterative method is appropriate.
Use gmres for large general square systems when LU factorization is too expensive or too memory-heavy.
Use minres for large symmetric indefinite systems.
Use spsolve for a one-shot square direct solve.
Use factorized when solving the same sparse system against multiple right-hand sides. Native explicit-factor solves accept rank-2 RHS arrays on CPU, and Accelerate-enabled builds can use opaque framework solves for supported methods.
Use sparse_cholesky or sparse_lu only when you need explicit mlx-sparse factor objects.
Use spsolve_triangular when you already have an explicit sparse triangular factor and want the native rank-1/rank-2 triangular-solve path.
Use eigsh, eigs, or svds when you need only a few spectral values/vectors rather than a dense decomposition.
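For square linear systems, the guidance above can be condensed into a small decision helper. The function `choose_solver` and its keyword flags are hypothetical and exist only to make the decision order explicit; the returned strings are the solver names used on this page.

```python
def choose_solver(*, symmetric, positive_definite=False, iterative=True, reuse=False):
    """Suggest a solver name for a square sparse system (hypothetical helper).

    iterative: prefer a Krylov method over a direct factorization.
    reuse: the same matrix will be solved against several right-hand sides.
    """
    if not iterative:
        # Direct path: one-shot solve, or a reusable factorization object.
        return "factorized" if reuse else "spsolve"
    if symmetric and positive_definite:
        return "cg"
    if symmetric:
        return "minres"  # symmetric indefinite
    return "gmres"  # general square system


print(choose_solver(symmetric=True, positive_definite=True))
print(choose_solver(symmetric=False, iterative=False, reuse=True))
```

The helper deliberately leaves out the explicit-factor, triangular, and spectral entry points, since those depend on what objects you already hold rather than on matrix properties.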
Preconditioners#
linalg.cg accepts native-backed identity, diagonal, jacobi,
ichol0, and chebyshev preconditioners from
mlx_sparse.linalg.preconditioners. identity uses the existing
unpreconditioned CG path. diagonal and jacobi dispatch to native
Jacobi-preconditioned CG on CPU or Metal depending on the selected MLX device.
chebyshev dispatches to a native polynomial-preconditioned CG path whose
preconditioner application uses only sparse matrix-vector products and vector
updates. These paths still test convergence against the true residual
||b - A @ x||. ichol0 dispatches to a native C++ IC(0)-preconditioned
CG loop; setup runs on CPU, and standalone preconditioner application uses
native CSR triangular solves on CPU or Metal.
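To make the Jacobi path and the true-residual test concrete, here is a minimal plain-Python sketch of Jacobi-preconditioned CG. It is not the native kernel: the matrix is a dense list of lists, `jacobi_pcg` is a hypothetical name, and the residual is recomputed as `b - A @ x` every iteration (costing an extra matrix-vector product) purely to make the convergence criterion visible.

```python
import math


def matvec(A, x):
    """Dense matrix-vector product; stands in for a sparse kernel."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))


def jacobi_pcg(A, b, tol=1e-10, maxiter=200):
    """Toy Jacobi-preconditioned CG for SPD A; returns (x, info)."""
    n = len(b)
    inv_diag = [1.0 / A[i][i] for i in range(n)]  # Jacobi: M^{-1} = diag(A)^{-1}
    x = [0.0] * n
    r = list(b)
    stop = tol * (norm(b) or 1.0)
    if norm(r) <= stop:
        return x, 0
    z = [d * ri for d, ri in zip(inv_diag, r)]  # z = M^{-1} r
    p = list(z)
    rz = sum(ri * zi for ri, zi in zip(r, z))
    for _ in range(maxiter):
        Ap = matvec(A, p)
        alpha = rz / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        # Judge convergence on the true residual, recomputed from x.
        r = [bi - axi for bi, axi in zip(b, matvec(A, x))]
        if norm(r) <= stop:
            return x, 0
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = sum(ri * zi for ri, zi in zip(r, z))
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max(maxiter, 1)  # positive: iteration budget exhausted


A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x, info = jacobi_pcg(A, b)
```

The preconditioner only changes the search directions; the stopping test still measures the unpreconditioned residual, which matches the behaviour described above.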
linalg.gmres accepts identity, diagonal/jacobi, ilu0,
exact-factor preconditioners, and explicit inverse-apply callables or objects.
The diagonal/Jacobi, ILU(0), and exact-factor paths build Krylov vectors for
M^{-1} A through native solver entrypoints and test convergence against the
true residual b - A @ x. ILU(0) setup runs on CPU and applies through native
CSR triangular solves on CPU or Metal. Explicit native LU/Cholesky factors apply
through native permutation/triangular-solve bindings; guarded Accelerate
factorized objects use Apple’s CPU sparse solver when that support is built in.
Custom callable/object preconditioners still use a slower host fallback because
arbitrary Python cannot be called from native solver kernels.
linalg.minres accepts identity plus finite strictly positive
diagonal/jacobi preconditioners. This is stricter than GMRES because
preconditioned MINRES requires a symmetric positive-definite preconditioner.
minres(..., shift=s) solves (A - s I) x = b.
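The two MINRES rules above, the shifted operator and the strictly positive diagonal, can be sketched in plain Python. Both helper names are hypothetical, and the code only illustrates the arithmetic: applying `(A - s I)` to a vector, and checking that a diagonal preconditioner is finite and strictly positive.

```python
import math


def shifted_matvec(matvec, s):
    """Wrap x -> A x so the result applies (A - s I) instead (hypothetical helper)."""
    def apply(x):
        return [axi - s * xi for axi, xi in zip(matvec(x), x)]
    return apply


def is_valid_minres_diagonal(diag):
    """True when every diagonal entry is finite and strictly positive."""
    return all(math.isfinite(d) and d > 0.0 for d in diag)


A = [[2.0, 1.0], [1.0, 2.0]]


def apply_A(x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


# With s = 1 the operator becomes A - I = [[1, 1], [1, 1]].
y = shifted_matvec(apply_A, 1.0)([1.0, 2.0])
```

A diagonal containing zero, a negative value, `inf`, or `nan` fails the check, which mirrors the requirement that the preconditioner be symmetric positive-definite.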