Solvers

This page summarizes the public mlx_sparse.linalg solver surface and where each solver runs. It is meant to answer two questions quickly:

Which sparse solver should I call?
Does that path run on CPU, Metal GPU, Apple Accelerate, or a mix?

Support labels

Label	Meaning
Full CPU + GPU	Native CPU and Metal GPU implementations are available for the solver’s main numerical work. Calling `ms.use_gpu()` or setting MLX’s default device to GPU keeps the solve on the GPU path.
Partial	The solver is available from CPU and GPU contexts, but one important phase still runs on CPU. Common examples are CPU sparse factorization followed by GPU triangular solves, or GPU Krylov projection followed by a small CPU dense solve/eigendecomposition.
CPU only	The solver runs on CPU. Apple Accelerate sparse direct solvers are in this category even when MLX’s default device is GPU.
GPU only	A GPU implementation exists without a CPU implementation. No public `mlx_sparse.linalg` solver currently has this label.

All native sparse linalg entrypoints accept CSRArray, COOArray, and CSCArray inputs unless the function documentation says otherwise. Native solver kernels normalize sparse inputs to canonical CSR at solver entry. Accelerate-backed direct solves normalize supported real inputs to canonical CSC because Apple’s sparse solver API is CSC-oriented.
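As a rough illustration of what normalizing to canonical CSR can involve, the pure-Python sketch below converts COO triplets into CSR arrays. It assumes that "canonical" means sorted column indices within each row and summed duplicate entries; this page does not spell that out, and the helper name `coo_to_canonical_csr` is hypothetical rather than part of mlx-sparse.

```python
def coo_to_canonical_csr(n_rows, rows, cols, vals):
    """Return (indptr, indices, data) with sorted, duplicate-free rows.

    Assumption: "canonical" means sorted column indices and summed duplicates.
    """
    merged = {}
    for r, c, v in zip(rows, cols, vals):
        if not 0 <= r < n_rows:
            raise ValueError(f"row index {r} is out of range")
        # Duplicate (row, col) entries are summed into a single stored value.
        merged[(r, c)] = merged.get((r, c), 0.0) + v

    indptr = [0] * (n_rows + 1)
    indices, data = [], []
    # Sorting the (row, col) keys yields row-major order with sorted columns.
    for r, c in sorted(merged):
        indptr[r + 1] += 1
        indices.append(c)
        data.append(merged[(r, c)])

    # Turn per-row counts into cumulative row offsets.
    for r in range(n_rows):
        indptr[r + 1] += indptr[r]
    return indptr, indices, data


indptr, indices, data = coo_to_canonical_csr(
    3,
    rows=[2, 0, 0, 1, 0],
    cols=[2, 1, 0, 1, 1],
    vals=[5.0, 1.0, 4.0, 3.0, 2.0],
)
```

The two entries at position (0, 1) are merged, so the result stores four values instead of five.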

Diagnostics and callbacks

cg, bicgstab, gmres, and minres still return (x, info) by default. info follows the sparse linalg convention used throughout this release: 0 means converged, positive values mean the iteration budget was exhausted, and negative values mean numerical breakdown or an invalid iterative path.
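The integer convention above can be wrapped in a small helper when logging results. This is only a sketch: `describe_info` is a hypothetical name, and the returned strings are illustrative labels rather than values the library produces.

```python
def describe_info(info: int) -> str:
    """Map the integer solver status to a short human-readable label."""
    if info == 0:
        return "converged"
    if info > 0:
        return "iteration budget exhausted"
    return "numerical breakdown or invalid iterative path"


# Typical use, assuming a solver call that returns (x, info):
#   x, info = linalg.cg(A, b)
#   print(describe_info(info))
labels = [describe_info(code) for code in (0, 25, -1)]
```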

Pass return_info=True to replace the integer with a structured SolverInfo object. It records the integer status, final true residual norm, iteration count, convergence reason, breakdown reason when applicable, solver name, tolerance settings, restart size for GMRES, and the preconditioner kind when one is used. Preconditioned residual norms are reported only by native paths that expose them; otherwise the field is None. bicgstab accepts tol as a SciPy-compatible alias for rtol; if both are provided, they must agree.

Python callbacks are opt-in exit callbacks for the native sparse solvers. They are called once after the native CPU/Metal loop finishes, so the default solve path does not synchronize with Python inside each Krylov iteration. cg, bicgstab, and minres callbacks receive the final solution. gmres(callback_type="x") receives the final solution, while "pr_norm" and "legacy" receive the final reported residual norm. These GMRES payload names intentionally mirror SciPy’s callback types, but mlx-sparse does not call Python at every restart or inner iteration in native paths, and "legacy" does not change maxiter accounting. Use return_info=True for solver diagnostics that do not need a callback.
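To make the exit-callback behavior concrete, here is a toy solver that follows the same contract: the iteration loop never calls back into user code, and the callback fires exactly once with the final solution. The Jacobi iteration, the name `jacobi_solve`, and the choice to return `maxiter` as the positive status are all assumptions made for illustration; none of this is the native mlx-sparse implementation.

```python
import math


def jacobi_solve(A, b, rtol=1e-10, maxiter=200, callback=None):
    """Toy dense Jacobi iteration with an exit callback (illustrative only)."""
    n = len(b)
    x = [0.0] * n
    b_norm = math.sqrt(sum(v * v for v in b)) or 1.0
    info = maxiter  # assumed positive status: budget exhausted
    for _ in range(maxiter):
        # One Jacobi sweep; no user code runs inside the loop.
        x = [
            (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)
        ]
        residual = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        if math.sqrt(sum(v * v for v in residual)) <= rtol * b_norm:
            info = 0
            break
    if callback is not None:
        callback(x)  # called once, after the loop has finished
    return x, info


calls = []
x, info = jacobi_solve([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0], callback=calls.append)
```

Because the callback sees only the final state, per-iteration monitoring is not possible through it. The structured diagnostics described above are the intended way to inspect the solve.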

Solver support matrix

API	Use case	Native CPU/GPU coverage	Accelerate coverage	Notes
`linalg.cg`	Iterative solve for square symmetric positive-definite systems.	Partial	No	The unpreconditioned, Jacobi-preconditioned, and Chebyshev-preconditioned CG iterations run in native CPU/Metal kernels. IC(0)-preconditioned CG is a native CPU-hosted loop because the preconditioner apply is a pair of triangular solves. Fully matrix-free `LinearOperator` inputs use a slower host fallback.
`linalg.bicgstab`	Iterative solve for square general non-symmetric systems.	Partial	No	Unpreconditioned and diagonal/Jacobi-preconditioned BiCGSTAB run the full recurrence in native CPU/Metal kernels and check the true residual before reporting success. ILU(0), exact LU, exact Cholesky, and guarded Accelerate exact-factor preconditioners use a native C++ driver with native factor inverse-apply routines. Custom callables and fully matrix-free `LinearOperator` inputs use a documented slower host fallback.
`linalg.gmres`	Iterative solve for square general systems.	Partial	No	Unpreconditioned, diagonal/Jacobi-preconditioned, ILU(0)-preconditioned, and exact-factor preconditioned entries are native. Arnoldi work can run on GPU when selected; restart bookkeeping, true-residual checks, and the small least-squares solve run on CPU. Custom callable preconditioners and fully matrix-free `LinearOperator` inputs use a documented host fallback.
`linalg.minres`	Iterative solve for square symmetric indefinite systems.	Full CPU + GPU	No	Unpreconditioned and diagonal/Jacobi-preconditioned shifted MINRES use native Paige-Saunders recurrence kernels. Diagonal/Jacobi preconditioners must be SPD.
`linalg.spsolve`	One-shot direct solve for square systems.	Partial	CPU only, optional	Native fallback uses CPU LU factorization and can use GPU triangular solves. Accelerate-enabled Apple builds may use opaque Accelerate LU for supported real systems.
`linalg.spsolve_triangular`	Direct solve for explicit sparse triangular systems.	Full CPU + GPU	No	Dispatches to the native CSR triangular-solve primitive for rank-1 or rank-2 RHS arrays. Public triangular analysis remains deferred until repeated-apply benchmarks show a consistent advantage.
`linalg.factorized(..., method="auto" \| "lu")`	Reusable direct solve object for square general systems.	Partial	CPU only, optional	Native LU is available everywhere. Accelerate LU is available only when the build and Apple runtime expose `SparseFactorizationLU`.
`linalg.factorized(..., method="cholesky")`	Reusable direct solve object for square SPD systems.	Partial	CPU only, optional	Native explicit Cholesky is available everywhere. Accelerate Cholesky uses an opaque factorization object and does not expose sparse factors.
`linalg.factorized(..., method="ldlt")`	Reusable direct solve object for square symmetric indefinite systems.	No native path	CPU only, optional	Available only in Accelerate-enabled Apple builds for supported real inputs.
`linalg.factorized(..., method="qr")`	Reusable least-squares solve object for rectangular systems.	No native path	CPU only, optional	Available only in Accelerate-enabled Apple builds for supported real inputs.
`linalg.factorized(..., method="cholesky_ata")`	Reusable normal-equation solve object for rectangular systems.	No native path	CPU only, optional	Available only in Accelerate-enabled Apple builds for supported real inputs.
`linalg.sparse_cholesky` / `linalg.cholesky`	Explicit sparse Cholesky factorization for SPD systems.	CPU only for factorization	No	Returns explicit mlx-sparse CSR factors. The returned `SparseCholesky.solve` can use native GPU triangular solves. The native CPU factorization keeps natural-order semantics and uses an allocation-light sparse accumulator implementation.
`linalg.sparse_lu` / `linalg.splu`	Explicit sparse LU factorization for square general systems.	CPU only for factorization	No	Returns explicit mlx-sparse CSR factors and pivots. The returned `SparseLU.solve` can use native GPU permutation and triangular solves. The native CPU factorization preserves the existing partial pivoting semantics and does not apply fill-reducing ordering.
`SparseCholesky.solve` / `SparseLU.solve`	Reuse explicit native factors for one or more right-hand sides.	Full CPU + GPU for solve phase	No	This row covers the solve phase only; the factors are produced by the CPU factorization APIs above. CPU matrix-RHS solves use one native triangular-solve sequence instead of a Python loop over RHS columns.
`linalg.eigsh`	A few eigenpairs of a square symmetric/Hermitian sparse matrix.	Partial	No	Lanczos projection can run on GPU; the small projected eigensolve runs on CPU. `v0` is supported. Non-default `tol` and `maxiter` require an implicitly restarted loop and are rejected for now.
`linalg.eigs`	A few eigenpairs of a square general sparse matrix.	Partial	No	Arnoldi projection can run on GPU; the small Hessenberg eigensolve runs on CPU. `v0` is supported. Non-default `tol` and `maxiter` require an implicitly restarted loop and are rejected for now.
`linalg.svds`	A few singular values/vectors of a sparse matrix.	Partial	No	The native normal-operator Lanczos step can run on GPU; the small eigensolve and singular-vector assembly run on CPU. `v0` is supported for the right-vector Krylov basis. Non-default `tol` and `maxiter` require an implicitly restarted loop and are rejected for now.
`linalg.lanczos`	Low-level Lanczos projection helper.	Partial	No	The projection can run on GPU and accepts `v0`. Small projected post-processing is CPU work in the higher-level solvers that consume it.

Accelerate direct solves

Accelerate support is opt-in at build time with MLX_SPARSE_ENABLE_ACCELERATE=ON. Portable wheels that are not built with this option report ms.capabilities.ACCELERATE == False and use the native paths above.
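A caller can gate Accelerate-only factorization methods on that capability flag. The sketch below reads `ms.capabilities.ACCELERATE` defensively and falls back to `False` when mlx-sparse is not importable. The helper names and the two method sets are hypothetical; the sets follow the method names used on this page, on the assumption that `"auto"`, `"lu"`, and `"cholesky"` always have a native path while `"ldlt"`, `"qr"`, and `"cholesky_ata"` need Accelerate.

```python
NATIVE_METHODS = {"auto", "lu", "cholesky"}
ACCELERATE_ONLY_METHODS = {"ldlt", "qr", "cholesky_ata"}


def detect_accelerate() -> bool:
    """Best-effort read of ms.capabilities.ACCELERATE; False if unavailable."""
    try:
        import mlx_sparse as ms
    except ImportError:
        return False
    capabilities = getattr(ms, "capabilities", None)
    return bool(getattr(capabilities, "ACCELERATE", False))


def method_is_available(method: str, has_accelerate: bool) -> bool:
    """Return True when the factorization method has some usable backend."""
    if method in NATIVE_METHODS:
        return True  # a native path exists regardless of Accelerate
    if method in ACCELERATE_ONLY_METHODS:
        return has_accelerate
    raise ValueError(f"unknown factorization method: {method!r}")


has_accelerate = detect_accelerate()
can_use_ldlt = method_is_available("ldlt", has_accelerate)
```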

Method	Matrix requirement	Availability
`"cholesky"`	Square, real, symmetric positive-definite.	Accelerate-enabled Apple build.
`"ldlt"`	Square, real, symmetric indefinite.	Accelerate-enabled Apple build.
`"lu"`	Square, real, general nonsingular.	Accelerate-enabled Apple build with LU available in the Apple SDK and runtime.
`"qr"`	Real rectangular least-squares system.	Accelerate-enabled Apple build.
`"cholesky_ata"`	Real rectangular normal-equation solve.	Accelerate-enabled Apple build.

Accelerate direct solves currently operate on float32 factorization objects. float16 and bfloat16 inputs are promoted to float32 for the solve. Complex sparse inputs are rejected. int32 and int64 sparse indices are accepted after validation against the matrix shape and the limits of the Accelerate sparse API.
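The promotion and validation rules in the previous paragraph can be summarized as two small checks. This is a sketch with hypothetical helper names. dtypes are represented as plain strings, float64 is treated as out of scope because this page does not describe it, and the 32-bit signed index limit is an assumption about the Accelerate sparse API rather than something stated here.

```python
INT32_MAX = 2**31 - 1  # assumed index limit; not stated on this page


def accelerate_value_dtype(dtype_name: str) -> str:
    """Return the dtype an Accelerate direct solve would run in."""
    if dtype_name.startswith("complex"):
        raise TypeError("complex sparse inputs are rejected")
    if dtype_name in ("float16", "bfloat16", "float32"):
        return "float32"  # half-precision inputs are promoted
    raise TypeError(f"{dtype_name} is not covered by this sketch")


def validate_indices(indices, dimension: int, limit: int = INT32_MAX) -> None:
    """Check indices against the matrix dimension and an assumed API limit."""
    if dimension > limit:
        raise OverflowError("matrix dimension exceeds the assumed index limit")
    for idx in indices:
        if not 0 <= idx < dimension:
            raise IndexError(f"index {idx} is outside [0, {dimension})")


promoted = accelerate_value_dtype("bfloat16")
validate_indices([0, 2, 1], dimension=3)
```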

Choosing a solver

Use cg for large SPD systems when an iterative method is appropriate.
Use bicgstab for large non-symmetric square systems when a short-recurrence method is preferred over restarted GMRES.
Use gmres for large general square systems when LU factorization is too expensive or too memory-heavy.
Use minres for large symmetric indefinite systems.
Use spsolve for a one-shot square direct solve.
Use factorized when solving the same sparse system against multiple right-hand sides. Native explicit-factor solves accept rank-2 RHS arrays on CPU, and Accelerate-enabled builds can use opaque framework solves for supported methods.
Use sparse_cholesky or sparse_lu only when you need explicit mlx-sparse factor objects.
Use spsolve_triangular when you already have an explicit sparse triangular factor and want the native rank-1/rank-2 triangular-solve path.
Use eigsh, eigs, or svds when you need only a few spectral values/vectors rather than a dense decomposition.
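The linear-solve guidance above can be condensed into a small decision helper. This is a sketch of the selection logic only: `choose_solver` and its keyword flags are hypothetical, and it returns the name of the suggested `mlx_sparse.linalg` function without calling it. Eigenvalue and triangular-solve entries are left out.

```python
def choose_solver(
    *,
    symmetric: bool,
    positive_definite: bool = False,
    direct: bool = False,
    repeated_rhs: bool = False,
    short_recurrence: bool = False,
) -> str:
    """Suggest a solver name for a square sparse linear system."""
    if direct:
        # Reuse a factorization when the same matrix meets many right-hand sides.
        return "factorized" if repeated_rhs else "spsolve"
    if symmetric:
        return "cg" if positive_definite else "minres"
    # General non-symmetric systems: short recurrence versus restarted GMRES.
    return "bicgstab" if short_recurrence else "gmres"


suggestion = choose_solver(symmetric=True, positive_definite=True)
```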

Preconditioners

linalg.cg accepts native-backed identity, diagonal, jacobi, ichol0, and chebyshev preconditioners from mlx_sparse.linalg.preconditioners. identity uses the existing unpreconditioned CG path. diagonal and jacobi dispatch to native Jacobi-preconditioned CG on CPU or Metal depending on the selected MLX device. chebyshev dispatches to a native polynomial-preconditioned CG path whose preconditioner application uses only sparse matrix-vector products and vector updates. These paths still test convergence against the true residual ||b - A @ x||. ichol0 dispatches to a native C++ IC(0)-preconditioned CG loop; setup runs on CPU, and standalone preconditioner application uses native CSR triangular solves on CPU or Metal.
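For readers who want to see the shape of a Jacobi-preconditioned CG iteration with a true-residual convergence test, here is a pure-Python reference sketch over CSR arrays. It is written for clarity, not speed, and it is not the native kernel: the function names are hypothetical, the true residual is recomputed every iteration at the cost of an extra matrix-vector product, and returning `maxiter` as the positive status is an assumption.

```python
import math


def csr_matvec(indptr, indices, data, x):
    """Compute A @ x for a CSR matrix stored as Python lists."""
    return [
        sum(data[k] * x[indices[k]] for k in range(indptr[i], indptr[i + 1]))
        for i in range(len(indptr) - 1)
    ]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def jacobi_pcg(indptr, indices, data, b, rtol=1e-10, maxiter=100):
    """Jacobi-preconditioned CG; returns (x, info) with info == 0 on success."""
    n = len(b)
    diag = [0.0] * n
    for i in range(n):
        for k in range(indptr[i], indptr[i + 1]):
            if indices[k] == i:
                diag[i] = data[k]
    if any(not math.isfinite(d) or d <= 0.0 for d in diag):
        raise ValueError("Jacobi preconditioner needs a finite positive diagonal")

    x = [0.0] * n
    r = list(b)                               # recurrence residual for x = 0
    z = [ri / di for ri, di in zip(r, diag)]  # z = M^{-1} r
    p = list(z)
    rz = dot(r, z)
    b_norm = math.sqrt(dot(b, b)) or 1.0

    for _ in range(maxiter):
        Ap = csr_matvec(indptr, indices, data, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        # Convergence is judged on the true residual b - A @ x.
        true_r = [bi - axi for bi, axi in zip(b, csr_matvec(indptr, indices, data, x))]
        if math.sqrt(dot(true_r, true_r)) <= rtol * b_norm:
            return x, 0
        z = [ri / di for ri, di in zip(r, diag)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, maxiter


# Tridiagonal SPD test matrix [[4,-1,0],[-1,4,-1],[0,-1,4]] with solution [1, 2, 3].
indptr = [0, 2, 5, 7]
indices = [0, 1, 0, 1, 2, 1, 2]
data = [4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0]
x, info = jacobi_pcg(indptr, indices, data, [2.0, 4.0, 10.0])
```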

linalg.gmres accepts identity, diagonal/jacobi, ilu0, exact-factor preconditioners, and explicit inverse-apply callables or objects. The diagonal/Jacobi, ILU(0), and exact-factor paths build Krylov vectors for M^{-1} A through native solver entrypoints and test convergence against the true residual b - A @ x. ILU(0) setup runs on CPU and applies through native CSR triangular solves on CPU or Metal. Explicit native LU/Cholesky factors apply through native permutation/triangular-solve bindings; guarded Accelerate factorized objects use Apple’s CPU sparse solver when that support is built in. Custom callable/object preconditioners still use a slower host fallback because arbitrary Python cannot be called from native solver kernels.

linalg.bicgstab accepts identity, diagonal/jacobi, ilu0, exact-factor preconditioners, and explicit inverse-apply callables or objects. identity uses the unpreconditioned native path. Diagonal/Jacobi dispatches to native CPU/Metal kernels. ILU(0), exact LU, exact Cholesky, and guarded Accelerate exact-factor paths are driven from native C++ and use native factor applications. Callable/object preconditioners and fully matrix-free operators use the slower host fallback because arbitrary Python cannot execute inside native kernels.

linalg.minres accepts identity plus finite strictly positive diagonal/jacobi preconditioners. This is stricter than GMRES because preconditioned MINRES requires a symmetric positive-definite preconditioner. minres(..., shift=s) solves (A - s I) x = b.
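The two MINRES-specific rules above, a finite strictly positive preconditioner diagonal and the shifted operator A - s I, can be sketched in a few lines. Both helpers are hypothetical illustrations built on plain Python lists, not mlx-sparse functions.

```python
import math


def validate_jacobi_diagonal(diag):
    """Reject diagonals that would not define an SPD Jacobi preconditioner."""
    for i, d in enumerate(diag):
        if not math.isfinite(d) or d <= 0.0:
            raise ValueError(f"diagonal entry {i} must be finite and > 0, got {d}")
    return list(diag)


def shifted_matvec(matvec, shift):
    """Wrap x -> A @ x so that it computes (A - shift * I) @ x instead."""
    def apply(x):
        return [ax - shift * xi for ax, xi in zip(matvec(x), x)]
    return apply


# A symmetric indefinite diagonal matrix A = diag(2, -1, 3), shifted by s = 0.5.
diag_A = [2.0, -1.0, 3.0]
op = shifted_matvec(lambda x: [d * xi for d, xi in zip(diag_A, x)], 0.5)
y = op([1.0, 1.0, 1.0])
```

Note that the matrix itself may be indefinite, as in this example; the positivity requirement applies only to the preconditioner diagonal.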