Accelerate direct solvers: rectangular least squares#

Accelerate-enabled builds add direct sparse solvers for rectangular systems through linalg.factorized. This notebook compares method="qr" and method="cholesky_ata" on an overdetermined sparse least-squares problem.

Published macOS wheels include this support. If you are using an editable or source build, follow the Accelerate build instructions in the installation guide before running this notebook

import mlx.core as mx
import numpy as np
import scipy.sparse

import mlx_sparse as ms
from mlx_sparse import linalg

INSTALLATION = "https://mlx-sparse.readthedocs.io/en/latest/installation.html"

print("Accelerate status:", ms.capabilities.status("accelerate"))
print("Accelerate available:", ms.capabilities.ACCELERATE)

if not ms.capabilities.ACCELERATE:
    raise RuntimeError(
        "This notebook requires an Accelerate-enabled build. "
        f"For editable/source installs, follow {INSTALLATION}."
    )

Accelerate status: available
Accelerate available: True

Build an overdetermined sparse system#

The system has 12 equations and 5 unknowns. We generate a full-rank sparse design matrix and add a small deterministic perturbation to b so the least-squares residual is nonzero. That makes it clear that the solver is minimizing ||A x - b|| rather than solving an exactly consistent system.

m, n = 12, 5
rows, cols, vals = [], [], []
for i in range(m):
    j = i % n
    rows.append(i)
    cols.append(j)
    vals.append(1.0 + 0.05 * i)
    rows.append(i)
    cols.append((j + 1) % n)
    vals.append(0.25)
    if i % 2 == 0:
        rows.append(i)
        cols.append((j + 2) % n)
        vals.append(-0.15)

A_sp = scipy.sparse.coo_matrix(
    (
        np.array(vals, dtype=np.float32),
        (np.array(rows, dtype=np.int32), np.array(cols, dtype=np.int32)),
    ),
    shape=(m, n),
).tocsr()

x_true = np.array([0.5, -1.0, 1.5, 0.25, 2.0], dtype=np.float32)
noise = np.linspace(-0.02, 0.02, m, dtype=np.float32)
b_np = A_sp @ x_true + noise

A = ms.from_scipy(A_sp, format="csr")
b = mx.array(b_np)

print(f"shape={A.shape}, nnz={A.nnz}, density={A.nnz / (m * n):.3f}")
print(f"dense rank={np.linalg.matrix_rank(A_sp.toarray())}")
print("first four b entries:", np.round(b_np[:4], 6))

shape=(12, 5), nnz=30, density=0.500
dense rank=5
first four b entries: [ 0.005    -0.691364  1.399773  0.778409]

QR least squares#

method="qr" is the default Accelerate choice for rectangular systems. It solves the least-squares problem directly and is the safer default when the matrix may be poorly conditioned.

qr = linalg.factorized(A, method="qr")
x_qr = qr.solve(b)
mx.eval(x_qr)

x_ref, *_ = np.linalg.lstsq(A_sp.toarray(), b_np, rcond=None)
rel_error = np.linalg.norm(np.array(x_qr) - x_ref) / np.linalg.norm(x_ref)
rel_res = np.linalg.norm(A_sp @ np.array(x_qr) - b_np) / np.linalg.norm(b_np)

print(
    f"metadata: backend={qr.backend}, method={qr.method}, "
    f"rhs_size={qr.rhs_size}, solution_size={qr.solution_size}"
)
print(f"relative error vs numpy lstsq: {rel_error:.2e}")
print(f"relative residual ||Ax-b||/||b||: {rel_res:.2e}")
print("x_qr:", np.round(np.array(x_qr), 6))

metadata: backend=accelerate, method=qr, rhs_size=12, solution_size=5
relative error vs numpy lstsq: 9.95e-08
relative residual ||Ax-b||/||b||: 8.12e-03
x_qr: [ 0.499233 -0.99647   1.497709  0.249898  2.003961]

Normal equations via Cholesky-at-A#

method="cholesky_ata" factors A.T @ A. Its solve method expects the normal-equation right-hand side A.T @ b with length n, not the original least-squares right-hand side b with length m. It is often fast, but it squares the condition number of A. Prefer QR when robustness matters, use Cholesky-at-A when the matrix is known to be well conditioned and the performance tradeoff is worth it.

ata = linalg.factorized(A, method="cholesky_ata")
normal_rhs_np = A_sp.T @ b_np
normal_rhs = mx.array(np.asarray(normal_rhs_np, dtype=np.float32))
x_ata = ata.solve(normal_rhs)
mx.eval(x_ata)

rel_diff = np.linalg.norm(np.array(x_ata) - np.array(x_qr)) / np.linalg.norm(np.array(x_qr))
rel_res_ata = np.linalg.norm(A_sp @ np.array(x_ata) - b_np) / np.linalg.norm(b_np)

print(
    f"metadata: backend={ata.backend}, method={ata.method}, "
    f"rhs_size={ata.rhs_size}, solution_size={ata.solution_size}"
)
print(f"normal RHS shape={normal_rhs.shape}")
print(f"relative difference vs QR: {rel_diff:.2e}")
print(f"relative residual ||Ax-b||/||b||: {rel_res_ata:.2e}")

metadata: backend=accelerate, method=cholesky_ata, rhs_size=5, solution_size=5
normal RHS shape=(5,)
relative difference vs QR: 1.08e-07
relative residual ||Ax-b||/||b||: 8.12e-03

Multiple right-hand sides#

The same factorized rectangular solver can be reused for a matrix of right-hand sides. The input has shape (m, nrhs) and the solution has shape (n, nrhs).

x_other = np.array([1.0, 0.0, -0.5, 1.5, 0.25], dtype=np.float32)
B_np = np.stack([b_np, A_sp @ x_other], axis=1)
B = mx.array(B_np)

X = qr.solve(B)
mx.eval(X)

X_ref = np.linalg.lstsq(A_sp.toarray(), B_np, rcond=None)[0]
rel_X = np.linalg.norm(np.array(X) - X_ref) / np.linalg.norm(X_ref)
col_residuals = [
    np.linalg.norm(A_sp @ np.array(X)[:, k] - B_np[:, k]) / np.linalg.norm(B_np[:, k])
    for k in range(B_np.shape[1])
]

print(f"input RHS shape={B.shape}, solution shape={X.shape}")
print(f"matrix RHS rel_error={rel_X:.2e}")
print("column residuals:", [f"{r:.2e}" for r in col_residuals])

input RHS shape=(12, 2), solution shape=(5, 2)
matrix RHS rel_error=1.18e-07
column residuals: ['8.12e-03', '1.80e-07']

Auto method#

method="auto" chooses LU for square matrices and QR for rectangular matrices. For least-squares workloads this gives a conservative default while still allowing explicit cholesky_ata when you want that tradeoff.

auto = linalg.factorized(A, method="auto")
x_auto = auto.solve(b)
mx.eval(x_auto)

rel_auto = np.linalg.norm(np.array(x_auto) - np.array(x_qr)) / np.linalg.norm(np.array(x_qr))
print(f"auto backend={auto.backend}, method={auto.method}")
print(f"auto vs explicit QR rel_diff={rel_auto:.2e}")

auto backend=accelerate, method=qr
auto vs explicit QR rel_diff=0.00e+00

For large rectangular systems, compare QR and Cholesky-at-A on your matrix family. QR is usually the safer numerical default, Cholesky-at-A can be useful when the system is well conditioned and repeated solves dominate the workload.