Accelerate direct solvers: rectangular least squares#
Accelerate-enabled builds add direct sparse solvers for rectangular systems
through linalg.factorized. This notebook compares method="qr" and
method="cholesky_ata" on an overdetermined sparse least-squares problem.
Published macOS wheels include this support. If you are using an editable or source build, follow the Accelerate build instructions in the installation guide before running this notebook
import mlx.core as mx
import numpy as np
import scipy.sparse
import mlx_sparse as ms
from mlx_sparse import linalg
INSTALLATION = "https://mlx-sparse.readthedocs.io/en/latest/installation.html"
print("Accelerate status:", ms.capabilities.status("accelerate"))
print("Accelerate available:", ms.capabilities.ACCELERATE)
if not ms.capabilities.ACCELERATE:
raise RuntimeError(
"This notebook requires an Accelerate-enabled build. "
f"For editable/source installs, follow {INSTALLATION}."
)
Accelerate status: available
Accelerate available: True
Build an overdetermined sparse system#
The system has 12 equations and 5 unknowns. We generate a full-rank sparse
design matrix and add a small deterministic perturbation to b so the
least-squares residual is nonzero. That makes it clear that the solver is
minimizing ||A x - b|| rather than solving an exactly consistent system.
m, n = 12, 5
rows, cols, vals = [], [], []
for i in range(m):
j = i % n
rows.append(i)
cols.append(j)
vals.append(1.0 + 0.05 * i)
rows.append(i)
cols.append((j + 1) % n)
vals.append(0.25)
if i % 2 == 0:
rows.append(i)
cols.append((j + 2) % n)
vals.append(-0.15)
A_sp = scipy.sparse.coo_matrix(
(
np.array(vals, dtype=np.float32),
(np.array(rows, dtype=np.int32), np.array(cols, dtype=np.int32)),
),
shape=(m, n),
).tocsr()
x_true = np.array([0.5, -1.0, 1.5, 0.25, 2.0], dtype=np.float32)
noise = np.linspace(-0.02, 0.02, m, dtype=np.float32)
b_np = A_sp @ x_true + noise
A = ms.from_scipy(A_sp, format="csr")
b = mx.array(b_np)
print(f"shape={A.shape}, nnz={A.nnz}, density={A.nnz / (m * n):.3f}")
print(f"dense rank={np.linalg.matrix_rank(A_sp.toarray())}")
print("first four b entries:", np.round(b_np[:4], 6))
shape=(12, 5), nnz=30, density=0.500
dense rank=5
first four b entries: [ 0.005 -0.691364 1.399773 0.778409]
QR least squares#
method="qr" is the default Accelerate choice for rectangular systems. It
solves the least-squares problem directly and is the safer default when the
matrix may be poorly conditioned.
qr = linalg.factorized(A, method="qr")
x_qr = qr.solve(b)
mx.eval(x_qr)
x_ref, *_ = np.linalg.lstsq(A_sp.toarray(), b_np, rcond=None)
rel_error = np.linalg.norm(np.array(x_qr) - x_ref) / np.linalg.norm(x_ref)
rel_res = np.linalg.norm(A_sp @ np.array(x_qr) - b_np) / np.linalg.norm(b_np)
print(
f"metadata: backend={qr.backend}, method={qr.method}, "
f"rhs_size={qr.rhs_size}, solution_size={qr.solution_size}"
)
print(f"relative error vs numpy lstsq: {rel_error:.2e}")
print(f"relative residual ||Ax-b||/||b||: {rel_res:.2e}")
print("x_qr:", np.round(np.array(x_qr), 6))
metadata: backend=accelerate, method=qr, rhs_size=12, solution_size=5
relative error vs numpy lstsq: 9.95e-08
relative residual ||Ax-b||/||b||: 8.12e-03
x_qr: [ 0.499233 -0.99647 1.497709 0.249898 2.003961]
Normal equations via Cholesky-at-A#
method="cholesky_ata" factors A.T @ A. Its solve method expects
the normal-equation right-hand side A.T @ b with length n, not the
original least-squares right-hand side b with length m. It is often
fast, but it squares the condition number of A. Prefer QR when robustness
matters, use Cholesky-at-A when the matrix is known to be well conditioned
and the performance tradeoff is worth it.
ata = linalg.factorized(A, method="cholesky_ata")
normal_rhs_np = A_sp.T @ b_np
normal_rhs = mx.array(np.asarray(normal_rhs_np, dtype=np.float32))
x_ata = ata.solve(normal_rhs)
mx.eval(x_ata)
rel_diff = np.linalg.norm(np.array(x_ata) - np.array(x_qr)) / np.linalg.norm(np.array(x_qr))
rel_res_ata = np.linalg.norm(A_sp @ np.array(x_ata) - b_np) / np.linalg.norm(b_np)
print(
f"metadata: backend={ata.backend}, method={ata.method}, "
f"rhs_size={ata.rhs_size}, solution_size={ata.solution_size}"
)
print(f"normal RHS shape={normal_rhs.shape}")
print(f"relative difference vs QR: {rel_diff:.2e}")
print(f"relative residual ||Ax-b||/||b||: {rel_res_ata:.2e}")
metadata: backend=accelerate, method=cholesky_ata, rhs_size=5, solution_size=5
normal RHS shape=(5,)
relative difference vs QR: 1.08e-07
relative residual ||Ax-b||/||b||: 8.12e-03
Multiple right-hand sides#
The same factorized rectangular solver can be reused for a matrix of right-hand
sides. The input has shape (m, nrhs) and the solution has shape
(n, nrhs).
x_other = np.array([1.0, 0.0, -0.5, 1.5, 0.25], dtype=np.float32)
B_np = np.stack([b_np, A_sp @ x_other], axis=1)
B = mx.array(B_np)
X = qr.solve(B)
mx.eval(X)
X_ref = np.linalg.lstsq(A_sp.toarray(), B_np, rcond=None)[0]
rel_X = np.linalg.norm(np.array(X) - X_ref) / np.linalg.norm(X_ref)
col_residuals = [
np.linalg.norm(A_sp @ np.array(X)[:, k] - B_np[:, k]) / np.linalg.norm(B_np[:, k])
for k in range(B_np.shape[1])
]
print(f"input RHS shape={B.shape}, solution shape={X.shape}")
print(f"matrix RHS rel_error={rel_X:.2e}")
print("column residuals:", [f"{r:.2e}" for r in col_residuals])
input RHS shape=(12, 2), solution shape=(5, 2)
matrix RHS rel_error=1.18e-07
column residuals: ['8.12e-03', '1.80e-07']
Auto method#
method="auto" chooses LU for square matrices and QR for rectangular
matrices. For least-squares workloads this gives a conservative default while
still allowing explicit cholesky_ata when you want that tradeoff.
auto = linalg.factorized(A, method="auto")
x_auto = auto.solve(b)
mx.eval(x_auto)
rel_auto = np.linalg.norm(np.array(x_auto) - np.array(x_qr)) / np.linalg.norm(np.array(x_qr))
print(f"auto backend={auto.backend}, method={auto.method}")
print(f"auto vs explicit QR rel_diff={rel_auto:.2e}")
auto backend=accelerate, method=qr
auto vs explicit QR rel_diff=0.00e+00
For large rectangular systems, compare QR and Cholesky-at-A on your matrix family. QR is usually the safer numerical default, Cholesky-at-A can be useful when the system is well conditioned and repeated solves dominate the workload.