Getting started: end-to-end sparse workflow#
This tutorial walks through a complete sparse matrix workflow from scratch:
construction, conversion, products, inspection, and differentiation. It
assumes you have completed the installation step and have mlx.core,
numpy, and mlx_sparse available.
Setup#
import mlx.core as mx
import numpy as np
import mlx_sparse as ms
# Confirm the native extension compiled successfully.
print("native extension:", ms.is_available())
# Choose a device. Fixed-shape sparse primitives run on CPU or GPU.
ms.use_gpu()
Step 1: build a small matrix in COO format#
We will construct the 4x5 matrix:
[[ 3, 0, -1, 0, 0],
[ 0, 2, 0, 0, 0],
[ 0, 0, 0, 4, -2],
[ 0, 0, 0, 0, 0]]
COO is the right format for construction because it accepts entries in any order and allows duplicates.
rows_np = np.array([0, 0, 1, 2, 2], dtype=np.int32)
cols_np = np.array([0, 2, 1, 3, 4], dtype=np.int32)
vals_np = np.array([3.0, -1.0, 2.0, 4.0, -2.0], dtype=np.float32)
data = mx.array(vals_np)
row = mx.array(rows_np)
col = mx.array(cols_np)
coo = ms.coo_array((data, (row, col)), shape=(4, 5))
print(coo)
# COOArray(shape=(4, 5), nnz=5, dtype=float32, index_dtype=int32,
# has_canonical_format=False)
The nnz count reports five stored entries. Row 3 is empty. It has no
entries and contributes nothing to nnz.
Step 2: convert to CSR and canonicalize#
COO can run native sparse-dense products directly. This tutorial converts to
CSR here to show the compressed row layout and canonicalization path used by
row-oriented products and solvers. Pass canonical=True to sort column
indices and sum duplicates in one call. For this matrix there are no
duplicates, so the nnz stays at 5.
csr = coo.tocsr(canonical=True)
print(csr)
# CSRArray(shape=(4, 5), nnz=5, dtype=float32, index_dtype=int32,
# sorted_indices=True, has_canonical_format=True)
Inspect the CSR buffers:
mx.eval(csr.data, csr.indices, csr.indptr)
print("data: ", np.array(csr.data)) # [ 3. -1. 2. 4. -2.]
print("indices:", np.array(csr.indices)) # [0 2 1 3 4]
print("indptr: ", np.array(csr.indptr)) # [0 2 3 5 5]
The indptr encodes:
Row 0: entries at positions 0–2 -> columns 0 and 2 (values 3 and -1).
Row 1: entries at positions 2–3 -> column 1 (value 2).
Row 2: entries at positions 3–5 -> columns 3 and 4 (values 4 and -2).
Row 3: entries at positions 5–5 -> empty.
Step 3: sparse-dense matrix-vector product#
x = mx.array(np.array([1.0, 1.0, 1.0, 1.0, 1.0], dtype=np.float32))
y = csr @ x
mx.eval(y)
print(np.array(y)) # [ 2. 2. 2. 0.]
Row 0: 3x1 + (−1)x1 = 2. Row 1: 2x1 = 2. Row 2: 4x1 + (−2)x1 = 2. Row 3: empty -> 0.
Step 4: sparse-dense matrix-matrix product#
The @ operator dispatches to csr_matmul() when the
right-hand side is a 2D array.
B = mx.array(np.eye(5, dtype=np.float32)) # 5x5 identity
Y = csr @ B
mx.eval(Y)
print(Y.shape) # (4, 5)
# Y should equal the dense matrix itself (A @ I = A).
Step 5: materialise as a dense array#
dense = csr.todense()
mx.eval(dense)
print(np.array(dense))
# [[ 3. 0. -1. 0. 0.]
# [ 0. 2. 0. 0. 0.]
# [ 0. 0. 0. 4. -2.]
# [ 0. 0. 0. 0. 0.]]
Step 6: transpose and Hermitian transpose#
At = csr.T
print(At)
# CSRArray(shape=(5, 4), ...)
print(np.array(At.todense()))
# [[ 3. 0. 0. 0.]
# [ 0. 2. 0. 0.]
# [-1. 0. 0. 0.]
# [ 0. 0. 4. 0.]
# [ 0. 0. -2. 0.]]
For real-valued matrices, csr.H is the same as csr.T.
Step 7: building a matrix with duplicate entries#
COO construction allows duplicate coordinates. Duplicates are summed when you
call tocsr(canonical=True) or canonicalize().
# Two contributions to (0, 0): 1.5 and 2.5 -> should sum to 4.0.
data_dup = mx.array([1.5, 2.5, 3.0], dtype=mx.float32)
row_dup = mx.array([0, 0, 1], dtype=mx.int32)
col_dup = mx.array([0, 0, 1], dtype=mx.int32)
coo_dup = ms.coo_array((data_dup, (row_dup, col_dup)), shape=(2, 2))
csr_dup = coo_dup.tocsr(canonical=True)
mx.eval(csr_dup.data)
print(np.array(csr_dup.data)) # [4.0, 3.0]
print(np.array(csr_dup.todense()))
# [[4. 0.]
# [0. 3.]]
Step 8: autodiff through sparse values and the dense operand#
x_grad = mx.array(np.ones(5, dtype=np.float32))
def loss(values, x):
A = ms.csr_array((values, csr.indices, csr.indptr), shape=csr.shape)
y = A @ x
return mx.sum(y ** 2) # scalar output
grad_values, grad_x = mx.grad(loss, argnums=(0, 1))(csr.data, x_grad)
mx.eval(grad_values, grad_x)
print(np.array(grad_values))
print(np.array(grad_x))
The dense-input gradient is 2 * A.T @ (A @ x). Verify against dense:
dense_ref = csr.todense()
def dense_loss(x):
y = dense_ref @ x
return mx.sum(y ** 2)
grad_dense = mx.grad(dense_loss)(x_grad)
mx.eval(grad_dense)
np.testing.assert_allclose(
np.array(grad_x), np.array(grad_dense), rtol=1e-5, atol=1e-5
)
print("gradients match!")
What to explore next#
Finite difference Laplacian - a realistic 2D PDE stencil assembled as a sparse Laplacian.
SciPy interoperability - importing SciPy sparse matrices and comparing results.
Graph Laplacians - building graph Laplacians and applying them on GPU.
Physics workloads and sparse neural layers - quantum Hamiltonians and sparse linear layers.
Autodiff - full autodiff semantics and known limits.
Performance guide - when sparse is faster than dense, and how to measure it.