SciPy interoperability#

SciPy’s scipy.sparse module is the most common source of sparse matrices in Python scientific computing. This tutorial shows how to transfer SciPy sparse matrices to mlx-sparse, run products on Apple Silicon, and compare results.

SciPy is an optional dependency of mlx-sparse. It is listed under the dev and bench extras. Any code that imports it will need a guard:

import pytest
scipy_sparse = pytest.importorskip("scipy.sparse")  # in tests
# or:
try:
    import scipy.sparse
except ImportError:
    scipy_sparse = None

Converting a SciPy CSR matrix#

SciPy CSR arrays expose .data, .indices, and .indptr as NumPy arrays. Wrap them in mx.array and pass to csr_array().

import mlx.core as mx
import numpy as np
import scipy.sparse
import mlx_sparse as ms

# Generate a random 512x512 sparse matrix with 1% density.
rng = np.random.default_rng(42)
sp = scipy.sparse.random(512, 512, density=0.01, format="csr",
                          dtype=np.float32, random_state=rng)

# Convert indices to int32 (SciPy defaults to int32 on 64-bit platforms,
# but check explicitly to be safe).
csr = ms.csr_array(
    (mx.array(sp.data.astype(np.float32)),
     mx.array(sp.indices.astype(np.int32)),
     mx.array(sp.indptr.astype(np.int32))),
    shape=sp.shape,
    sorted_indices=True,   # SciPy CSR has sorted indices by default
    canonical=True,        # SciPy CSR is canonical by default
    validate="metadata",   # skip full validation. scipy is correct by construction
)

print(csr)
# CSRArray(shape=(512, 512), nnz=..., dtype=float32, index_dtype=int32,
#          sorted_indices=True, has_canonical_format=True)

Verifying products against SciPy#

Once you have a CSRArray, run a product and compare to SciPy’s reference result.

ms.use_cpu()   # run on CPU for fair comparison

x_np = rng.standard_normal(512).astype(np.float32)
x    = mx.array(x_np)

# mlx-sparse result
y_ms = csr @ x
mx.eval(y_ms)
y_ms_np = np.array(y_ms)

# SciPy reference
y_sp = sp @ x_np

np.testing.assert_allclose(y_ms_np, y_sp, rtol=1e-5, atol=1e-5)
print("results match SciPy!")

Converting a SciPy COO matrix#

SciPy COO arrays expose .data, .row, and .col. Because SciPy COO may contain duplicate entries, converting via coo_array() preserves them. Call tocsr(canonical=True) to sum duplicates.

sp_coo = scipy.sparse.random(128, 256, density=0.05, format="coo",
                              dtype=np.float32, random_state=rng)

coo = ms.coo_array(
    (mx.array(sp_coo.data.astype(np.float32)),
     (mx.array(sp_coo.row.astype(np.int32)),
      mx.array(sp_coo.col.astype(np.int32)))),
    shape=sp_coo.shape,
)
csr = coo.tocsr(canonical=True)

Converting back to SciPy#

To go in the other direction (for example, to pass an mlx-sparse result back to SciPy), evaluate the MLX arrays on host and construct a SciPy object:

mx.eval(csr.data, csr.indices, csr.indptr)

sp_from_ms = scipy.sparse.csr_array(
    (np.array(csr.data),
     np.array(csr.indices),
     np.array(csr.indptr)),
    shape=csr.shape,
)

Running on GPU and comparing to SciPy CPU#

This pattern is common for benchmarks: generate data with SciPy (CPU), run on both, compare accuracy.

import time

ms.use_gpu()

csr_gpu = ms.csr_array(
    (mx.array(sp.data.astype(np.float32)),
     mx.array(sp.indices.astype(np.int32)),
     mx.array(sp.indptr.astype(np.int32))),
    shape=sp.shape,
    sorted_indices=True,
    canonical=True,
    validate=False,
)
x = mx.array(x_np)

# Warmup
for _ in range(5):
    mx.eval(csr_gpu @ x)

# Timed run
t0 = time.perf_counter()
for _ in range(100):
    mx.eval(csr_gpu @ x)
gpu_ms = 1000 * (time.perf_counter() - t0) / 100

# SciPy CPU reference timing
t0 = time.perf_counter()
for _ in range(100):
    _ = sp @ x_np
scipy_ms = 1000 * (time.perf_counter() - t0) / 100

print(f"mlx-sparse GPU: {gpu_ms:.3f} ms")
print(f"SciPy CPU:      {scipy_ms:.3f} ms")

The GPU advantage over SciPy CPU depends heavily on matrix density and row length uniformity. See Performance guide for a detailed discussion.

Using SciPy in tests#

The test suite uses SciPy as a correctness reference. Tests guard against SciPy being absent:

import pytest

def test_csr_matvec_matches_scipy(mx, scipy_sparse):
    sp = scipy_sparse.random(128, 256, density=0.02, format="csr",
                              dtype=np.float32, random_state=0)
    x_np = np.random.randn(256).astype(np.float32)

    csr = ms.csr_array(
        (mx.array(sp.data),
         mx.array(sp.indices, dtype=mx.int32),
         mx.array(sp.indptr,  dtype=mx.int32)),
        shape=sp.shape,
    )
    x = mx.array(x_np)

    y = csr @ x
    mx.eval(y)
    np.testing.assert_allclose(np.array(y), sp @ x_np, rtol=1e-5, atol=1e-5)

The scipy_sparse fixture in tests/conftest.py calls pytest.importorskip("scipy.sparse"), so the test is automatically skipped if SciPy is not installed.