Compressed Sparse formats CSR and CSC in Python

Jun 17, 2019

Compressed sparse row (CSR) and compressed sparse column (CSC) are the best-known and most widely used sparse matrix formats. They are mainly used for write-once-read-many tasks, since changing the sparsity structure after construction is expensive.

CSR stores the nonzero values row by row. The compressed sparse column (CSC) format is almost identical, except that values are indexed first by column, in column-major order. CSC is usually used when there are more rows than columns, while CSR works better for a ‘wide’ matrix with more columns than rows. This post takes CSR as the example.

Internally, CSR is based on three NumPy arrays:

data is an array of the nonzero values
indices is an array of the column indices of those values
indptr points to where each row starts in data and indices

~ the length of indptr is the number of rows + 1, and the last item in indptr = number of nonzero values = length of both data and indices

~ the nonzero items of the i-th row are located in data[indptr[i]:indptr[i+1]], with column indices indices[indptr[i]:indptr[i+1]]
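The two properties above can be checked directly on a matrix built by SciPy. This is a minimal sketch, assuming SciPy is installed; the 3 x 4 matrix is an arbitrary illustration (not from the original example) and deliberately contains an empty row, for which indptr[i] equals indptr[i+1].

```python
import numpy as np
from scipy import sparse

# Arbitrary illustrative matrix; row 1 has no nonzero values.
dense = np.array([[0, 7, 0, 0],
                  [0, 0, 0, 0],
                  [8, 0, 0, 9]])
mtx = sparse.csr_matrix(dense)

n_rows = mtx.shape[0]

# indptr has one entry per row plus a final end marker.
assert len(mtx.indptr) == n_rows + 1
# The last entry equals the number of stored nonzero values.
assert mtx.indptr[-1] == mtx.nnz == len(mtx.data) == len(mtx.indices)

# Slice out the column indices and values of each row.
rows = []
for i in range(n_rows):
    start, end = mtx.indptr[i], mtx.indptr[i + 1]
    cols = mtx.indices[start:end].tolist()
    vals = mtx.data[start:end].tolist()
    rows.append((cols, vals))
    print(f"row {i}: columns {cols}, values {vals}")
```

For the empty row the slice is simply empty, so no special case is needed when iterating.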

If you’re new to the SciPy sparse matrix game, you might find yourself stymied by the ‘indptr’ array, which can be used to instantiate a csc_matrix or a csr_matrix object. The example below shows how a matrix is built from the three NumPy arrays.

Example: create a CSR matrix from a (data, indices, indptr) tuple:

>>> import numpy as np
>>> from scipy import sparse
>>> data = np.array([1, 2, 3, 4, 5, 6])
>>> indices = np.array([0, 2, 2, 0, 1, 2])
>>> indptr = np.array([0, 2, 3, 6])
>>> mtx = sparse.csr_matrix((data, indices, indptr), shape=(3, 3))
>>> mtx.todense()
matrix([[1, 0, 2],
        [0, 0, 3],
        [4, 5, 6]])

How do we get the resulting matrix from the given (data, indices, indptr) tuple? Here is what happens, row by row:

i = 0 -> indptr[0] = 0, indptr[0 + 1] = 2 -> data[0:2] = 1,2 and indices[0:2] = 0,2 -> first row of matrix is [1,0,2]

i = 1 -> indptr[1] = 2, indptr[1 + 1] = 3 -> data[2:3] = 3 and indices[2:3] = 2 -> second row of matrix is [0,0,3]

i = 2 -> indptr[2] = 3, indptr[2 + 1] = 6 -> data[3:6] = 4,5,6 and indices[3:6] = 0,1,2 -> third row of matrix is [4,5,6]
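The row-by-row walk above can be written as a short loop. This is only a sketch of the decoding logic, not how SciPy implements it internally; the helper name csr_to_dense is made up for illustration. The last lines also read the same three arrays as CSC, where indptr delimits columns and indices hold row numbers.

```python
import numpy as np
from scipy import sparse


def csr_to_dense(data, indices, indptr, shape):
    """Hypothetical helper: rebuild a dense array from CSR arrays, row by row."""
    dense = np.zeros(shape, dtype=data.dtype)
    for i in range(shape[0]):
        start, end = indptr[i], indptr[i + 1]
        # indices[start:end] are the columns of row i's nonzero values.
        dense[i, indices[start:end]] = data[start:end]
    return dense


data = np.array([1, 2, 3, 4, 5, 6])
indices = np.array([0, 2, 2, 0, 1, 2])
indptr = np.array([0, 2, 3, 6])

dense = csr_to_dense(data, indices, indptr, shape=(3, 3))
print(dense)

# The manual decoding agrees with SciPy's own conversion.
scipy_dense = sparse.csr_matrix((data, indices, indptr), shape=(3, 3)).toarray()
assert np.array_equal(dense, scipy_dense)

# Interpreting the same arrays as CSC swaps the roles of rows and columns,
# so the result is the transpose of the CSR matrix.
csc_dense = sparse.csc_matrix((data, indices, indptr), shape=(3, 3)).toarray()
assert np.array_equal(csc_dense, dense.T)
```

The explicit Python loop is for readability only; for real work, toarray() or todense() on the sparse object does the same job.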

Thanks for reading. Feel free to comment if you have any questions.

Written by R. Jin