Compressed sparse row (CSR) and compressed sparse column (CSC) are the best-known and most widely used sparse matrix formats. They are mainly used for write-once-read-many tasks.
The two formats are almost identical, except that CSC indexes values first by column, in column-major order. CSC is usually used when there are more rows than columns, while CSR works better for a ‘wide’ matrix. This post takes CSR as the example.
Internally, CSR is based on three NumPy arrays:
data: an array of the nonzero values
indices: an array of the column indices of those values
indptr: points to where each row starts in data and indices
The length of indptr is the number of rows + 1, and its last item equals the number of stored values, which is the length of both data and indices.
The nonzero items of the i-th row are data[indptr[i]:indptr[i+1]], with column indices indices[indptr[i]:indptr[i+1]].
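The slicing rule for the i-th row can be sketched in plain Python. This is only a minimal illustration of the layout, not how SciPy implements it; the helper name csr_to_dense and the small 2 x 4 matrix are made up for this example.

```python
def csr_to_dense(data, indices, indptr, n_cols):
    """Expand CSR arrays into a dense list of rows (hypothetical helper)."""
    n_rows = len(indptr) - 1  # indptr has one entry per row, plus one
    dense = []
    for i in range(n_rows):
        row = [0] * n_cols
        start, end = indptr[i], indptr[i + 1]
        # data[start:end] holds row i's values, indices[start:end] their columns
        for value, col in zip(data[start:end], indices[start:end]):
            row[col] = value
        dense.append(row)
    return dense


# A made-up 2 x 4 matrix: [[0, 7, 0, 0], [8, 0, 0, 9]]
example = csr_to_dense(data=[7, 8, 9], indices=[1, 0, 3], indptr=[0, 1, 3], n_cols=4)
print(example)  # [[0, 7, 0, 0], [8, 0, 0, 9]]
```

The same slices work unchanged when data, indices and indptr are NumPy arrays instead of lists.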
If you’re new to the SciPy sparse matrix game, you might find yourself stymied by the ‘indptr’ array, which can be used to instantiate a csc_matrix or a csr_matrix object. Here I give an example to explain how the matrix is built from the three NumPy arrays.
Example: create a CSR matrix from a (data, indices, indptr) tuple:
>>> import numpy as np
>>> from scipy import sparse
>>> data = np.array([1, 2, 3, 4, 5, 6])
>>> indices = np.array([0, 2, 2, 0, 1, 2])
>>> indptr = np.array([0, 2, 3, 6])
>>> mtx = sparse.csr_matrix((data, indices, indptr), shape=(3, 3))
>>> mtx.todense()
matrix([[1, 0, 2],
        [0, 0, 3],
        [4, 5, 6]])
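A quick sanity check on these three arrays may help before reading them row by row. The sketch below uses only NumPy and assumes the same 3 x 3 example; the variable names n_rows and nnz_per_row are introduced here for illustration.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6])
indices = np.array([0, 2, 2, 0, 1, 2])
indptr = np.array([0, 2, 3, 6])
n_rows = 3  # assumed from shape=(3, 3) in the example

# indptr has one entry per row plus a final end marker
assert len(indptr) == n_rows + 1
# the last entry equals the number of stored values
assert indptr[-1] == len(data) == len(indices)

# consecutive differences give the number of stored values in each row
nnz_per_row = np.diff(indptr)
print(nnz_per_row)  # [2 1 3]
```

So the first row stores two values, the second one, and the third three.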
How do we get the resulting matrix from the given (data, indices, indptr) tuple? Here is what is going on:
i = 0 -> indptr[0] = 0, indptr[0 + 1] = 2 -> data[0:2] = 1,2 and indices[0:2] = 0,2 -> first row of matrix is [1,0,2]
i = 1 -> indptr[1] = 2, indptr[1 + 1] = 3 -> data[2:3] = 3 and indices[2:3] = 2 -> second row of matrix is [0,0,3]
i = 2 -> indptr[2] = 3, indptr[2 + 1] = 6 -> data[3:6] = 4,5,6 and indices[3:6] = 0,1,2 -> third row of matrix is [4,5,6]
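Going in the opposite direction, from a dense matrix to the three arrays, can make the walk-through above easier to check. The sketch below is plain Python and only illustrates the layout; dense_to_csr is a hypothetical helper, not a SciPy function. It also shows the CSC arrays for the same matrix, relying on the fact that the CSC form of a matrix is the CSR form of its transpose.

```python
def dense_to_csr(dense):
    """Build (data, indices, indptr) from dense rows (hypothetical helper)."""
    data, indices, indptr = [], [], [0]
    for row in dense:
        for col, value in enumerate(row):
            if value != 0:
                data.append(value)
                indices.append(col)
        indptr.append(len(data))  # where the next row starts
    return data, indices, indptr


dense = [[1, 0, 2],
         [0, 0, 3],
         [4, 5, 6]]

csr = dense_to_csr(dense)
# zip(*dense) yields the columns, so this compresses column by column (CSC);
# here "indices" holds row indices instead of column indices
csc = dense_to_csr(zip(*dense))

print(csr)  # ([1, 2, 3, 4, 5, 6], [0, 2, 2, 0, 1, 2], [0, 2, 3, 6])
print(csc)  # ([1, 4, 5, 2, 3, 6], [0, 2, 2, 0, 1, 2], [0, 2, 3, 6])
```

For this particular matrix the indices and indptr arrays come out the same in both formats because its pattern of nonzeros is symmetric; only the order of data changes. In general all three arrays differ between CSR and CSC.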
Thanks for reading. Feel free to comment if you have questions.