
declearn.dataset.utils.sparse_to_file

Dump a scipy sparse matrix as a text file.

See the sparse_from_file counterpart function to reload the dumped data from the created file.

Parameters:

path (str, required): Path to the file in which to store the sparse matrix. If the path does not end with a '.sparse' extension, one is appended automatically.

matrix (spmatrix, required): Sparse matrix to store in the file.
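The extension rule can be checked in isolation. The hypothetical helper with_sparse_extension below copies the two lines at the top of sparse_to_file. An existing extension such as '.txt' is kept rather than replaced, and the comparison is case-sensitive, so 'm.SPARSE' would become 'm.SPARSE.sparse'.

```python
import os


def with_sparse_extension(path: str) -> str:
    """Append '.sparse' to 'path' unless it already ends with it.

    Hypothetical helper mirroring the first two lines of 'sparse_to_file'.
    """
    if os.path.splitext(path)[1] != ".sparse":
        path += ".sparse"
    return path


print(with_sparse_extension("data/matrix"))         # data/matrix.sparse
print(with_sparse_extension("data/matrix.txt"))     # data/matrix.txt.sparse
print(with_sparse_extension("data/matrix.sparse"))  # data/matrix.sparse
```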

Raises:

TypeError: If 'matrix' is of an unsupported type, i.e. not a BSR, CSC, CSR, COO, DIA, DOK or LIL sparse matrix.
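The type check is a dictionary lookup keyed on the exact class of 'matrix'. The sketch below reproduces it with a hypothetical SUPPORTED_TYPES mapping standing in for SPARSE_TYPES, whose actual contents and stored names are not shown on this page. Because type(matrix) is used as the key, subclasses of the listed classes are not matched; whether SciPy's sparse array classes (e.g. csr_array) are accepted depends on what the real mapping contains.

```python
import numpy as np
import scipy.sparse as sp

# Hypothetical stand-in for the module-level SPARSE_TYPES mapping used by
# 'sparse_to_file'; the names actually stored by declearn may differ.
SUPPORTED_TYPES = {
    sp.bsr_matrix: "bsr",
    sp.csc_matrix: "csc",
    sp.csr_matrix: "csr",
    sp.coo_matrix: "coo",
    sp.dia_matrix: "dia",
    sp.dok_matrix: "dok",
    sp.lil_matrix: "lil",
}


def sparse_type_name(matrix: object) -> str:
    """Return the short name of a supported sparse type, else raise TypeError."""
    name = SUPPORTED_TYPES.get(type(matrix))  # exact-type lookup
    if name is None:
        raise TypeError(f"Unsupported sparse matrix type: '{type(matrix)}'.")
    return name


print(sparse_type_name(sp.csr_matrix(np.eye(3))))  # csr
try:
    sparse_type_name(np.eye(3))  # a dense array is rejected
except TypeError as exc:
    print(exc)
```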

Note:

The format used closely resembles the SVMlight one (see for example sklearn.datasets.dump_svmlight_file), but stores a single matrix rather than an (X, y) pair of arrays. It also records the input matrix's dtype and sparse format type, which are restored when the counterpart sparse_from_file function is used to reload it.
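To make the layout concrete, the following sketch reproduces the writing loop of sparse_to_file against an in-memory buffer instead of a file. The 'stype' argument is a placeholder for the name the real function looks up in SPARSE_TYPES, so the exact string in the header may differ. The first line holds the JSON metadata; each following line holds one matrix row as zero-based 'column:value' pairs and, unlike SVMlight lines, no leading target label.

```python
import io
import json

import numpy as np
import scipy.sparse as sp


def dump_sparse_text(matrix: sp.spmatrix, stype: str) -> str:
    """Return the text that 'sparse_to_file' would write (sketch).

    'stype' is a placeholder for the name the real function looks up
    in its SPARSE_TYPES mapping.
    """
    lil = matrix.tolil()
    meta = {"stype": stype, "dtype": lil.dtype.name, "shape": lil.shape}
    buffer = io.StringIO()
    buffer.write(json.dumps(meta))  # header line: JSON metadata
    for ind, val in zip(lil.rows, lil.data):  # then one line per matrix row
        buffer.write("\n" + " ".join(f"{i}:{v}" for i, v in zip(ind, val)))
    return buffer.getvalue()


dense = np.array([[1.0, 0.0, 2.0], [0.0, 0.0, 0.0], [0.0, 3.0, 0.0]])
print(dump_sparse_text(sp.csr_matrix(dense), "csr"))
# {"stype": "csr", "dtype": "float64", "shape": [3, 3]}
# 0:1.0 2:2.0
#
# 1:3.0
```

The all-zero second row is written as an empty line, so the file always holds exactly one line per matrix row after the header.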

Source code in declearn/dataset/utils/_sparse.py
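import json
import os

from scipy.sparse import spmatrix

# SPARSE_TYPES, which maps supported sparse matrix types to the names
# recorded under "stype", is defined outside this excerpt.


def sparse_to_file(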
    path: str,
    matrix: spmatrix,
) -> None:
    """Dump a scipy sparse matrix as a text file.

    See the [`sparse_from_file`][declearn.dataset.utils.sparse_from_file]
    counterpart function to reload the dumped data from the created file.

    Parameters
    ----------
    path: str
        Path to the file where to store the sparse matrix.
        If the path does not end with a '.sparse' extension,
        one will be added automatically.
    matrix: scipy.sparse.spmatrix
        Sparse matrix that needs storing to file.

    Raises
    ------
    TypeError
        If 'matrix' is of unsupported type, i.e. not a BSR,
        CSC, CSR, COO, DIA, DOK or LIL sparse matrix.

    Note
    ----
    The format used is mostly similar to the SVMlight one (see for example
    `sklearn.datasets.dump_svmlight_file`), but enables storing a single
    matrix rather than a (X, y) pair of arrays. It also records the input
    matrix's dtype and type of sparse format, which are restored when the
    counterpart `sparse_from_file` function is used to reload it.
    """
    if os.path.splitext(path)[1] != ".sparse":
        path += ".sparse"
    # Identify the type of sparse matrix, and convert it to lil.
    name = SPARSE_TYPES.get(type(matrix))
    if name is None:
        raise TypeError(f"Unsupported sparse matrix type: '{type(matrix)}'.")
    lil = matrix.tolil()
    # Record key metadata required to rebuild the matrix.
    meta = {
        "stype": name,
        "dtype": lil.dtype.name,
        "shape": lil.shape,
    }
    # Write data to the target file.
    os.makedirs(os.path.dirname(os.path.abspath(path)), exist_ok=True)
    with open(path, "w", encoding="utf-8") as file:
        file.write(json.dumps(meta))
        for ind, val in zip(lil.rows, lil.data):
            row = " ".join(f"{i}:{v}" for i, v in zip(ind, val))
            file.write("\n" + row)
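Reading the format back only requires parsing the JSON header and the 'column:value' pairs of each row. The sketch below is a hypothetical reader, not declearn's sparse_from_file, which should be preferred in practice. It assumes real-valued numeric data and that 'stype' holds a SciPy format code accepted by asformat (such as 'csc'), which is an assumption about SPARSE_TYPES. It then runs a round trip, writing the file with the same loop as sparse_to_file but using matrix.format in place of the SPARSE_TYPES lookup.

```python
import json
import os
import tempfile

import numpy as np
import scipy.sparse as sp


def read_sparse_text(path: str) -> sp.spmatrix:
    """Rebuild a sparse matrix from a '.sparse' text file (hypothetical reader).

    Assumes real-valued numeric data, and that the 'stype' header field
    holds a SciPy format code such as 'csr' or 'csc'.
    """
    with open(path, "r", encoding="utf-8") as file:
        meta_line, *row_lines = file.read().split("\n")
    meta = json.loads(meta_line)
    rows, cols, vals = [], [], []
    for row, line in enumerate(row_lines):
        for pair in line.split():  # an empty line (all-zero row) has no pairs
            col, val = pair.split(":")
            rows.append(row)
            cols.append(int(col))
            vals.append(float(val))
    coo = sp.coo_matrix(
        (np.array(vals, dtype=meta["dtype"]), (rows, cols)),
        shape=tuple(meta["shape"]),
    )
    # Convert to the sparse format recorded in the header.
    return coo.asformat(meta["stype"])


# Round trip: write a CSC matrix with the same loop as 'sparse_to_file'
# (matrix.format replaces the SPARSE_TYPES lookup), then read it back.
original = sp.csc_matrix(
    np.array([[0.0, 1.5, 0.0], [0.25, 0.0, 0.0], [0.0, -2.0, 3.0], [0.0, 0.0, 0.0]])
)
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "example.sparse")
    lil = original.tolil()
    meta = {"stype": original.format, "dtype": lil.dtype.name, "shape": lil.shape}
    with open(path, "w", encoding="utf-8") as file:
        file.write(json.dumps(meta))
        for ind, val in zip(lil.rows, lil.data):
            file.write("\n" + " ".join(f"{i}:{v}" for i, v in zip(ind, val)))
    restored = read_sparse_text(path)

print(restored.format, np.array_equal(restored.toarray(), original.toarray()))  # csc True
```

Collecting (row, column, value) triplets into a COO matrix keeps the parsing loop independent of the target format, and the shape stored in the header, rather than the number of non-empty lines, fixes the matrix dimensions.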