declearn.dataset.utils.sparse_from_file

Return a scipy sparse matrix loaded from a text file.

See the sparse_to_file counterpart function to create reloadable sparse data dump files.
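
For illustration, a minimal round trip could look as follows. This is only a sketch: it assumes sparse_to_file takes the target path first and the matrix second, and the file name is arbitrary.

    from scipy.sparse import csr_matrix
    from declearn.dataset.utils import sparse_from_file, sparse_to_file

    matrix = csr_matrix([[0.0, 1.5], [2.0, 0.0]])
    # Assumed argument order: (path, matrix); check sparse_to_file's signature.
    sparse_to_file("matrix.sparse", matrix)
    restored = sparse_from_file("matrix.sparse")
    assert isinstance(restored, csr_matrix)  # the original sparse format is restored
    assert (restored.toarray() == matrix.toarray()).all()  # values match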

Parameters:

    path: str
        Path to the sparse matrix dump file. (required)

Returns:

    matrix: scipy.sparse.spmatrix
        Sparse matrix restored from the file, whose exact type is defined by that file.

Raises:

    KeyError
        If the file's header cannot be JSON-parsed or does not conform to the expected standard.
    TypeError
        If the documented sparse matrix type is not supported, i.e. not one of "bsr", "csr", "csc", "coo", "dia", "dok" or "lil".

Note

The format used is mostly similar to the SVMlight one (see for example sklearn.datasets.load_svmlight_file), but the file must store a single matrix rather than an (X, y) pair of arrays. It must also record some metadata in its header, notably used to restore the original matrix's dtype and sparse format.
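
To make the header and row layout concrete, a dump of a small 3-by-4 CSR matrix of float64 values could look like the following. This is an illustrative sketch of what the loader accepts, not necessarily the verbatim output of sparse_to_file.

    {"stype": "csr", "dtype": "float64", "shape": [3, 4]}
    0:1.0 2:3.5

    1:2.0

The first line is a JSON header carrying the sparse format ("stype"), dtype and shape; each following line lists space-separated column:value pairs for one row, and an empty line stands for an all-zero row.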

Source code in declearn/dataset/utils/_sparse.py
def sparse_from_file(path: str) -> spmatrix:
    """Return a scipy sparse matrix loaded from a text file.

    See the [`sparse_to_file`][declearn.dataset.utils.sparse_to_file]
    counterpart function to create reloadable sparse data dump files.

    Parameters
    ----------
    path: str
        Path to the sparse matrix dump file.

    Returns
    -------
    matrix: scipy.sparse.spmatrix
        Sparse matrix restored from file, the exact type
        of which is defined by said file.

    Raises
    ------
    KeyError
        If the file's header cannot be JSON-parsed or does not
        conform to the expected standard.
    TypeError
        If the documented sparse matrix type is not supported,
        i.e. not one of "bsr", "csr", "csc", "coo", "dia", "dok" or "lil".


    Note
    ----
    The format used is mostly similar to the SVMlight one (see for example
    `sklearn.datasets.load_svmlight_file`), but the file must store a single
    matrix rather than a (X, y) pair of arrays. It must also record some
    metadata in its header, which are notably used to restore the initial
    matrix's dtype and type of sparse format.
    """
    with open(path, "r", encoding="utf-8") as file:
        # Read and parse the file's header.
        try:
            head = json.loads(file.readline())
        except json.JSONDecodeError as exc:
            raise KeyError("Invalid header for sparse matrix file.") from exc
        if any(key not in head for key in ("stype", "dtype", "shape")):
            raise KeyError("Invalid header for sparse matrix file.")
        if head["stype"] not in SPARSE_TYPES.values():
            raise TypeError(f"Invalid sparse matrix type: '{head['stype']}'.")
        # Instantiate a lil_matrix abiding by the header's specs.
        lil = lil_matrix(tuple(head["shape"]), dtype=head["dtype"])
        cnv = int if lil.dtype.kind == "i" else float
        # Iteratively parse and fill-in row data.
        for rix, row in enumerate(file):
            row = row.strip(" \n")
            if not row:  # all-zeros row
                continue
            for field in row.split(" "):
                ind, val = field.split(":")
                lil[rix, int(ind)] = cnv(val)
    # Convert the matrix to its initial format and return.
    return getattr(lil, f"to{head['stype']}")()