declearn.dataset.utils.load_data_array

Load a data array from a dump file.

Supported file extensions

  • .csv: csv file, comma-delimited by default. Any keyword arguments to pandas.read_csv may be passed.
  • .npy: Non-pickle numpy array dump, created with numpy.save.
  • .sparse: Scipy sparse matrix dump, created with the custom declearn.data.sparse.sparse_to_file function.
  • .svmlight: SVMlight sparse matrix and labels array dump. Parsed using sklearn.datasets.load_svmlight_file, returning either features or labels based on the which int keyword argument (default: 0, for features).
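
The extension-to-parser mapping above can be sketched with the standard library alone. The `LOADERS` table and the `describe_loader` helper are hypothetical names introduced for illustration; they are not part of declearn, and they only mirror how the file extension selects a parser.

```python
import os

# Hypothetical lookup table mirroring the supported extensions listed above.
LOADERS = {
    ".csv": "pandas.read_csv",
    ".npy": "numpy.load",
    ".sparse": "declearn sparse-matrix loader",
    ".svmlight": "sklearn.datasets.load_svmlight_file",
}


def describe_loader(path: str) -> str:
    """Return the name of the parser selected by the extension of `path`."""
    ext = os.path.splitext(path)[1]
    if ext not in LOADERS:
        raise TypeError(f"Unsupported data array file extension: '{ext}'.")
    return LOADERS[ext]


print(describe_loader("data/train.csv"))
print(describe_loader("data/features.npy"))
```

Only the final extension matters here: the directory and base name play no role in choosing the parser.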

Parameters:

  • path (str, required): Path to the data array dump file. The extension must match the file format so that the file can be parsed; see the list of supported extensions above.
  • **kwargs (Any, default: {}): Extension-specific keyword parameters may be passed. See above for details.
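
Judging from the source listing on this page, keyword arguments are forwarded to `pandas.read_csv` for `.csv` files, only `which` is read for `.svmlight` files, and the `.npy` and `.sparse` branches do not use them. The sketch below illustrates the `which` selection on a stand-in `(features, labels)` pair; `select_svmlight_output` is a hypothetical helper, not a declearn function, and plain lists stand in for the matrix and array that the real parser would return.

```python
from typing import Any, Tuple


def select_svmlight_output(loaded: Tuple[Any, Any], **kwargs: Any) -> Any:
    """Pick one element of a (features, labels) pair via the `which` keyword."""
    which = kwargs.get("which", 0)  # default 0 selects the features
    return loaded[which]


# Stand-in data; the real loader would yield a sparse matrix and a label array.
features = [[1.0, 0.0], [0.0, 2.0]]
labels = [0, 1]

print(select_svmlight_output((features, labels)))
print(select_svmlight_output((features, labels), which=1))
```

Under this reading, loading both features and labels from one `.svmlight` file takes two calls, one per value of `which`.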

Returns:

  • data (numpy.ndarray or pandas.DataFrame or scipy.spmatrix): Reloaded data array.

Raises:

  • TypeError: If path has an unsupported extension.

Any exception raised by the underlying data-loading functions may also be raised (e.g. if the file cannot be properly parsed).
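
Since the source listing on this page compares the result of `os.path.splitext` against exact lowercase strings, some paths that look loadable would fall through to the `TypeError`. The sketch below uses only the standard library to show which extension is extracted in a few such cases; `SUPPORTED` and `extension_of` are hypothetical names used for illustration.

```python
import os

# Extensions matched by exact string comparison in the source listing.
SUPPORTED = {".csv", ".npy", ".sparse", ".svmlight"}


def extension_of(path: str) -> str:
    """Return the extension as computed by os.path.splitext."""
    return os.path.splitext(path)[1]


for path in ("x.csv", "x.CSV", "x.csv.gz", "x"):
    ext = extension_of(path)
    status = "supported" if ext in SUPPORTED else "would raise TypeError"
    print(f"{path!r}: extension {ext!r} -> {status}")
```

Uppercase extensions, compound extensions such as `.csv.gz`, and extension-less paths are all treated as unsupported under this comparison, so renaming the file may be needed before loading it.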

Source code in declearn/dataset/utils/_save_load.py
def load_data_array(
    path: str,
    **kwargs: Any,
) -> DataArray:
    """Load a data array from a dump file.

    Supported file extensions
    -------------------------
    - `.csv`:
        csv file, comma-delimited by default.
        Any keyword arguments to `pandas.read_csv` may be passed.
    - `.npy`:
        Non-pickle numpy array dump, created with `numpy.save`.
    - `.sparse`:
        Scipy sparse matrix dump, created with the custom
        `declearn.data.sparse.sparse_to_file` function.
    - `.svmlight`:
        SVMlight sparse matrix and labels array dump.
        Parsed using `sklearn.datasets.load_svmlight_file`;
        returns either features or labels based on the
        `which` int keyword argument (default: 0, for
        features).

    Parameters
    ----------
    path: str
        Path to the data array dump file.
        Extension must be adequate to enable proper parsing;
        see list of supported extensions above.
    **kwargs:
        Extension-type-based keyword parameters may be passed.
        See above for details.

    Returns
    -------
    data: numpy.ndarray or pandas.DataFrame or scipy.spmatrix
        Reloaded data array.

    Raises
    ------
    TypeError
        If `path` has an unsupported extension.

    Any exception raised by data-loading functions may also be raised
    (e.g. if the file cannot be properly parsed).
    """
    ext = os.path.splitext(path)[1]
    if ext == ".csv":
        return pd.read_csv(path, **kwargs)
    if ext == ".npy":
        return np.load(path, allow_pickle=False)
    if ext == ".sparse":
        return sparse_from_file(path)
    if ext == ".svmlight":
        which = kwargs.get("which", 0)
        return load_svmlight_file(path)[which]
    raise TypeError(f"Unsupported data array file extension: '{ext}'.")