[dataset]

Dataset-interface API and actual implementations module.

A 'Dataset' is an interface towards data that exposes methods to query batched data samples and key metadata while remaining agnostic of the way the data is actually being loaded (from a source file, a database, another API...).
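To illustrate this decoupling, here is a minimal sketch in plain Python. The names `BatchSource`, `get_metadata`, `iter_batches`, `ListBackedSource` and `count_batches` are hypothetical and do not reproduce declearn's actual classes or signatures. The sketch only shows that consumer code can depend on the interface rather than on how the data is stored.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterator, List, Sequence


class BatchSource(ABC):
    """Hypothetical stand-in for a dataset interface (not declearn's API)."""

    @abstractmethod
    def get_metadata(self) -> Dict[str, Any]:
        """Return key metadata about the underlying data."""

    @abstractmethod
    def iter_batches(self, batch_size: int) -> Iterator[List[Any]]:
        """Yield successive batches of samples."""


class ListBackedSource(BatchSource):
    """Implementation serving samples held in an in-memory sequence."""

    def __init__(self, samples: Sequence[Any]) -> None:
        self._samples = list(samples)

    def get_metadata(self) -> Dict[str, Any]:
        return {"n_samples": len(self._samples)}

    def iter_batches(self, batch_size: int) -> Iterator[List[Any]]:
        # Slice the stored samples into consecutive chunks.
        for start in range(0, len(self._samples), batch_size):
            yield self._samples[start:start + batch_size]


def count_batches(source: BatchSource, batch_size: int) -> int:
    """Consumer code: relies only on the interface, not on the storage."""
    return sum(1 for _ in source.iter_batches(batch_size))
```

A file-backed or database-backed implementation could replace `ListBackedSource` without any change to `count_batches`.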

This declearn submodule provides:

API tools

Dataset: Abstract base class defining an API to access training or testing data.
DataSpec: Dataclass to wrap a dataset's metadata.
load_dataset_from_json: DEPRECATED. Utility function to parse a JSON file into a dataset object.

Dataset subclasses

InMemoryDataset: Dataset subclass serving numpy(-like) memory-loaded data arrays.

Manual-import submodules

The following submodules are to be manually imported, as they rely on optional dependencies that may be absent and/or costly to import:

tensorflow: TensorFlow-specific submodule, providing TensorflowDataset.
torch: Torch-specific submodule, providing TorchDataset.
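Because these submodules depend on optional packages, calling code may want to guard the import. The following is a minimal sketch of one way to do so; `try_import` is a hypothetical helper, not part of declearn, and the module path `declearn.dataset.torch` is inferred from the listing above.

```python
import importlib
from types import ModuleType
from typing import Optional


def try_import(name: str) -> Optional[ModuleType]:
    """Import a module by dotted name, returning None if it is unavailable."""
    try:
        return importlib.import_module(name)
    except ImportError:
        # Also covers ModuleNotFoundError, which subclasses ImportError.
        return None


# Assumed module path; requires both declearn and Torch to be installed.
torch_dataset_module = try_import("declearn.dataset.torch")
if torch_dataset_module is None:
    print("Torch-specific dataset submodule unavailable; skipping.")
```

Deferring the import in this way also avoids paying the import cost of a heavy framework when it is not needed.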

Utility submodules

examples: Utils to fetch and prepare some open-source datasets.
utils: Utils to manipulate datasets (load, save, split...).

Utility entry-point

split_data: Utility to split a single dataset into shards. This function builds on lower-level utils, and is installed as a command-line entry-point together with declearn.
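For readers unfamiliar with sharding, the idea can be sketched as follows. This is not the implementation of split_data, whose options and splitting schemes are not described here; `split_into_shards` is a hypothetical helper that performs a simple shuffled, near-equal split.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_shards(
    samples: Sequence[T], n_shards: int, seed: int = 0
) -> List[List[T]]:
    """Shuffle samples and deal them round-robin into n_shards lists."""
    if n_shards < 1:
        raise ValueError("n_shards must be a positive integer")
    shuffled = list(samples)
    # Use a seeded generator so that the split is reproducible.
    random.Random(seed).shuffle(shuffled)
    # Shard i receives every n_shards-th sample, starting at index i.
    return [shuffled[i::n_shards] for i in range(n_shards)]
```

Each sample lands in exactly one shard, and shard sizes differ by at most one.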