[dataset]
Dataset-interface API and concrete implementations module.
A 'Dataset' is an interface to data that exposes methods to query batched data samples and key metadata, while remaining agnostic of the way the data is actually loaded (from a source file, a database, another API...).
This declearn submodule provides with:
API tools
- Dataset: Abstract base class defining an API to access training or testing data.
- DataSpec: Dataclass to wrap a dataset's metadata.
- load_dataset_from_json: DEPRECATED. Utility function to parse a JSON file into a dataset object.
Dataset subclasses
- InMemoryDataset: Dataset subclass serving numpy(-like) memory-loaded data arrays.
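As an illustrative sketch (not declearn's actual implementation), the batched-access behavior that an InMemoryDataset-like class exposes can be mimicked with plain numpy arrays; the class and method names below mirror the API described above but are re-implemented here purely for illustration:

```python
import numpy as np


class TinyInMemoryDataset:
    """Illustrative stand-in for an in-memory Dataset (not the real class)."""

    def __init__(self, data: np.ndarray, target: np.ndarray) -> None:
        self.data = data
        self.target = target

    def generate_batches(self, batch_size: int, shuffle: bool = False):
        """Yield (inputs, labels) batches, optionally in shuffled order."""
        order = np.arange(len(self.data))
        if shuffle:
            np.random.default_rng(seed=0).shuffle(order)
        for start in range(0, len(order), batch_size):
            idx = order[start:start + batch_size]
            yield self.data[idx], self.target[idx]


# Usage: 5 samples of 2 features each, split into batches of 2.
x = np.arange(10, dtype=float).reshape(5, 2)
y = np.arange(5)
dataset = TinyInMemoryDataset(x, y)
batches = list(dataset.generate_batches(batch_size=2))
```

The point of the interface is that downstream code only ever iterates over such batches, never caring whether the arrays came from memory, a file, or a database.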
Manual-import submodules
The following submodules are to be manually imported, as they rely on optional dependencies that may be absent and/or costly to import:
- tensorflow: TensorFlow-specific submodule, providing with TensorflowDataset.
- torch: Torch-specific submodule, providing with TorchDataset.
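Because these submodules depend on optional packages, a common pattern is to guard their import and degrade gracefully; the snippet below is a sketch of such guarding (the module path matches the listing above, but whether the import succeeds depends on your environment):

```python
try:
    # Optional: requires both declearn and torch to be installed.
    from declearn.dataset.torch import TorchDataset
except ImportError:
    # Fall back gracefully when torch (or declearn) is absent.
    TorchDataset = None

if TorchDataset is None:
    print("TorchDataset unavailable; install torch to enable it.")
```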
Utility submodules
- examples: Utils to fetch and prepare some open-source datasets.
- utils: Utils to manipulate datasets (load, save, split...).
Utility entry-point
- split_data: Utility to split a single dataset into shards. This function builds on more unitary utils, and is installed as a command-line entry-point together with declearn.
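The kind of sharding such a split performs can be sketched with numpy; this is an illustrative re-implementation of an IID split into disjoint shards, not the entry-point's actual code:

```python
import numpy as np


def split_into_shards(data: np.ndarray, n_shards: int) -> list:
    """Split samples into n_shards roughly-equal, disjoint shards (IID)."""
    rng = np.random.default_rng(seed=0)
    order = rng.permutation(len(data))  # shuffle sample indices
    return [data[idx] for idx in np.array_split(order, n_shards)]


# Usage: 10 samples split across 3 shards.
samples = np.arange(10)
shards = split_into_shards(samples, n_shards=3)
```

Shard sizes are as balanced as possible (here 4 + 3 + 3), and every sample lands in exactly one shard.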