[dataset]

Dataset-interface API and actual implementations module.

A 'Dataset' is an interface towards data that exposes methods to query batched data samples and key metadata while remaining agnostic of the way the data is actually being loaded (from a source file, a database, another API...).
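To make the idea concrete, here is a minimal, standard-library-only sketch of such an interface. The names `ToySpecs`, `ToyDataset`, `ToyInMemoryDataset`, `get_specs` and `generate_batches` are hypothetical and chosen for illustration; they are not declearn's actual classes or signatures. The sketch only shows how consumer code can request batches and metadata without knowing how the data is stored.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Iterator, List, Sequence, Tuple

Batch = Tuple[List[List[float]], List[float]]


@dataclass
class ToySpecs:
    """Hypothetical metadata container (not declearn's DataSpec)."""

    n_samples: int
    n_features: int


class ToyDataset(ABC):
    """Hypothetical interface: exposes batches and metadata only."""

    @abstractmethod
    def get_specs(self) -> ToySpecs:
        """Return key metadata about the underlying data."""

    @abstractmethod
    def generate_batches(self, batch_size: int) -> Iterator[Batch]:
        """Yield (inputs, labels) batches of at most `batch_size` samples."""


class ToyInMemoryDataset(ToyDataset):
    """One possible backend: data held in plain Python lists."""

    def __init__(
        self, inputs: Sequence[Sequence[float]], labels: Sequence[float]
    ) -> None:
        if len(inputs) != len(labels):
            raise ValueError("inputs and labels must have the same length")
        self.inputs = [list(row) for row in inputs]
        self.labels = list(labels)

    def get_specs(self) -> ToySpecs:
        n_features = len(self.inputs[0]) if self.inputs else 0
        return ToySpecs(n_samples=len(self.inputs), n_features=n_features)

    def generate_batches(self, batch_size: int) -> Iterator[Batch]:
        # Slice the stored lists in order; the last batch may be smaller.
        for start in range(0, len(self.inputs), batch_size):
            stop = start + batch_size
            yield self.inputs[start:stop], self.labels[start:stop]


dataset = ToyInMemoryDataset(
    inputs=[[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]],
    labels=[1.0, 1.0, 0.0],
)
specs = dataset.get_specs()
batches = list(dataset.generate_batches(batch_size=2))
```

A file-backed or database-backed subclass could implement the same two methods, and code consuming `ToyDataset` would not need to change.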

This declearn submodule provides:

API tools

  • Dataset: Abstract base class defining an API to access training or testing data.
  • DataSpec: Dataclass to wrap a dataset's metadata.
  • load_dataset_from_json: (DEPRECATED) Utility function to parse a JSON into a dataset object.

Dataset subclasses

  • InMemoryDataset: Dataset subclass serving numpy(-like) memory-loaded data arrays.

Manual-import submodules

The following submodules are to be manually imported, as they rely on optional dependencies that may be absent and/or costly to import:

Utility submodules

  • examples: Utils to fetch and prepare some open-source datasets.
  • utils: Utils to manipulate datasets (load, save, split...).

Utility entry-point

  • split_data: Utility to split a single dataset into shards. This function builds on lower-level utils, and is installed as a command-line entry-point together with declearn.
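
As a rough illustration of what splitting a dataset into shards means, the sketch below shuffles sample indices and deals them out into near-equal shards. The helper `split_into_shards` and its parameters are hypothetical: this is not the implementation, signature, or set of splitting schemes of declearn's `split_data`, only a uniform random split written with the standard library.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_shards(
    samples: Sequence[T], n_shards: int, seed: int = 0
) -> List[List[T]]:
    """Hypothetical helper: uniform random split into `n_shards` shards."""
    if n_shards < 1:
        raise ValueError("n_shards must be a positive integer")
    # Shuffle indices with a seeded generator so the split is reproducible.
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    # Deal shuffled indices round-robin; shard sizes differ by at most one.
    return [
        [samples[i] for i in indices[k::n_shards]] for k in range(n_shards)
    ]


shards = split_into_shards(list(range(10)), n_shards=3)
```

Each sample lands in exactly one shard, so the shards can be handed to separate clients without overlap.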