declearn.dataset.Dataset
Abstract class defining an API to access training or testing data.
A 'Dataset' is an interface to data that exposes methods to query batched data samples and key metadata, while remaining agnostic to how the data is actually loaded (from a source file, a database, a network reader, another API...).
This is notably done to allow clients to use distinct data storage and loading architectures, even implementing their own subclass if needed, while ensuring that data access is straightforward to specify as part of FL algorithms.
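The decoupling described above can be illustrated in plain Python. This is a minimal sketch, not declearn's code. `BatchSource`, `InMemorySource`, `CsvTextSource` and `count_samples` are hypothetical names. The interface is restricted to a simplified `generate_batches` (no shuffling or sampling options), and batches are plain lists rather than data arrays. The point is that `count_samples` only relies on the interface, whatever the storage backend.

```python
import csv
import io
from abc import ABC, abstractmethod
from typing import Iterator, List, Optional, Tuple

# Hypothetical (inputs, targets, weights) batch triple, using plain lists.
Batch = Tuple[List[List[float]], Optional[List[float]], Optional[List[float]]]


class BatchSource(ABC):
    """Hypothetical stand-in for an abstract data-access interface."""

    @abstractmethod
    def generate_batches(self, batch_size: int) -> Iterator[Batch]:
        """Yield (inputs, targets, weights) batches."""


class InMemorySource(BatchSource):
    """Serves batches from Python lists held in memory."""

    def __init__(self, x: List[List[float]], y: List[float]) -> None:
        self.x, self.y = x, y

    def generate_batches(self, batch_size: int) -> Iterator[Batch]:
        for start in range(0, len(self.x), batch_size):
            stop = start + batch_size
            yield self.x[start:stop], self.y[start:stop], None


class CsvTextSource(BatchSource):
    """Parses rows lazily from CSV text (last column is the target)."""

    def __init__(self, text: str) -> None:
        self.text = text

    def generate_batches(self, batch_size: int) -> Iterator[Batch]:
        inputs: List[List[float]] = []
        targets: List[float] = []
        for row in csv.reader(io.StringIO(self.text)):
            values = [float(v) for v in row]
            inputs.append(values[:-1])
            targets.append(values[-1])
            if len(inputs) == batch_size:
                yield inputs, targets, None
                inputs, targets = [], []
        if inputs:  # this simplified sketch always yields the last, smaller batch
            yield inputs, targets, None


def count_samples(source: BatchSource, batch_size: int) -> int:
    """Client-side code that only depends on the interface."""
    return sum(len(inputs) for inputs, _, _ in source.generate_batches(batch_size))
```

Both sources serve the same samples even though one reads from memory and the other parses text on the fly, so `count_samples(InMemorySource(...), 2)` and `count_samples(CsvTextSource(...), 2)` agree.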
Source code in declearn/dataset/_base.py
generate_batches(batch_size, shuffle=False, drop_remainder=True, replacement=False, poisson=False)
abstractmethod
Yield batches of data samples.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
batch_size | int | Number of samples per batch. | required |
shuffle | bool | Whether to shuffle data samples prior to batching. Note that the shuffling will differ on each call to this method. | False |
drop_remainder | bool | Whether to drop the last batch if it contains fewer samples than `batch_size`. | True |
replacement | bool | Whether to do random sampling with or without replacement. Ignored if … | False |
poisson | bool | Whether to use Poisson sampling, i.e. make up batches by drawing samples with replacement, resulting in variable-size batches and samples possibly appearing in zero or in multiple emitted batches (but at most once per batch). Useful to maintain tight Differential Privacy guarantees. | False |
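The interplay between these parameters can be made concrete at the level of sample indices. The following is a hypothetical sketch in plain Python, not declearn's implementation. `batch_indices` and its `seed` argument are illustrative names. Two rules are assumptions made for the sketch: that `replacement` only applies when shuffling without Poisson sampling, and that `drop_remainder` sets the number of Poisson batches.

```python
import math
import random
from typing import Iterator, List, Optional


def batch_indices(
    n_samples: int,
    batch_size: int,
    shuffle: bool = False,
    drop_remainder: bool = True,
    replacement: bool = False,
    poisson: bool = False,
    seed: Optional[int] = None,
) -> Iterator[List[int]]:
    """Yield lists of sample indices forming successive batches.

    Hypothetical sketch: `replacement` is only honored when shuffling
    without Poisson sampling, and `drop_remainder` sets the number of
    Poisson batches; both rules are assumptions.
    """
    rng = random.Random(seed)
    n_batches = (
        n_samples // batch_size
        if drop_remainder
        else math.ceil(n_samples / batch_size)
    )
    if poisson:
        # Each sample joins each batch independently with probability q,
        # so batch sizes vary and a sample appears at most once per batch,
        # but possibly in zero or several batches overall.
        q = batch_size / n_samples
        for _ in range(n_batches):
            yield [i for i in range(n_samples) if rng.random() < q]
        return
    if shuffle and replacement:
        order = rng.choices(range(n_samples), k=n_samples)  # with replacement
    elif shuffle:
        order = rng.sample(range(n_samples), k=n_samples)  # a permutation
    else:
        order = list(range(n_samples))  # original order
    for b in range(n_batches):
        yield order[b * batch_size : (b + 1) * batch_size]
```

For instance, `list(batch_indices(10, 4))` gives two full batches, while passing `drop_remainder=False` adds a third batch holding the two leftover indices. Drawing Poisson batches independently of one another is what yields the variable batch sizes mentioned in the table.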
Yields:
Name | Type | Description |
---|---|---|
inputs | (2… | Input features of that batch. |
targets | data array, list of data arrays or None | Target labels or values of that batch. May be None for unsupervised or semi-supervised tasks. |
weights | 1-d data array or None | Optional weights associated with the samples, typically used to balance a model's loss or metrics. |
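A training loop typically unpacks each yielded triple and must cope with `targets` or `weights` being None. The sketch below is hypothetical: `weighted_mse`, `predict` and the hand-made `batches` list stand in for a real model and a real call to `generate_batches`. Normalizing by the sum of the weights is one common design choice. Other frameworks divide by the batch size instead.

```python
from typing import List, Optional, Sequence


def weighted_mse(
    predictions: Sequence[float],
    targets: Optional[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Weighted mean squared error over one batch.

    `weights=None` falls back to uniform weighting; `targets=None`
    (e.g. an unsupervised batch) is rejected since MSE needs labels.
    """
    if targets is None:
        raise ValueError("this loss requires targets")
    if weights is None:
        weights = [1.0] * len(predictions)
    total = sum(w * (p - t) ** 2 for p, t, w in zip(predictions, targets, weights))
    return total / sum(weights)


def predict(inputs: List[List[float]]) -> List[float]:
    """Hypothetical model: y = 2 * x on a single feature."""
    return [2.0 * row[0] for row in inputs]


# Hand-made (inputs, targets, weights) triples standing in for yielded batches.
batches = [
    ([[1.0], [2.0]], [1.0, 2.0], None),
    ([[3.0], [4.0]], [2.0, 4.0], [1.0, 3.0]),
]
losses = [weighted_mse(predict(x), y, w) for x, y, w in batches]
```

Here the first batch has no weights and is averaged uniformly, while the second batch gives three times more importance to its second sample.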
get_data_specs()
abstractmethod
Return a DataSpecs object describing this dataset.
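Exposing metadata separately from the data lets other components be configured without reading any samples. The sketch below is hypothetical: `ToySpecs` is not declearn's `DataSpecs` (whose actual fields are not documented here), and `ToyDataset` and `init_weights` are illustrative names. It shows a model being sized from the specs alone.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ToySpecs:
    """Hypothetical metadata record (not declearn's DataSpecs)."""

    n_samples: int
    n_features: int


class ToyDataset:
    """Hypothetical in-memory dataset exposing a specs accessor."""

    def __init__(self, rows: List[List[float]]) -> None:
        self.rows = rows

    def get_data_specs(self) -> ToySpecs:
        # Metadata is derived from the data without handing the data out.
        n_features = len(self.rows[0]) if self.rows else 0
        return ToySpecs(n_samples=len(self.rows), n_features=n_features)


def init_weights(specs: ToySpecs) -> List[float]:
    """Size a linear model from metadata alone (one weight per feature plus a bias)."""
    return [0.0] * (specs.n_features + 1)
```

With three two-feature rows, `get_data_specs()` reports 3 samples and 2 features, and `init_weights` returns three zero-initialized parameters.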