declearn v2.3.0

Released: 30/08/2023

Release highlights

New Dataset subclasses to interface TensorFlow and Torch dataset APIs

The most visible addition in v2.3 is the pair of new TensorflowDataset and TorchDataset classes, which respectively enable wrapping tensorflow.data.Dataset and torch.utils.data.Dataset objects into declearn Dataset instances that can be used for training and evaluating models in a federated way.

Both of these classes are implemented under manual-import submodules of declearn.dataset: declearn.dataset.tensorflow and declearn.dataset.torch. While applications that rely on memory-fitting tabular data can still use the good old InMemoryDataset, these new interfaces are designed to let users re-use existing code for interfacing any kind of data, including images or text (that may require framework-provided pre-processing), whether it is loaded on-demand from a database or distributed files, or even generated procedurally.

We focused on keeping the declearn-side code minimal and on leaving the door open for as many framework-provided features as possible, but we may have missed some things; if you run into issues or limits when using these new classes, feel free to drop us a message, using either the historical Inria-Gitlab repository or the newly-created GitHub mirror!

Support for Torch 2.0

Another less-visible but possibly high-impact update is the addition of support for Torch 2.0. It took us a bit of time to adjust the backend code for this new release of Torch, since all of the DP-oriented functorch-based code became incompatible with it, but we are now able to provide end-users with compatibility for both the newest 2.0 version and the previously-supported 1.10-1.13 versions. As a cherry on top, it should even be possible to have the server and clients use different Torch major versions!

The main interest of this new support (apart from not losing pace with the framework and its backend improvements) is to enable end-users to use the new torch.compile feature to optimize their model's runtime. There is however a major caveat to this: at the moment, options passed to torch.compile are lost, which means that they cannot yet be properly propagated to clients, making this new feature usable only with default arguments. However, the Torch team is working on improving that (see for example this issue), and we will hopefully be able to forward model-compilation instructions as part of declearn in the near future!

In the meantime, if you encounter any issues with Torch support, notably regarding 2.0-introduced features, please let us know, as we are eager to build on user feedback to improve the package's backend as well as its APIs.

Numerous test-driven backend fixes

Finally, a lot of effort has been put into making declearn more robust, by adding more unit and integration tests, improving our CI/CD setup to cover our code more extensively (notably by systematically testing it on both CPU and GPU) and efficiently, and adding a custom script to launch groups of tests in a verbose and compact way. As a result, we made a number of test-driven backend patches.

Some bugs were pretty awful and well-hidden (we recently backported formula fixes for a couple of hopefully-unused operations to all previous versions via sub-minor version releases); some were visible yet harmful (some metrics' computations were just plain wrong under certain input shape conditions, which was apparent because the resulting values were uncanny, but made analyzing and using the results a burden); some were minor and/or edge-case but still worth fixing.

We hope that this effort enabled catching most if not all current potential bugs, but we will keep on improving unit test coverage in the near future, and are adopting a stricter policy as to testing new features as they are being implemented.

List of changes

New features

Add declearn.dataset.torch.TorchDataset (!47 and !53)
Enable wrapping up torch.utils.data.Dataset instances.
Enable setting up a custom collate function for batching.
Expose the collate_with_padding util for padded-batching.
Add declearn.dataset.tensorflow.TensorflowDataset (!53)
Enable wrapping up tensorflow.data.Dataset instances.
Enable batching inputs into padded or ragged tensors.
Add support for Torch 2.0. (!49)
Add backend compatibility with Torch 2.0 while preserving existing support for versions 1.10 to 1.13.
Enable the use of torch.compile on clients' side based on its use on the server side, with some caveats that are due to Torch and are on its roadmap.
Add "torch1" and "torch2" extra-dependency specifiers to ease installation of compatible versions of packages from the Torch ecosystem.
Have both versions be tested as part of our CI/CD.
Add dtype argument to SklearnSGDModel. (!50)
Add a dtype parameter to SklearnSGDModel and use it to prevent dtype issues related to the introduction of non-float64 support for SGD models as part of Scikit-learn 1.3.0.
A patch was back-ported to previous declearn releases to force conversion to float64 as part of backend computations.
Add declearn.optimizer.modules.L2GlobalClipping. (!56)
This new OptiModule enables clipping gradients based on the L2-norm of all of their concatenated values, rather than their weight-wise L2-norm (as is done by the pre-existing L2Clipping module).
Add replacement: bool = False argument to Dataset.generate_batches (!47 and !53).
Enable drawing fixed-size-batches of samples with replacement.
Note that this is not yet available as part of the FL process, as sending backward-incompatible keyword arguments would break compatibility between any v2.3+ server and v2.0-2.2 client.
This new option will therefore be deployed no sooner than in declearn 3.0 (where additional mechanisms are bound to be designed to anticipate this kind of change and make it so that older-version clients can dismiss the unsupported arguments and/or be prohibited from joining a process that requires their use).
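To make the difference between the L2GlobalClipping and L2Clipping modules listed above more concrete, here is a minimal pure-Python sketch of the two clipping rules. It is not declearn's implementation: the Gradients alias, the helper functions and the example values are all hypothetical stand-ins, and gradients are represented as plain dicts of lists rather than declearn Vector objects.

```python
import math
from typing import Dict, List

# Hypothetical stand-in for a declearn Vector: named lists of gradient values.
Gradients = Dict[str, List[float]]


def l2_norm(values: List[float]) -> float:
    """Return the euclidean (L2) norm of a flat list of values."""
    return math.sqrt(sum(v * v for v in values))


def clip_weightwise(grads: Gradients, max_norm: float) -> Gradients:
    """Clip each named gradient separately, based on its own L2 norm."""
    clipped = {}
    for name, values in grads.items():
        norm = l2_norm(values)
        scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
        clipped[name] = [v * scale for v in values]
    return clipped


def clip_global(grads: Gradients, max_norm: float) -> Gradients:
    """Clip all gradients jointly, based on the L2 norm of their concatenation."""
    norm = l2_norm([v for values in grads.values() for v in values])
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return {name: [v * scale for v in values] for name, values in grads.items()}


# Toy gradients: "kernel" has norm 5, "bias" has norm 12, and their
# concatenation has norm 13.
grads = {"kernel": [3.0, 4.0], "bias": [0.0, 12.0]}
weightwise = clip_weightwise(grads, max_norm=6.5)
global_ = clip_global(grads, max_norm=6.5)
```

With these toy values, weight-wise clipping leaves "kernel" untouched and only rescales "bias", which changes the overall direction of the update, whereas global clipping rescales every value by the same factor and therefore preserves that direction.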

Revisions

Fix build_keras_loss. (!50)
Fix the build_keras_loss util, which caused an incompatibility with the latest TensorFlow versions in some cases.
Fix Vector.__rtruediv__ and L2Clipping (!54)
Fix Vector.__rtruediv__ formula, for scalar / vector operations.
Fix L2Clipping L2-norm-based clipping.
Both these fixes were backported to previous releases, as sub-minor version releases.
Fix expanded-dim inputs handling in MAE, MSE and R2 metrics (!57)
Fix those metrics' computation when true and predicted labels have the same shape up to one expanded dimension (typically for single-target regression or single-label binary classification tasks using a neural network without further flattening beyond the output layer).
Miscellaneous minor fixes (!57)
Fix 'DataTypeField.is_valid'.
Fix 'InMemoryDataset' single-column target loading from csv.
Fix 'InMemoryDataset.data_type'.
Fix 'EarlyStopping.update' with repeated equal inputs.
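For readers unfamiliar with reflected operators, the semantics that the Vector.__rtruediv__ fix above targets (scalar / vector) can be sketched as follows. The release notes do not detail what the faulty formula was, so this only illustrates the intended behavior: ToyVector is a hypothetical stand-in and not declearn's Vector class.

```python
from typing import Dict, Union

Scalar = Union[int, float]


class ToyVector:
    """Hypothetical minimal stand-in for a Vector of named coefficients."""

    def __init__(self, coefs: Dict[str, float]) -> None:
        self.coefs = coefs

    def __truediv__(self, other: Scalar) -> "ToyVector":
        # vector / scalar: divide each coefficient by the scalar.
        return ToyVector({k: v / other for k, v in self.coefs.items()})

    def __rtruediv__(self, other: Scalar) -> "ToyVector":
        # scalar / vector: the scalar is the numerator and each coefficient
        # is the denominator. Division does not commute, so this cannot
        # simply delegate to __truediv__.
        return ToyVector({k: other / v for k, v in self.coefs.items()})


vec = ToyVector({"w": 2.0, "b": 4.0})
left = (vec / 2).coefs   # coefficients divided by 2
right = (2 / vec).coefs  # 2 divided by each coefficient
```

Python falls back to ToyVector.__rtruediv__ when evaluating 2 / vec because the built-in int type does not know how to divide by a ToyVector.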

Deprecations

Remove declearn.dataset.Dataset.load_from_json and save_to_json from the API-defining abstract base class. (!47)
These methods were not used anywhere in declearn, and were unlikely to be used in code that did not specifically use InMemoryDataset (the methods of which are kept).
Deprecate Vector subclasses' keyword arguments to the sum method. (!55)
Vector.sum does not accept kwargs, but its subclasses' implementations do, for no good reason as they are never used in declearn.

Documentation & Examples

Clean up the non-quickrun MNIST example. (!51)
Update documentation on the CI/CD and the way(s) to run tests.

Unit and integration tests

Revise toy-regression integration tests for efficiency and coverage. (!57)
Add proper unit tests for Vector and its subclasses. (!55)
Add tests for 'declearn.data_info' submodule. (!57)
Add tests for 'declearn.dataset.InMemoryDataset'. (!47 and !57)
Add tests for 'declearn.main.utils.aggregate_clients_data_info'. (!57)
Add tests for 'declearn.main.utils.EarlyStopping'. (!57)
Add tests for large (chunked) messages' exchange over network. (!57)
Add tests for expanded-dim inputs in MAE, MSE and R2 metrics. (!57)
Add tests for some of the 'declearn.quickrun' backend utils. (!57)