declearn v2.2.0

Released: 11/05/2023

Release highlights

Declearn Quickrun Mode & Dataset-splitting utils

The two most-visible additions of v2.2 are the declearn-quickrun and declearn-split entry-point scripts, that are installed as CLI tools together with the package when running pip install declearn (or installing from source).

declearn-quickrun introduces an alternative way to use declearn so as to run a simulated Federated Learning experiment on a single computer, using localhost communications, and any model, dataset and optimization / training / evaluation configuration.

declearn-quickrun relies on:

a python code file to specify the model;
a standard (but partly modular) data storage structure;
a TOML config file to specify everything else.

It is thought of as:

a simple entry-point to newcomers, demonstrating what declearn can do with zero to minimal knowledge of the actual Python API;
a nice way to run experiments for research purposes, with minimal setup (and the possibility to maintain multiple experiment configurations in parallel via named and/or versioned TOML config files) and standardized outputs (including model weights, full process logs and evaluation metrics).

declearn-split is a CLI tool that wraps up some otherwise-public data utils that enable splitting and preparing a supervised learning dataset for its use in a Federated Learning experiment. It is thought of as a helper to prepare data for its use with declearn-quickrun.

Support for Jax / Haiku

Another visible addition of declearn v2.2 is the support for models implemented in Jax, specifically via the neural network library Haiku.

This takes shape of the new (optional) declearn.model.haiku submodule, that provides with dedicated JaxNumpyVector and HaikuModel classes (subclassing the base Vector and Model ones). Existing unit and integration tests have been extended to cover this new framework (when available), which is therefore usable on par with Scikit-Learn, TensorFlow and Torch - up to a few framework specificities in the setup of the model, notably when it is desired to freeze some layers (which has to happen after instantiating and initializing the model, contrary to what can be done in other nerual network frameworks).

Improved Documentation and Examples

Finally, this new version comes with an effort on improving the usability of the package, notably via the readability of its documentation and examples.

The documentation has been heavily-revised (which has already been partially back-ported to previous version releases upon making the documentation website public).

The legacy Heart UCI example has been improved to enable real-life execution (i.e. using multiple agents / computers communicating over the internet). More importantly, the classic MNIST dataset has been used to implement simpler and more-diverse introductory examples, that demonstrate the various flavors of declearn one can look for (including the new Quickrun mode).

The declearn.dataset.examples submodule has been introduced, so that example data loaders can be added (and maintained / tested) as part of the package. For now these utils only cover the MNIST and Heart UCI datasets, but more reference datasets are expected to be added in the future, enabling end-users to make up their own experiments and toy around the packages' functionality in no time.

List of changes

New features

Add declearn.model.haiku submodule. (!32)
- Implement Vector and Model subclasses to interface Haiku/Jax-backed models.
- The associated dependencies (jax and haiku) may be installed using pip install declearn[haiku] or pip install declearn[all], and remain optional.
- Note that both Haiku and Jax are early-development products: as such, the supported versions are hard-coded for now, due to the lack of API stability.
Add declearn-quickrun entry point. (!41)
- Implement declearn-quickrun as a CLI to run simulated FL experiments.
- Write some dedicated TOML parsers to set up the entire process from a single configuration file (building on existing declearn.main.config tools), and build on the file format output by declearn-split (see below).
- Revise TomlConfig make run_as_processes public (see below).
Add declearn-split entry point. (!41)
- Add some dataset utility functions (see below).
- Implement declearn-split to interface data-splitting utils as a CLI.
Add declearn.dataset.examples submodule. (!41)
- Add MNIST dataset downloading utils.
- Add Heart UCI dataset downloading utils.
Add declearn.dataset.utils submodule. (!41)
- Add split_multi_classif_dataset for multinomial classification data.
- Refactor some declearn.dataset.InMemoryDataset code into functional utils: save_data_array and load_data_array.
- Expose sparse matrices' to-/from-file parsing utils.
Add the run_as_processes utility.
- Revise the util to capture exceptions and outputs. (!37)
- Make the util public as part of the declearn quickrun addition. (!41)
Add data_type and features_shape to DataSpecs. (!36)
- These fields enable specifying input features' shape and dtype.
- The input_shape and nb_features fields have in turn been deprecated (see section below).
Add utils to access types mapping of optimization plug-ins. (!44)
- Add declearn.aggregator.list_aggregators.
- Add declearn.optimizer.list_optim_modules.
- Add declearn.optimizer.list_optim_regularizers.
- All three of these utils are trivial, but are expected to be easier to find out about and use by end-users than their more generic backend counterpart declearn.utils.access_types_mapping(group="...").

Revisions

Refactor TorchModel backend code to clip gradients. (!42)
- Optimize functorch code when possible (essentially, for Torch 1.13).
- Pave the way towards a future transition to Torch 2.0.
Revise TomlConfig parameters and backend code
- Add options to target a subsection of a TOML file. (!41)
- Improve the default parser (!44)
Revise type annotations of Model and Vector. (!44)
- Use typing.Generic and typing.TypeVar to improve the annotations about wrapped-data / used-vectors coherence in these classes, and in Optimizer and associated plug-in classes.

Deprecations

Deprecate declearn.dataset.InMemoryDataset.(load|save)_data_array. (!41)
- Replaced with declearn.dataset.utils.(load|save)_data_array.
- The deprecated functions now call the former, emitting a warning.
- They will be removed in v2.4 and/or v3.0.
Deprecate declearn.data_info.InputShapeField and NbFeaturesField. (!36)
- Replaced with declearn.dataset.FeaturesShapeField.
- The deprecated fields may still be used, but emit a warning.
- They will be removed in v2.4 and/or v3.0.

Documentation & Examples

Restructure the documentation and render it as a website. (!40)
- Restructure the overly-long readme file into a set of guides.
- Set up the automatic rendering of the API reference from the code.
- Publish the docs as a versioned website: https://magnet.gitlabpages.inria.fr/declearn/docs
- Backport these changes so that the website covers previous releases.
Provide with a Quickstart example using declearn-quickrun.
- Replace the Quickstart guide with an expanded one providing with a fully- functioning example that uses the MNIST dataset (see below).
- Use this guide to showcase the various use-cases of declearn (simulated FL or real-life deployment / TOML config or python scripts).
Modularize the Heart UCI example for its real-life deployment. (!34)
Implement the MNIST example, in three flavors. (!41)
- Make MNIST the default demonstration example for the declearn-quickrun and declearn-split CLI tools.
- Write a MNIST example using the Quickrun mode with a customizable config.
- Write a MNIST example as a set of python files, enabling real-life use.

Unit and integration tests

Compute code coverage as part of CI/CD pipelines. (!38)
Replace declearn.communication unit tests. (!39)
Modularize test_regression integration tests. (!39)
Add the optional '--cpu-only' flag for unit tests. (!39)
Add unit tests for declearn.dataset.examples. (!41)
Add unit tests for declearn.dataset.utils. (!41)
Add unit tests for declearn.utils.TomlConfig. (!44)
Add unit tests for declearn.aggregator.Aggregator classes. (!44)
Extend unit tests for type-registration utils. (!44)