declearn v2.2.0
Released: 11/05/2023
Release highlights
Declearn Quickrun Mode & Dataset-splitting utils
The two most-visible additions of v2.2 are the declearn-quickrun and declearn-split entry-point scripts, which are installed as CLI tools together with the package when running pip install declearn (or installing from source).
declearn-quickrun introduces an alternative way to use declearn so as to run a simulated Federated Learning experiment on a single computer, using localhost communications, and any model, dataset and optimization / training / evaluation configuration.
declearn-quickrun relies on:
- a python code file to specify the model;
- a standard (but partly modular) data storage structure;
- a TOML config file to specify everything else.
It is intended as:
- a simple entry point for newcomers, demonstrating what declearn can do with zero to minimal knowledge of the actual Python API;
- a convenient way to run experiments for research purposes, with minimal setup (and the possibility to maintain multiple experiment configurations in parallel via named and/or versioned TOML config files) and standardized outputs (including model weights, full process logs and evaluation metrics).
declearn-split is a CLI tool that wraps up some otherwise-public data utils that enable splitting and preparing a supervised learning dataset for its use in a Federated Learning experiment. It is intended as a helper to prepare data for use with declearn-quickrun.
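To make the idea of dataset splitting concrete, here is a minimal, standard-library-only sketch of an i.i.d. split of a labeled dataset into per-client shards. It is not declearn's implementation and does not use its API: the `split_iid` function, its parameters and the toy dataset are hypothetical names introduced purely for illustration.

```python
import random
from collections import Counter


def split_iid(samples, n_shards, seed=0):
    """Shuffle samples and deal them into n_shards near-equal shards.

    Hypothetical helper, written for illustration only.
    """
    if n_shards < 1:
        raise ValueError("n_shards must be a positive integer")
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    # Deal samples round-robin so that shard sizes differ by at most one.
    return [shuffled[i::n_shards] for i in range(n_shards)]


# Toy supervised dataset: (features, label) pairs with three classes.
dataset = [([float(i)], i % 3) for i in range(30)]
shards = split_iid(dataset, n_shards=3)

for idx, shard in enumerate(shards):
    labels = Counter(label for _, label in shard)
    print(f"client_{idx}: {len(shard)} samples, labels={dict(labels)}")
```

Each shard could then be written to its own folder, one per simulated client. Non-i.i.d. schemes (for instance, label-skewed shards) would replace the shuffle-and-deal step with a label-aware assignment.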
Support for Jax / Haiku
Another visible addition of declearn v2.2 is the support for models implemented in Jax, specifically via the neural network library Haiku.
This takes the shape of the new (optional) declearn.model.haiku submodule, which provides dedicated JaxNumpyVector and HaikuModel classes (subclassing the base Vector and Model ones). Existing unit and integration tests have been extended to cover this new framework (when available), which is therefore usable on par with Scikit-Learn, TensorFlow and Torch, up to a few framework specificities in the setup of the model, notably when it is desired to freeze some layers (which has to happen after instantiating and initializing the model, contrary to what can be done in other neural network frameworks).
Improved Documentation and Examples
Finally, this new version comes with an effort on improving the usability of the package, notably via the readability of its documentation and examples.
The documentation has been heavily-revised (which has already been partially back-ported to previous version releases upon making the documentation website public).
The legacy Heart UCI example has been improved to enable real-life execution (i.e. using multiple agents / computers communicating over the internet). More importantly, the classic MNIST dataset has been used to implement simpler and more-diverse introductory examples, that demonstrate the various flavors of declearn one can look for (including the new Quickrun mode).
The declearn.dataset.examples submodule has been introduced, so that example data loaders can be added (and maintained / tested) as part of the package. For now these utils only cover the MNIST and Heart UCI datasets, but more reference datasets are expected to be added in the future, enabling end-users to make up their own experiments and toy around with the package's functionality in no time.
List of changes
New features
- Add declearn.model.haiku submodule. (!32)
  - Implement Vector and Model subclasses to interface Haiku/Jax-backed models.
  - The associated dependencies (jax and haiku) may be installed using pip install declearn[haiku] or pip install declearn[all], and remain optional.
  - Note that both Haiku and Jax are early-development products: as such, the supported versions are hard-coded for now, due to the lack of API stability.
- Add declearn-quickrun entry point. (!41)
  - Implement declearn-quickrun as a CLI to run simulated FL experiments.
  - Write some dedicated TOML parsers to set up the entire process from a single configuration file (building on existing declearn.main.config tools), and build on the file format output by declearn-split (see below).
  - Revise TomlConfig and make run_as_processes public (see below).
- Add declearn-split entry point. (!41)
  - Add some dataset utility functions (see below).
  - Implement declearn-split to interface data-splitting utils as a CLI.
- Add declearn.dataset.examples submodule. (!41)
  - Add MNIST dataset downloading utils.
  - Add Heart UCI dataset downloading utils.
- Add declearn.dataset.utils submodule. (!41)
  - Add split_multi_classif_dataset for multinomial classification data.
  - Refactor some declearn.dataset.InMemoryDataset code into functional utils: save_data_array and load_data_array.
  - Expose sparse matrices' to-/from-file parsing utils.
- Add the run_as_processes utility.
  - Revise the util to capture exceptions and outputs. (!37)
  - Make the util public as part of the declearn-quickrun addition. (!41)
- Add data_type and features_shape to DataSpecs. (!36)
  - These fields enable specifying input features' shape and dtype.
  - The input_shape and nb_features fields have in turn been deprecated (see section below).
- Add utils to access types mapping of optimization plug-ins. (!44)
  - Add declearn.aggregator.list_aggregators.
  - Add declearn.optimizer.list_optim_modules.
  - Add declearn.optimizer.list_optim_regularizers.
  - All three of these utils are trivial, but are expected to be easier for end-users to discover and use than their more generic backend counterpart declearn.utils.access_types_mapping(group="...").
Revisions
- Refactor TorchModel backend code to clip gradients. (!42)
  - Optimize functorch code when possible (essentially, for Torch 1.13).
  - Pave the way towards a future transition to Torch 2.0.
- Revise TomlConfig parameters and backend code.
  - Add options to target a subsection of a TOML file. (!41)
  - Improve the default parser. (!44)
- Revise type annotations of Model and Vector. (!44)
  - Use typing.Generic and typing.TypeVar to improve the annotations about wrapped-data / used-vectors coherence in these classes, and in Optimizer and associated plug-in classes.
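As an illustration of how typing.Generic and typing.TypeVar can express coherence between a model and the vectors it uses, here is a minimal sketch. The classes below are simplified stand-ins that only borrow the Vector and Model names; their attributes and methods are assumptions and do not reproduce declearn's actual signatures.

```python
from typing import Generic, List, TypeVar

T = TypeVar("T")  # type of the data wrapped by a vector


class Vector(Generic[T]):
    """Simplified stand-in wrapping a list of coefficients."""

    def __init__(self, coefs: List[T]) -> None:
        self.coefs = coefs


class FloatVector(Vector[float]):
    """Vector specialized to float coefficients."""

    def scaled(self, factor: float) -> "FloatVector":
        return FloatVector([factor * c for c in self.coefs])


# Models are parametrized by the kind of Vector they emit and accept.
VectorT = TypeVar("VectorT", bound=Vector)


class Model(Generic[VectorT]):
    """Simplified stand-in holding weights of a given Vector type."""

    def __init__(self, weights: VectorT) -> None:
        self._weights = weights

    def get_weights(self) -> VectorT:
        return self._weights

    def set_weights(self, weights: VectorT) -> None:
        self._weights = weights


model: Model[FloatVector] = Model(FloatVector([1.0, 2.0]))
# A static type checker knows get_weights() returns a FloatVector here,
# so calling .scaled() type-checks without a cast.
model.set_weights(model.get_weights().scaled(0.5))
print(model.get_weights().coefs)  # prints [0.5, 1.0]
```

With such annotations, passing a vector of a mismatched type to `set_weights` can be flagged by a static checker rather than failing at run time.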
Deprecations
- Deprecate declearn.dataset.InMemoryDataset.(load|save)_data_array. (!41)
  - Replaced with declearn.dataset.utils.(load|save)_data_array.
  - The deprecated methods now call their replacements, emitting a warning.
  - They will be removed in v2.4 and/or v3.0.
- Deprecate declearn.data_info.InputShapeField and NbFeaturesField. (!36)
  - Replaced with declearn.dataset.FeaturesShapeField.
  - The deprecated fields may still be used, but emit a warning.
  - They will be removed in v2.4 and/or v3.0.
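The deprecation pattern described above (keeping the old entry point, emitting a warning, and delegating to the replacement) can be sketched as follows. This is a generic illustration rather than declearn's code: the function bodies, signatures, return value and warning message are placeholders chosen for the example.

```python
import warnings


def save_data_array(path: str, array: list) -> str:
    """Stand-in for the new functional util (placeholder body)."""
    return f"{path}.csv"


class InMemoryDataset:
    """Stand-in class keeping a deprecated alias of the util."""

    @staticmethod
    def save_data_array(path: str, array: list) -> str:
        """Deprecated alias: warn, then delegate to the functional util."""
        warnings.warn(
            "'InMemoryDataset.save_data_array' is deprecated; "
            "use the functional 'save_data_array' util instead.",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
        return save_data_array(path, array)


# Record warnings to show that the old entry point still works but warns.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = InMemoryDataset.save_data_array("data/train", [1, 2, 3])

print(result, [w.category.__name__ for w in caught])
```

Delegating instead of duplicating the logic keeps behavior identical during the deprecation period, so that removal in a later release only deletes the alias.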
Documentation & Examples
- Restructure the documentation and render it as a website. (!40)
  - Restructure the overly-long readme file into a set of guides.
  - Set up the automatic rendering of the API reference from the code.
  - Publish the docs as a versioned website: https://magnet.gitlabpages.inria.fr/declearn/docs
  - Backport these changes so that the website covers previous releases.
- Provide a Quickstart example using declearn-quickrun.
  - Replace the Quickstart guide with an expanded one providing a fully-functioning example that uses the MNIST dataset (see below).
  - Use this guide to showcase the various use-cases of declearn (simulated FL or real-life deployment / TOML config or python scripts).
- Modularize the Heart UCI example for its real-life deployment. (!34)
- Implement the MNIST example, in three flavors. (!41)
  - Make MNIST the default demonstration example for the declearn-quickrun and declearn-split CLI tools.
  - Write an MNIST example using the Quickrun mode with a customizable config.
  - Write an MNIST example as a set of python files, enabling real-life use.
Unit and integration tests
- Compute code coverage as part of CI/CD pipelines. (!38)
- Replace declearn.communication unit tests. (!39)
- Modularize test_regression integration tests. (!39)
- Add the optional '--cpu-only' flag for unit tests. (!39)
- Add unit tests for declearn.dataset.examples. (!41)
- Add unit tests for declearn.dataset.utils. (!41)
- Add unit tests for declearn.utils.TomlConfig. (!44)
- Add unit tests for declearn.aggregator.Aggregator classes. (!44)
- Extend unit tests for type-registration utils. (!44)