Hands-on usage
This section details how to set up server-side and client-side programs that run together to perform a federated learning process. Generic remarks from the Quickstart section hold here as well, that section being a deliberately simplified instance of the present one.
You can follow along with a concrete example that uses the UCI heart disease dataset, which is stored in the examples/uci-heart folder. You may refer to the server.py and client.py example scripts, which contain comments indicating how the code relates to the steps described below. For further details on this example and on how to run it, please refer to its own readme.md file.
Server setup instructions
1. Define a Model
- Set up a machine learning model in a given framework (e.g. a torch.nn.Module).
- Select the appropriate declearn.model.api.Model subclass to wrap it up.
- Either instantiate the Model or provide a JSON-serialized configuration.
2. Define a FLOptimConfig
- Select a declearn.aggregator.Aggregator (subclass) instance to define how clients' updates are to be aggregated into global-model updates on the server side.
- Parameterize a declearn.optimizer.Optimizer (possibly using a selected pipeline of declearn.optimizer.modules.OptiModule plug-ins and/or a pipeline of declearn.optimizer.regularizers.Regularizer ones) to be used by clients to derive local step-wise updates from model gradients.
- Similarly, parameterize an Optimizer to be used by the server to (optionally) refine the aggregated model updates before applying them.
- Wrap these three objects into a declearn.main.config.FLOptimConfig, possibly using its from_config method to specify the former three components via configuration dicts rather than actual instances.
- Alternatively, write up a TOML configuration file that specifies these components (note that 'aggregator' and 'server_opt' have default values and may therefore be left unspecified).
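As a rough illustration of the "configuration dicts" option above, the three components could be laid out as a plain Python dict. The 'aggregator' and 'server_opt' names come from the text; the 'client_opt' key, the 'lrate' and 'modules' fields, and the "averaging" and "adam" identifiers are assumptions that should be checked against the API documentation of the installed declearn version.

```python
import json

# Plain-dict sketch of the three components wrapped by a FLOptimConfig.
# Apart from 'aggregator' and 'server_opt', all key names and identifier
# strings below are assumptions made for illustration purposes.
optim_config = {
    # How clients' updates are aggregated on the server side.
    "aggregator": "averaging",
    # Optimizer used by clients, with an optional pipeline of plug-ins.
    "client_opt": {"lrate": 0.001, "modules": ["adam"]},
    # Optimizer used by the server to refine the aggregated updates.
    "server_opt": {"lrate": 1.0},
}

# Such a dict only holds basic types, hence it is JSON-serializable
# and may be stored alongside the launching scripts.
serialized = json.dumps(optim_config, indent=2)
print(serialized)
```

Keeping the specification as data rather than as instances makes it easy to version and to share between participants.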
3. Define a communication server endpoint
- Select a communication protocol (e.g. "grpc" or "websockets").
- Select the host address and port to use.
- Preferably provide paths to PEM files storing SSL-required information.
- Wrap this into a config dict or use declearn.communication.build_server to instantiate a declearn.communication.api.NetworkServer to be used.
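The choices listed in this step may be gathered into a config dict along the following lines. This is only a sketch: the key names, the port number and the file paths are assumptions, and the uses_ssl helper is a hypothetical convenience that is not part of declearn.

```python
# Sketch of a server endpoint configuration dict.
# Key names, port and file paths are assumptions made for illustration.
server_endpoint = {
    "protocol": "websockets",  # or "grpc"
    "host": "localhost",
    "port": 8765,
    # Paths to the PEM files storing the SSL-required information.
    "certificate": "path/to/server-cert.pem",
    "private_key": "path/to/server-pkey.pem",
}


def uses_ssl(endpoint: dict) -> bool:
    """Return whether both SSL-related paths are set (hypothetical helper)."""
    return bool(endpoint.get("certificate")) and bool(endpoint.get("private_key"))


if not uses_ssl(server_endpoint):
    print("Warning: this endpoint would run without SSL encryption.")
```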
4. Instantiate and run a FederatedServer
- Instantiate a declearn.main.FederatedServer:
  - Provide the Model, FLOptimConfig and Server objects or configurations.
  - Optionally provide a MetricSet object or its specs (i.e. a list of Metric instances, identifier names or (name, config) tuples), that defines metrics to be computed by clients on their validation data.
  - Optionally provide the path to a folder where to write output files (model checkpoints and global loss history).
- Instantiate a declearn.main.config.FLRunConfig to specify the process:
  - Maximum number of training and evaluation rounds to run.
  - Registration parameters: exact or min/max number of clients to have and optional timeout delay spent waiting for said clients to join.
  - Training parameters: data-batching parameters and effort constraints (number of local epochs and/or steps to take, and optional timeout).
  - Evaluation parameters: data-batching parameters and effort constraints (optional maximum number of steps (<=1 epoch) and optional timeout).
  - Early-stopping parameters (optionally): patience, tolerance, etc. as to the global model loss's evolution throughout rounds.
  - Local Differential-Privacy parameters (optionally): (epsilon, delta) budget, type of accountant, clipping norm threshold, RNG parameters.
- Alternatively, write up a TOML configuration file that specifies all of the former hyper-parameters.
- Call the server's run method, passing it the former config object, the path to the TOML configuration file, or dictionaries of keyword arguments to be parsed into a FLRunConfig instance.
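The hyper-parameters listed for the FLRunConfig may be pictured as nested dictionaries of keyword arguments, as in the sketch below. All section and field names used here ('rounds', 'register', 'training', 'evaluate', 'early_stop' and their sub-keys) are assumptions that mirror the list above, and check_run_config is a hypothetical sanity check rather than a declearn function.

```python
# Sketch of run parameters as nested dicts of keyword arguments.
# All key names and values are assumptions made for illustration.
run_config = {
    "rounds": 10,  # maximum number of training and evaluation rounds
    "register": {"min_clients": 2, "max_clients": 4, "timeout": 120},
    "training": {"batch_size": 32, "n_epoch": 1},
    "evaluate": {"batch_size": 128},
    "early_stop": {"patience": 3, "tolerance": 0.0},
}


def check_run_config(config: dict) -> None:
    """Raise a ValueError on obviously inconsistent values (hypothetical)."""
    if config["rounds"] < 1:
        raise ValueError("'rounds' should be a positive integer.")
    register = config["register"]
    if register["min_clients"] > register["max_clients"]:
        raise ValueError("'min_clients' cannot exceed 'max_clients'.")


check_run_config(run_config)
```

Validating such values before starting the server avoids waiting for clients to register only to fail on a configuration error.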
Clients setup instructions
1. Interface training data
- Select and parameterize a declearn.dataset.Dataset subclass that will interface the local training dataset.
- Ensure its get_data_specs method exposes the metadata that is to be shared with the server (and nothing else, to prevent data leakage).
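To make the "metadata and nothing else" point concrete, here is a stand-alone toy sketch. ToyDataset and SharedSpecs are hypothetical classes written for this illustration: they do not subclass declearn.dataset.Dataset, and the fields exposed here are assumptions about what one may choose to share.

```python
from dataclasses import asdict, dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class SharedSpecs:
    """Hypothetical container for the metadata a client agrees to share."""

    n_samples: int
    n_features: int


class ToyDataset:
    """Toy stand-in for a dataset interface (not a declearn class)."""

    def __init__(self, rows: Sequence[Sequence[float]]) -> None:
        # Raw records are kept private to the client.
        self._rows: List[List[float]] = [list(row) for row in rows]

    def get_data_specs(self) -> SharedSpecs:
        """Expose aggregate metadata only, never the records themselves."""
        n_features = len(self._rows[0]) if self._rows else 0
        return SharedSpecs(n_samples=len(self._rows), n_features=n_features)


specs = ToyDataset([[0.1, 0.2, 1.0], [0.3, 0.4, 0.0]]).get_data_specs()
print(asdict(specs))  # {'n_samples': 2, 'n_features': 3}
```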
2. Interface validation data (optional)
- Optionally set up a second Dataset interfacing a validation dataset, used in evaluation rounds. Otherwise, those rounds will be run using the training dataset - which can be slow and/or lead to overfitting.
3. Define a communication client endpoint
- Select the communication protocol used (e.g. "grpc" or "websockets").
- Provide the server URI to connect to.
- Preferably provide the path to a PEM file storing SSL-required information (matching that used on the server side).
- Wrap this into a config dict or use declearn.communication.build_client to instantiate a declearn.communication.api.NetworkClient to be used.
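On the client side, the same information may be sketched as a config dict whose server URI is derived from the protocol, host and port. The key names, the client name and the file path are assumptions, build_server_uri is a hypothetical helper, and the URI formats ("wss://host:port" for secured WebSockets, "host:port" for gRPC) follow common conventions that should be checked against what the installed declearn version expects.

```python
def build_server_uri(
    protocol: str, host: str, port: int, use_ssl: bool = True
) -> str:
    """Format a server URI for a given protocol (hypothetical helper)."""
    if protocol == "websockets":
        scheme = "wss" if use_ssl else "ws"
        return f"{scheme}://{host}:{port}"
    if protocol == "grpc":
        return f"{host}:{port}"
    raise ValueError(f"Unsupported protocol: '{protocol}'.")


# Sketch of a client endpoint configuration dict.
# Key names, client name and file path are assumptions.
client_endpoint = {
    "protocol": "websockets",
    "server_uri": build_server_uri("websockets", "localhost", 8765),
    "name": "client-a",
    "certificate": "path/to/ca-cert.pem",
}
print(client_endpoint["server_uri"])  # wss://localhost:8765
```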
4. Run any necessary import statement
- If optional or third-party dependencies are known to be required, import them (e.g. import declearn.model.torch).
- Read more about this point below.
5. Instantiate a FederatedClient and run it
- Instantiate a declearn.main.FederatedClient:
  - Provide the NetworkClient and Dataset objects or configurations.
  - Optionally specify share_metrics=False to prevent sharing evaluation metrics (apart from the aggregated loss) with the server out of privacy concerns.
  - Optionally provide the path to a folder where to write output files (model checkpoints and local loss history).
- Call the client's run method and let the magic happen.
Logging
Note that this section and the quickstart example both left aside the option to configure the logging associated with the federated client and server, and/or with the network communication handlers they make use of. One may simply set up custom logging.Logger instances and pass them as arguments to the class constructors to replace the default, console-only, loggers.
The declearn.utils.get_logger function may be used to facilitate the setup of such logger instances, defining their name, verbosity level, and whether messages should be logged to the console and/or to an output file.
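For reference, a custom logger of the kind described above can be set up with the standard logging module alone, as in this minimal sketch. The make_logger helper, the logger name and the message format are hypothetical choices made for this illustration; the helper is not declearn.utils.get_logger and does not claim to reproduce its signature.

```python
import logging
import os
import tempfile
from typing import Optional


def make_logger(
    name: str, level: int = logging.INFO, fpath: Optional[str] = None
) -> logging.Logger:
    """Set up a logger writing to the console and, optionally, to a file."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    logger.propagate = False  # avoid duplicated messages via the root logger
    logger.handlers.clear()  # make repeated calls idempotent
    formatter = logging.Formatter(
        "%(asctime)s:%(name)s:%(levelname)s: %(message)s"
    )
    console = logging.StreamHandler()
    console.setFormatter(formatter)
    logger.addHandler(console)
    if fpath is not None:
        to_file = logging.FileHandler(fpath)
        to_file.setFormatter(formatter)
        logger.addHandler(to_file)
    return logger


log_path = os.path.join(tempfile.mkdtemp(), "client.log")
client_logger = make_logger("client-a", logging.DEBUG, fpath=log_path)
client_logger.info("logger ready")
```

The resulting client_logger is a regular logging.Logger, hence the kind of object that may be passed to the class constructors mentioned above.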
Support for GPU acceleration
TL;DR: GPU acceleration is natively available in declearn for model frameworks that support it. It may be disabled or configured with one line of code and without changing your original model.
Details:
Most machine learning frameworks, including TensorFlow and Torch, enable accelerating computations by using computational devices other than the CPU. declearn interfaces supported frameworks so that a device policy can be set in a single line of code, across frameworks.
declearn internalizes the framework-specific code adaptations to place the data, model weights and computations on such a device. declearn provides a simple API to define a global device policy. This enables using a single GPU to accelerate computations, or forcing the use of a CPU.
By default, the policy is set to use the first available GPU, and otherwise use the CPU, with a warning that can safely be ignored.
Setting the device policy to be used can be done in local scripts, either as a client or as a server. Device policy is local and is not synchronized between federated learning participants.
Here are some examples of the one-liner used:
declearn.utils.set_device_policy(gpu=False) # disable GPU use
declearn.utils.set_device_policy(gpu=True) # use any available GPU
declearn.utils.set_device_policy(gpu=True, idx=2) # specifically use GPU n°2
Known issues:
- For Haiku / Jax, GPU support must be installed manually by end-users, as it is dependent on your local CUDA version, and as such cannot be easily shipped as part of declearn's dependencies specification. You most probably will need to run pip install jax[cudaXX_pip]==0.4, where XX is either 11, 12, or your more recent CUDA version. For more details, please refer to Jax's installation instructions.
- For Torch, if you have an unsupported CUDA and/or cuDNN version installed, the package may not work at all (even on CPU). This is an issue with Torch, and we advise you to refer to their documentation or issue tracker if you need help fixing it (see for example their installation instructions for version 1.13).
Dependency sharing
One important issue that is not currently handled by declearn itself is that of ensuring that clients have loaded all dependencies that may be required to unpack the Model and Optimizer instances transmitted at initialization. At the moment, it is therefore left to users to agree on the dependencies that need to be imported as part of the client-side launching script.
For example, if the trained model is an artificial neural network that uses PyTorch as implementation backend, clients will need to add the import declearn.model.torch statement in their code (and, obviously, to have torch installed). Similarly, if a custom declearn OptiModule was written to modify the way updates are computed locally by clients, it will need to be shared with clients, either as a package to be imported (like torch previously), or as a bit of source code to add on top of the script.
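Since agreeing on dependencies is left to users, a client script may start by checking that the agreed-upon modules can be found before joining the process. The sketch below relies on the standard importlib machinery only; the find_missing helper and the REQUIRED list are hypothetical, and the list merely echoes the PyTorch example above.

```python
import importlib.util
from typing import Iterable, List


def find_missing(modules: Iterable[str]) -> List[str]:
    """Return the names of modules that cannot be located (hypothetical)."""
    missing = []
    for name in modules:
        try:
            # Note: for dotted names, this imports the parent packages.
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            spec = None
        if spec is None:
            missing.append(name)
    return missing


# Dependencies that participants agreed upon (assumption, for illustration).
REQUIRED = ["declearn.model.torch", "torch"]

missing = find_missing(REQUIRED)
if missing:
    print("Missing dependencies:", ", ".join(missing))
```

Such a check only verifies that modules can be located; the actual import statements are still required for the transmitted objects to be unpacked.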