Quickstart
Here's where to start if you want to quickly understand what declearn does. This tutorial expects a basic understanding of federated learning.
We show different ways to use declearn on a well-known example, the MNIST dataset (see section 1). We then look at how to use declearn on your own problem (see section 2).
1. Federated learning on the MNIST dataset
We are going to train a common model between three simulated clients on the classic MNIST dataset. The input of the model is a set of images of handwritten digits, and the model needs to determine to which number between \(0\) and \(9\) each image corresponds. We show two ways to use declearn on this problem.
1.1. Quickrun mode
The quickrun mode is the simplest way to simulate a federated learning process on a single machine with declearn. It does not require understanding the details of the declearn implementation, only a basic understanding of federated learning.
To test this on the MNIST example, you can follow along the Jupyter notebook provided here, which we recommend running on Google Colab to skip setting up git, python, a virtual environment, etc. You may find a (possibly not entirely up-to-date) pre-hosted version of that notebook here.
If you want to run this locally, the detailed notebook can be boiled down to five shell commands. Set up a dedicated conda or venv environment, and run:

```shell
git clone https://gitlab.inria.fr/magnet/declearn/declearn2 &&
cd declearn2 &&
pip install .[tensorflow,websockets] &&
declearn-split --folder "examples/mnist_quickrun" &&
declearn-quickrun --config "examples/mnist_quickrun/config.toml"
```
To better understand the details of what happens under the hood, you can look at what the key elements of the declearn process are in section 1.2. To understand how to use the quickrun mode in practice, see section 2.1.
1.2. Python script
The quickrun mode abstracts away a lot of important elements of the process, and is only designed to simulate an FL experiment: the clients all run on the same machine. In a real-life deployment, a declearn experiment is built in python.
To see what this looks like in practice, you can head to the all-python MNIST example examples/mnist/ in the declearn repository, which you can access here. This version of the example may either be used to run a simulated process on a single computer, or to deploy the example over a real-life network.
Stylized structure
At a very high level, declearn is structured around two key objects. The Clients hold the data and perform calculations locally. The Server owns the model and drives the global training process. They communicate over a network, the central endpoint of which is hosted by the Server.
We provide below a stylized view of the main elements of the Server and Client scripts. For more details, you can look at the hands-on usage section of the documentation.
We show what a Client and Server script can look like for a hypothetical LASSO logistic regression model, using a scikit-learn backend and pre-processed data. The data is stored in CSV files with a "label" column; each client has two files: one for training, the other for validation.
Here, the code uses:
- Aggregation: the standard FedAvg strategy.
- Optimizer: standard SGD for both client and server.
- Training: 10 rounds of training, with 5 local epochs performed at each round and a batch size of 128 samples. At least 1 and at most 3 clients, awaited for at most 180 seconds by the server.
- Network: communications using websockets.
The server-side script:
```python
import declearn

model = declearn.model.sklearn.SklearnSGDModel.from_parameters(
    kind="classifier", loss="log_loss", penalty="l1"
)
netwk = declearn.communication.NetworkServerConfig(
    protocol="websockets", host="127.0.0.1", port=8888,
    certificate="path/to/certificate.pem",
    private_key="path/to/private_key.pem"
)
optim = declearn.main.config.FLOptimConfig.from_params(
    aggregator="averaging",
    client_opt=0.001,
)
server = declearn.main.FederatedServer(
    model, netwk, optim, checkpoint="outputs"
)
config = declearn.main.config.FLRunConfig.from_params(
    rounds=10,
    register={"min_clients": 1, "max_clients": 3, "timeout": 180},
    training={"n_epoch": 5, "batch_size": 128, "drop_remainder": False},
)
server.run(config)
```
The client-side script:
```python
import declearn

netwk = declearn.communication.NetworkClientConfig(
    protocol="websockets",
    server_uri="wss://127.0.0.1:8888",
    name="client_name",
    certificate="path/to/root_ca.pem"
)
train = declearn.dataset.InMemoryDataset(
    "path/to/train.csv", target="label",
    expose_classes=True  # enable sharing of unique target values
)
valid = declearn.dataset.InMemoryDataset("path/to/valid.csv", target="label")
client = declearn.main.FederatedClient(
    netwk, train, valid, checkpoint="outputs"
)
client.run()
```
2. Federated learning on your own dataset
2.1. Quickrun on your problem
Using the declearn-quickrun mode requires a configuration file, some data, and a model file:
- A TOML file, to store your experiment configurations. In the MNIST example: examples/mnist_quickrun/config.toml.
- A folder with your data, split by client. In the MNIST example: examples/mnist_quickrun/data_iid (after running declearn-split --folder "examples/mnist_quickrun").
- A python model file, to declare your model wrapped in a declearn object. In the MNIST example: examples/mnist_quickrun/model.py.
The TOML file
TOML is a minimal, human-readable configuration file format. We use it to store all the configurations of an FL experiment. The TOML file is parsed by python as a dictionary with each [header] as a key. For more details, see the TOML documentation.
This file is your main entry point to everything else. The absolute path to this file should be given as an argument in:

```shell
declearn-quickrun --config <path_to_toml_file>
```
The TOML file has six sections, some of which are optional. Note that the order does not matter, and that we give illustrative, not necessarily functional, examples.
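Put together, a configuration file thus follows the overall skeleton below (section contents are detailed in the rest of this page; the comments are a summary, not exhaustive):

```toml
[network]    # connection settings shared by server and clients
[data]       # where to find the (pre-split) data
[optim]      # server-side aggregator and server/client optimizers
[run]        # registration, training and evaluation parameters
[model]      # (optional) where to find the model file
[experiment] # (optional) metrics and checkpointing
```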
[network]: Network configuration used by both client and server, most notably the port, host, and SSL certificates. An example:

```toml
[network]
protocol = "websockets" # Protocol used; to keep things simple, use websockets
host = "127.0.0.1" # Address used; works as is on most setups
port = 8765 # Port used; works as is on most setups
```
This section is parsed as the initialization arguments to the NetworkServer class. Check its documentation to see all available fields. Note it is also used to initialize a NetworkClient, mirroring the server.
[data]: Where to find your data. This is particularly useful if you have split your data yourself, using custom names for files and folders. An example:

```toml
[data]
data_folder = "./custom/data_custom" # Your main data folder
client_names = ["client_a", "client_b", "client_c"] # The names of your client folders

[data.dataset_names] # The names of train and test datasets
train_data = "cifar_train"
train_target = "label_train"
valid_data = "cifar_valid"
valid_target = "label_valid"
```
This section is parsed as the fields of a DataSourceConfig dataclass. Check its documentation to see all available fields. This DataSourceConfig is then parsed by the parse_data_folder function.
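Concretely, with the example configuration above, the data folder would be expected to be laid out roughly as follows (the per-client file names are those declared under [data.dataset_names]; exact file extensions depend on how you saved your data):

```
custom/data_custom/
├── client_a/
│   ├── cifar_train    # training inputs
│   ├── label_train    # training labels
│   ├── cifar_valid    # validation inputs
│   └── label_valid    # validation labels
├── client_b/
│   └── ... (same four files)
└── client_c/
    └── ... (same four files)
```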
[optim]: Optimization options for both client and server, with three distinct sub-sections: the server-side aggregator (i) and optimizer (ii), and the client optimizer (iii). An example:

```toml
[optim]
aggregator = "averaging" # The basic server aggregation strategy

[optim.server_opt] # Server optimization strategy
lrate = 1.0 # Server learning rate

[optim.client_opt] # Client optimization strategy
lrate = 0.001 # Client learning rate
modules = [["momentum", {"beta" = 0.9}]] # List of optimizer modules used
regularizers = [["lasso", {alpha = 0.1}]] # List of regularizer modules
```
This section is parsed as the fields of an FLOptimConfig dataclass. Check its documentation to see more details on these three sub-sections. For more details on the available fields within those sub-sections, you can navigate the documentation of the Aggregator and Optimizer classes.
[run]: Training process options for both client and server. Most notably includes the number of rounds, as well as the registration, training, and evaluation parameters. An example:
```toml
[run]
rounds = 10 # Number of overall training rounds

[run.register] # Client registration options
min_clients = 1 # The minimum number of clients that need to connect
max_clients = 6 # The maximum number of clients that can connect
timeout = 5 # How long to wait for clients, in seconds

[run.training] # Client training procedure
n_epoch = 1 # Number of local epochs
batch_size = 48 # Training batch size
drop_remainder = false # Whether to drop the last, incomplete batch

[run.evaluate]
batch_size = 128 # Evaluation batch size
```
This section is parsed as the fields of an FLRunConfig dataclass. Check its documentation to see more details on the sub-sections. For more details on the available fields within those sub-sections, you can navigate from the documentation of FLRunConfig to the relevant dataclass, for instance TrainingConfig.
[model]: Optional section specifying where to find the model. An example:

```toml
[model]
# The path to your python model file
model_file = "./custom/model_custom.py"
# The name of your model object, if different from "model"
model_name = "MyCustomModel"
```
This section is parsed as the fields of a ModelConfig dataclass. Check its documentation to see all available fields.
[experiment]: Optional section specifying what to report during the experiment and where to report it. An example:
```toml
[experiment]
metrics = [
    # Multi-label Accuracy, Precision, Recall and F1-Score.
    ["multi-classif", {labels = [0,1,2,3,4,5,6,7,8,9]}]
]
checkpoint = "./result_custom" # Custom location for results
```
This section is parsed as the fields of an ExperimentConfig dataclass. Check its documentation to see all available fields.
The data
Your data, in a standard tabular format, split by client. Within each client folder, we expect four files: training data and labels, and validation data and labels.
If your data is not already split by client, we are developing an experimental data-splitting utility. It currently has a limited scope, dealing only with classification tasks and excluding multi-label ones. You can call it using declearn-split --folder <path_to_original_data>. For more details, refer to the documentation.
The model file
The model file should just contain the model you built for your data, e.g. a torch model, wrapped in a declearn object. See examples/mnist_quickrun/model.py for an example.
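As a rough sketch of what such a file might contain, assuming a torch backend (the network architecture below is an illustrative placeholder, not the one used in the example):

```python
# model.py -- hypothetical sketch of a quickrun model file.
# Assumes declearn was installed with torch support.
import torch
import declearn

# A small, illustrative classifier for 28x28 grayscale images.
network = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(28 * 28, 128),
    torch.nn.ReLU(),
    torch.nn.Linear(128, 10),
)

# Wrap the torch module in a declearn Model object.
# The variable must be named "model" unless overridden in the TOML file.
model = declearn.model.torch.TorchModel(network, loss=torch.nn.CrossEntropyLoss())
```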
The wrapped model should be named "model" by default. If you use any other name, you have to specify it in the TOML file, as demonstrated in ./custom/config_custom.toml.
2.2. Using declearn's full capabilities
To upgrade your experimental setting beyond the quickrun mode, you may move on to the hands-on usage section of the documentation.