
declearn.optimizer.modules.L2Clipping

Bases: OptiModule

Fixed-threshold, per-parameter L2-norm gradient clipping module.

This module implements the following algorithm:

Init(max_norm):
    assign max_norm
Step(grads):
    norm = euclidean_norm(grads)  # parameter-wise
    clip = min(max_norm / norm, 1.0)
    grads *= clip

In other words, (batch-averaged) gradients are clipped based on their parameter-wise L2 (euclidean) norm, and on a single fixed threshold (as opposed to more complex algorithms that may use parameter-wise and/or adaptive clipping thresholds).

This is equivalent to calling tensorflow.clip_by_norm on each and every data array in the input gradients Vector, with max_norm as norm clipping threshold. If you would rather clip gradients based on their global norm, use the L2GlobalClipping module (only available in declearn >=2.3).

This may notably be used to limit the contribution of batch-based gradients to model updates, so as to bound the sensitivity of that operation. It may also help prevent exploding-gradient issues.
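For illustration, the per-parameter clipping rule can be sketched in plain NumPy over a dict of named gradient arrays. This mirrors the pseudo-code above; it is not declearn's actual implementation (which operates on Vector objects), and the helper name below is purely hypothetical:

import numpy as np

def clip_by_param_l2_norm(gradients, max_norm=1.0):
    """Illustrative per-parameter L2-norm clipping over a dict of arrays."""
    clipped = {}
    for name, grad in gradients.items():
        norm = float(np.sqrt(np.sum(grad ** 2)))
        # Only scale down: gradients whose norm is below the threshold are unchanged.
        scale = min(max_norm / norm, 1.0) if norm > 0.0 else 1.0
        clipped[name] = grad * scale
    return clipped

grads = {
    "kernel": np.array([3.0, 4.0]),  # norm 5.0 -> rescaled down to norm 1.0
    "bias": np.array([0.3, -0.4]),   # norm 0.5 -> left unchanged
}
print(clip_by_param_l2_norm(grads, max_norm=1.0))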

Source code in declearn/optimizer/modules/_clipping.py
class L2Clipping(OptiModule):
    """Fixed-threshold, per-parameter L2-norm gradient clipping module.

    This module implements the following algorithm:

        Init(max_norm):
            assign max_norm
        Step(grads):
            norm = euclidean_norm(grads)  # parameter-wise
            clip = min(max_norm / norm, 1.0)
            grads *= clip

    In other words, (batch-averaged) gradients are clipped based on
    their parameter-wise L2 (euclidean) norm, and on a single fixed
    threshold (as opposed to more complex algorithms that may use
    parameter-wise and/or adaptive clipping thresholds).

    This is equivalent to calling `tensorflow.clip_by_norm` on each
    and every data array in the input gradients `Vector`, with
    `max_norm` as norm clipping threshold. If you would rather clip
    gradients based on their global norm, use the `L2GlobalClipping`
    module (only available in declearn >=2.3).

    This may notably be used to limit the contribution of batch-based
    gradients to model updates, so as to bound the sensitivity of that
    operation. It may also help prevent exploding-gradient issues.
    """

    name: ClassVar[str] = "l2-clipping"

    def __init__(
        self,
        max_norm: float = 1.0,
    ) -> None:
        """Instantiate the L2-norm gradient-clipping module.

        Parameters
        ----------
        max_norm: float, default=1.0
            Clipping threshold of the L2 (euclidean) norm of
            input (batch-averaged) gradients.
        """
        self.max_norm = max_norm

    def run(
        self,
        gradients: Vector,
    ) -> Vector:
        l2_norm = (gradients**2).sum() ** 0.5
        scaling = (self.max_norm / l2_norm).minimum(1.0)
        return gradients * scaling

    def get_config(
        self,
    ) -> Dict[str, Any]:
        return {"max_norm": self.max_norm}

__init__(max_norm=1.0)

Instantiate the L2-norm gradient-clipping module.

Parameters:

max_norm : float, default=1.0
    Clipping threshold of the L2 (euclidean) norm of input (batch-averaged) gradients.
Source code in declearn/optimizer/modules/_clipping.py
def __init__(
    self,
    max_norm: float = 1.0,
) -> None:
    """Instantiate the L2-norm gradient-clipping module.

    Parameters
    ----------
    max_norm: float, default=1.0
        Clipping threshold of the L2 (euclidean) norm of
        input (batch-averaged) gradients.
    """
    self.max_norm = max_norm