
declearn.optimizer.modules.L2GlobalClipping

Bases: OptiModule

Fixed-threshold, global-L2-norm gradient clipping module.

This module implements the following algorithm:

Init(max_norm):
    assign max_norm
Step(grads):
    norm = euclidean_norm(flatten_and_stack(grads))
    clip = min(max_norm / norm, 1.0)
    grads *= clip

In other words, (batch-averaged) gradients are clipped based on the L2 (euclidean) norm of their concatenated values: if that norm exceeds the selected max_norm threshold, all gradients are scaled down by the same factor, max_norm / norm, so that the resulting global norm equals max_norm.

This is equivalent to the tensorflow.clip_by_global_norm and torch.nn.utils.clip_grad_norm_ utilities. If you would rather clip gradients on a per-parameter basis, use the L2Clipping module.

This may be used to bound the sensitivity of gradient-based model updates, and/or to prevent exploding-gradient issues.
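
To make the clipping rule concrete, the following minimal sketch reproduces the same computation with plain numpy arrays, independently of declearn's Vector types; the array values and the global_l2_clip helper are purely illustrative.

import numpy as np

def global_l2_clip(grads, max_norm=1.0):
    """Illustrative re-implementation of global L2-norm clipping."""
    # L2 norm of all gradient values taken together, as if concatenated.
    norm = np.sqrt(sum(float(np.sum(g ** 2)) for g in grads))
    # Shrink everything by a common factor when the norm exceeds the threshold.
    clip = min(max_norm / norm, 1.0) if norm > 0 else 1.0
    return [g * clip for g in grads]

grads = [np.array([3.0, 0.0]), np.array([4.0])]  # global L2 norm = 5.0
clipped = global_l2_clip(grads, max_norm=1.0)
# Each array is scaled by 0.2, so the clipped global norm is exactly 1.0.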

Source code in declearn/optimizer/modules/_clipping.py (lines 90-152)
class L2GlobalClipping(OptiModule):
    """Fixed-threshold, global-L2-norm gradient clipping module.

    This module implements the following algorithm:

        Init(max_norm):
            assign max_norm
        Step(grads):
            norm = euclidean_norm(flatten_and_stack(grads))
            clip = min(max_norm / norm, 1.0)
            grads *= clip

    In other words, (batch-averaged) gradients are clipped based on
    the L2 (euclidean) norm of their *concatenated* values: if that
    norm exceeds the selected `max_norm` threshold, all gradients are
    scaled down by the same factor, `max_norm / norm`, so that the
    resulting global norm equals `max_norm`.

    This is equivalent to the `tensorflow.clip_by_global_norm` and
    `torch.nn.utils.clip_grad_norm_` utilities. If you would rather clip
    gradients on a per-parameter basis, use the `L2Clipping` module.

    This may be used to bound the sensitivity of gradient-based model
    updates, and/or to prevent exploding-gradient issues.
    """

    name: ClassVar[str] = "l2-global-clipping"

    def __init__(
        self,
        max_norm: float = 1.0,
    ) -> None:
        """Instantiate the L2-norm gradient-clipping module.

        Parameters
        ----------
        max_norm: float, default=1.0
            Clipping threshold of the L2 (euclidean) norm of
            concatenated input gradients.
        """
        self.max_norm = max_norm

    def run(
        self,
        gradients: Vector,
    ) -> Vector:
        # Handle the edge case of an empty input Vector.
        if not gradients.coefs:
            return gradients
        # Compute the total l2 norm of gradients.
        sum_of_squares = (gradients**2).sum()
        total_sum_of_squares = sum(
            type(gradients)({"norm": value})
            for value in sum_of_squares.coefs.values()
        )
        l2_norm = total_sum_of_squares**0.5
        # Compute and apply the associated scaling factor.
        scaling = (self.max_norm / l2_norm).minimum(1.0).coefs["norm"]
        return gradients * scaling

    def get_config(
        self,
    ) -> Dict[str, Any]:
        return {"max_norm": self.max_norm}
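
As a usage illustration, the hedged sketch below wraps two numpy gradient arrays in a declearn Vector, runs the module, and inspects the scaled output. The L2GlobalClipping API matches the source shown above, but the NumpyVector import path is an assumption that may differ across declearn versions.

import numpy as np

from declearn.model.sklearn import NumpyVector  # import path assumed
from declearn.optimizer.modules import L2GlobalClipping

# Two parameter groups whose concatenated values have an L2 norm of 5.
gradients = NumpyVector(
    {"weights": np.array([3.0, 0.0]), "bias": np.array([4.0])}
)

module = L2GlobalClipping(max_norm=2.0)
clipped = module.run(gradients)

# Both coefficients are scaled by the same factor (2 / 5 = 0.4 here),
# so the global L2 norm of the output equals max_norm.
print(clipped.coefs["weights"])  # approx. [1.2, 0.0]
print(clipped.coefs["bias"])     # approx. [1.6]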

__init__(max_norm=1.0)

Instantiate the L2-norm gradient-clipping module.

Parameters:

    max_norm : float, default=1.0
        Clipping threshold of the L2 (euclidean) norm of concatenated
        input gradients.
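
For instance, a custom threshold can be set at instantiation and read back through get_config (defined in the class source above):

from declearn.optimizer.modules import L2GlobalClipping

module = L2GlobalClipping(max_norm=5.0)
assert module.get_config() == {"max_norm": 5.0}
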
Source code in declearn/optimizer/modules/_clipping.py (lines 117-129)
def __init__(
    self,
    max_norm: float = 1.0,
) -> None:
    """Instantiate the L2-norm gradient-clipping module.

    Parameters
    ----------
    max_norm: float, default=1.0
        Clipping threshold of the L2 (euclidean) norm of
        concatenated input gradients.
    """
    self.max_norm = max_norm